All this week, I have been working on two large releases to our database topology. We have a complicated system, at least when it comes to releasing schema changes. We use a third-party replication system that doesn't replicate DDL changes. So, we have to jump through some hoops to push changes out to the systems. And there are multiple systems, some being HOT, others WARM, and so on. We also have multiple environments: Prod, Stage, Integration, and so on.
When our Development group pushes changes out, they are in an Agile mode of development, and often do not know the actual dates they want to push things out. Often, we'll have slippage of dates, and projects leapfrogging one another. Being on the Service Delivery side of the fence, across the field from the Development group, and apparently speaking an entirely different dialect of the IT language we all speak, we end up having issues. This week was one of those. A lot of time was spent digging into the release requirements, seeing how they could best be applied to our systems, creating release plans to perform the individual steps, and actually testing these steps in a test environment that mimics our multi-server Production environment. It takes a bit of time to prep the environment and ensure that the release is ready to be performed, then to actually perform it. Once done, we often reverse what we just did and do it again.
Needless to say, we are always trying to make the process better, cleaner, smoother, and more effective. The whole idea is to spend this time now, before a release, and test it over and over, to ensure it succeeds at the actual release time. So far, so good. We are making good progress. The last release was one of the larger ones we've ever done, and we had no failures or rollbacks.
The biggest problem with these releases and processes? Not the tech. Not the database. Not the data. It's people dealing with other people. I alluded to it earlier with the IT language quip. But this seems to be the crux of any issues. And I do not stand on the fence saying that those people over there are at fault. It's human interaction and communication that are the problem.
Picking a date and time for a release seems to be an almost impossible task. We currently can pick any day but Friday. Prod releases occur after 6pm, while Stage and all other lower environments can happen any day, any time. And they often do, regardless of the time that was set and agreed upon. Things happen, times slip, often dates slip. Once the first slippage occurs, every scheduled time and date after it naturally slips as well.
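For what it's worth, the window rules above are simple enough to write down as a check. This is only a minimal sketch: the function name and constants are made up for illustration, and it assumes that the no-Friday rule applies to Prod only and that "after 6pm" means 18:00 or later, local time.

```python
from datetime import datetime

# Hypothetical constants; they encode one reading of the rules above.
FRIDAY = 4               # datetime.weekday(): Monday is 0, so Friday is 4
PROD_EARLIEST_HOUR = 18  # assumption: "after 6pm" means 18:00 or later


def release_allowed(environment: str, when: datetime) -> bool:
    """Return True if a release to `environment` at `when` fits the window."""
    if environment.lower() != "prod":
        # Stage and all other lower environments: any day, any time.
        return True
    if when.weekday() == FRIDAY:
        # Assumption: the "any day but Friday" rule is a Prod rule.
        return False
    return when.hour >= PROD_EARLIEST_HOUR
```

A check like this does nothing about the people problem, of course. At best it makes the agreed window explicit, so a slipped date can be re-validated quickly.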
If you have read this far, and were secretly hoping that you'd stumble upon the solution to the above issues, I fear I have to disappoint you. I do not have one. Each time I think I do, I find that I end up introducing issues that others complain about. Often this is an unforeseen side effect of my statements. What I do hope is that if you have read this far, you have applied this story to your systems, have seen discrepancies between mine and yours, and have ways to improve ours. Or, maybe you actually have some ideas on how to alter a release management process and want to share.
So, let's talk, discuss and share.
1 comment:
TJay,
As a developer, I can tell you the pain isn't any easier on the other side of the fence. It does sound like your development team needs to stabilize their release processes a bit, however.
In our shop, we've adopted a modified form of Scrum, an agile development mode. You can obviously google it and find much greater detail than I can post here, but I thought I'd share our modifications. Again, note that our process isn't perfect, but it does seem to have improved our roll-out procedures tremendously.
1. We schedule development time into segments called Sprints; Sprints typically last 4-5 weeks, and we do our best to not deploy anything except on the last day of the Sprint (Sprint Release day); in our shop, that's a Wednesday.
2. Product Users (typically the managers from various departments) are allowed to add a unit of work (a Product Backlog Item; PBI) to our backlog (our list of stuff to do) at any time.
3. At the beginning of the Sprint, the dev department meets with the Product Users to scope out the PBIs we need to address in the Sprint. Each PBI consists of a series of SBIs (the tasks needed to implement the whole unit of work). As a rule, we don't release a PBI until all of its children (SBIs) are accomplished. The developers give an estimate of work effort; the Product Users give an estimate of priority.
4. We do a code freeze for QA the week before a Sprint ends (the Sprint Release Day); we review what we plan to release, and get sign-off. We don't always meet the goals of the Sprint, and the Product Users may decide they want to delay the release of some PBI at this point. We continue to give updates throughout that week in case something doesn't pass QA or a priority for release changes.
5. If an "emergency" occurs during a sprint, we may have a patch day (Wednesday). Patches are intended to be lightweight, and easy to install, and absolutely necessary for the business to continue. We strive to NOT have Patch days, but if we do, it's always a Wednesday. If it's an ABSOLUTE emergency that requires immediate attention, we may violate that rule but not without alerting ALL product users. In my experience, most things can wait a month, and nearly everything can wait a week; it should be rare that we deploy anything on any other day besides a Wednesday.
6. During the sprint, the developers meet daily to discuss (and the Product Users are invited to listen) 3 things: what we worked on last, what we're working on today, and any impediments to the process. This keeps us on track.
We're still refining the process, but I can tell you that having a consistent release date that usually occurs once a month has greatly reduced the pain of deployment. The only challenge has been to our QA processes because they used to get a small set of code often, and now they get a great deal of code all at once.
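The calendar side of points 1, 4 and 5 can be sketched in a few lines. Treat this as an illustration only: the function names are hypothetical, the default 4-week length is just one end of the 4-5 week range, and it assumes the freeze starts exactly one week before the release and that a Sprint not ending on a Wednesday releases on its last Wednesday.

```python
from datetime import date, timedelta

WEDNESDAY = 2  # date.weekday(): Monday is 0, so Wednesday is 2


def sprint_release_day(sprint_start: date, sprint_weeks: int = 4) -> date:
    """Return the last Wednesday on or before the final day of the Sprint."""
    sprint_end = sprint_start + timedelta(weeks=sprint_weeks) - timedelta(days=1)
    # Step back from the final day to the nearest Wednesday (0 if it is one).
    days_back = (sprint_end.weekday() - WEDNESDAY) % 7
    return sprint_end - timedelta(days=days_back)


def code_freeze_day(release_day: date) -> date:
    """Assumption: the QA code freeze starts one week before the release."""
    return release_day - timedelta(weeks=1)


def is_patch_day(day: date) -> bool:
    """Patches, when they can't be avoided, go out on a Wednesday."""
    return day.weekday() == WEDNESDAY
```

For a Sprint starting on a Thursday, `sprint_release_day` lands on the Wednesday that closes the fourth week, and `code_freeze_day` gives the Wednesday before it.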