Friday, February 17, 2012

Fighting my way to agility - Part 5

Shrinking Release Size

Now that our batches were shrinkable, we could feasibly shrink our iteration and release sizes.  The ultimate test of whether you are really shippable is to actually ship.  If we could actually get our software into production, we could put the risk behind us for the changes we'd made so far.  If we didn't actually ship, it was hard to really know whether we were shippable.  And by shipping less often, we were introducing more risk into production at a time - and production defects usually took longer to diagnose and fix.

But our customers were just starting to trust us, and they were still doing about a month of their own testing after ours before they felt safe enough to install anything.  We asked if we could do releases more often, and the answer was pretty much... 'hell no.'  Rather than give up, we set out to figure out why there was so much pushback.  And solving those problems, whatever they might be, became our top priority.

We learned about all kinds of problems we didn't even know we had.  Some were ticket requests that had been sitting in the backlog for years.  Others were things that had to be done for every release - not really a big deal until we asked for them to be done a whole lot more often.  There was a whole lot of pain downstream that we weren't even aware of, and most of it was really just our problems rolling downhill - and completely within our power to fix.

Reducing the Transaction Cost of a Release

There are lots of different kinds of barriers to releasing more often.  Regardless of what yours are, getting the right people in the room and working together to understand the whole system goes a long way.  A lot of things that seem like hard timeline constraints actually aren't.  Challenge your assumptions.

So how could we relieve our customers' pains?

"Everytime we do a release, we lose some data" - we had no idea.  The system was designed to do a rolling restart, but there was a major problem.  During the roll over we had old component processes communicating with new ones.  In general the interfaces were stable, but there was still subtle coupling in the data semantics that caused errors.  Rather than trying to test for these conditions, we instead changed the failover logic so all of the data processing would be sticky to either the old version or new version and could never cross over.   This prevented us from having to even think about solving cross-talk scenarios.  We also created a new kind of test that mimicked a production upgrade while the system was highly live.  This turned out to be a great test for backward compatibility bugs as well.

"Its not safe to install to production without adequate testing, and we can't afford to do this testing more often" - Whether they found bugs or not while testing, was almost irrelevant.  Unless they knew what was tested and felt safe about it, they wouldn't budge.  They were doing different testing, with different frameworks, tools and people than we were, and unless it was done that way, it was a no go.  So we went to our customer site.  We learned about their fears and what was important to them.  We learned how they were testing and wanted testing to be done.   

We shared our scenario framework with them, code and all, and then worked with them to automate their manual tests in our framework.  We made sure their tests passed (and could prove it) before we gave them the release.  Likewise, we adopted some of their testing tools and techniques, and at release time gave them a list of what we had covered.  We also started giving them a heads up about which areas of the system were most at risk based on what we had changed, so they didn't feel the need to test everything.  Once we had helped them reduce their effort and built a lot more trust and collaborative spirit with our customers, this was no longer an issue.
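As a rough illustration of what automating one of those manual checks might look like - everything here is a hypothetical stand-in, not our real scenario framework or product - here is a scenario-style test for a check like "run the report before and after an upgrade and confirm nothing was lost":

    import unittest

    class InMemorySystem:
        """Tiny hypothetical stand-in for the product, just enough to show the scenario shape."""
        def __init__(self):
            self.records = []
            self.version = "1.0"

        def ingest(self, amount):
            self.records.append(amount)

        def rolling_upgrade(self, new_version):
            # The real system does a rolling restart; here we only model the
            # property the customer cared about: data survives the upgrade.
            self.version = new_version

        def report_total(self):
            return sum(self.records)

    class UpgradeScenario(unittest.TestCase):
        def test_report_totals_survive_an_upgrade(self):
            system = InMemorySystem()
            for amount in (10, 20, 30):
                system.ingest(amount)
            before = system.report_total()

            system.rolling_upgrade("2.0")

            self.assertEqual(system.version, "2.0")
            self.assertEqual(system.report_total(), before)

    if __name__ == "__main__":
        unittest.main()

Once a check like this lived in the shared framework, proving it passed before a release was just a matter of showing the test run.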

Editing the Scrum Rule Book - What process tools DID we actually use?

Since we weren't predictable and weren't using time boxes, we also threw out story point estimation, velocity and any estimation-based planning activities.  The theory goes that you improve your ability to estimate and therefore your predictability by practicing estimation.  I think this is largely a myth.

"Predictions don't create Predictability." This is one of my favorite quotes from the Poppendieck's book, Implementing Lean Software Development.   You create predictability by BEING predictable.  The more complex and far in the future your predictions, the more likely you are to be wrong - and way wrong.  So wrong that you are likely to make really bad decisions under the illusion that your predictions are accurate.   Its an illusion of control when control doesn't actually exist.  You can't be in control until you ARE controllable.   Predictability doesn't come from any process, its an attribute that exists (or doesn't) in the system.  Uncertainty is very uncomfortable.  But unless you face reality and focus on solving the root problem, nothing is ever really likely to change.

Burn downs we did use, but not until we were closer to wrapping up a release.  They were helpful in answering the 'are we done yet?' questions, the timing of which we used to synchronize other release activities - tickets to submit, customer training to do, customer testing to coordinate, and so on.  We tried doing burn downs for the whole sprint, but since our attempted estimates were so wildly inaccurate, they weren't helpful and were actually harmful as input into any decisions.  The more honest input was that we really had no idea, but were trying to do as little as possible so that 'done' would come as soon as possible.  If a decision had to be made, we would try to provide as much insight as we could to improve the quality of the decision, without hiding the truth of the uncertainty.  Although management never liked the answers, our customers were unbelievably supportive and thankful for the truth.
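For the end-of-release 'are we done yet?' question, the only arithmetic involved is projecting a finish date from the rate remaining work is actually closing.  A minimal sketch of that projection - the dates and counts below are made up for illustration, not real release data:

    from datetime import date, timedelta

    def projected_done(remaining_by_day):
        """remaining_by_day: list of (date, open_item_count), oldest first."""
        (d0, r0), (d1, r1) = remaining_by_day[0], remaining_by_day[-1]
        days_elapsed = (d1 - d0).days or 1
        burn_rate = (r0 - r1) / days_elapsed      # items closed per day
        if burn_rate <= 0:
            return None                           # not converging - no honest forecast
        return d1 + timedelta(days=round(r1 / burn_rate))

    history = [(date(2012, 2, 6), 24), (date(2012, 2, 10), 15), (date(2012, 2, 14), 8)]
    print(projected_done(history))                # 2012-02-18 with this made-up data

A projection like this is only useful once the remaining list is small and stable, which is exactly why we waited until the end of a release to bother with it.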
