Saturday, February 11, 2012

Fighting my way to agility - Part 2

Trying to Fix the Timebox... and Failing

We tried to fix the time, vary the scope, and be shippable at the end of the sprint. Well, that didn't really work. How do you run 1200 manual tests in a sprint? What if I have a month of code already integrated, with other people's work built on top of it, and I'm almost done but not quite? What if the app doesn't work at the end? What if the smallest possible work size is just big? If we have to be shippable at the end of the sprint, and the scope of getting to quality is highly variable, how could we possibly fix the timebox?

I've seen people try testing sprints, hardening sprints, or integration sprints, but doing so fundamentally destroys one of the core principles of Scrum - being shippable at the end of every sprint. Shippable as in - you really could put what you just coded in production. As soon as you make it OK to push your testing into another sprint, you ignore that fundamental constraint. Even worse, it masks a HUGE looming problem that really should be your priority to solve.

If you run another sprint with all your testing delayed, you violate another core principle - you shouldn't build on top of BROKEN code. It was massively harder to diagnose bugs when there was more code that hadn't been proven yet. The ratio was roughly 4 to 1: doubling the amount of code roughly quadrupled the time of our test/fix cycle. And it didn't seem to take much more code than that to go off the test/fix cliff of no return.
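To put rough numbers on that 4-to-1 observation, here is a minimal sketch in Python. It assumes test/fix time grows with the square of the amount of unproven code, which is just the simplest curve that matches "double the code, quadruple the time". The function name, the exponent parameter, and the baseline values are all made up for illustration; they are not measurements from our project.

```python
def estimated_test_fix_time(batch_size, baseline_batch=1.0,
                            baseline_time=1.0, exponent=2.0):
    """Estimate test/fix time for a batch of unproven code.

    ASSUMPTION: time scales as (batch size) ** exponent. An exponent of 2
    reproduces the rough observation that doubling the code quadruples the
    test/fix cycle. All units are arbitrary (think "months").
    """
    if batch_size < 0 or baseline_batch <= 0:
        raise ValueError("batch_size must be >= 0 and baseline_batch > 0")
    return baseline_time * (batch_size / baseline_batch) ** exponent


# Compare batches of 1x, 2x, and 3x the baseline amount of unproven code.
for batch in (1, 2, 3):
    print(f"{batch}x code -> {estimated_test_fix_time(batch):.0f}x test/fix time")
```

Under that assumption the cost of deferring testing is not linear: each extra chunk of untested code makes every chunk already waiting more expensive to stabilize.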

Both of these principles are far more important to uphold than timeboxes. Yet I see soooo many teams keep the timeboxes and throw out the other two. Until you can actually be predictable, I don't think the timeboxes buy you much anyway. Something had to give, and we voted for the box to be it... with a goal of working on becoming more 'boxable'.

So we scheduled our 'sprint' for what we thought was about a month-ish of work. After that, we took however long was needed to test the app, get it all working, and finish anything we couldn't pull out, and we didn't work on anything new in the meantime. The sprint was done whenever we were back to shippable. Nobody was allowed to work on forward development. Nobody did any refactoring that might add more risk. The only job was to get the app back to green.

It was easy to fall into the trap of worrying about inefficiencies, especially as test/fix cycles dragged out and the business wanted their features. We tried creating a release branch and having a few people continue forward development while the rest of us focused on stabilizing. This idea blew up in our faces. We had only really thought about the penalty of managing the branch and merging, and didn't foresee the real cost at all.

Eventually we got the release out the door and planned another month-ish sprint. Then we got to our test/fix phase, and there was an even bigger batch of work in it! The last stabilization had taken ~2 months, with forward development continuing the whole time, so we had actually done about twice as much work. The difference in the time it took to troubleshoot defects in the larger set of changes was HUGE. The troubleshooting complexity shot through the roof. It was another 6 months before we were able to get the changes out the door.

Forget getting a head start. Have your spare capacity work on the problem right in front of you - it's too damn hard to get code out the door. What good are a bunch of coded features that you can't ship?
