Saturday, February 11, 2012

Fighting my way to agility - Part 1

Real Projects Have Hard Problems

The challenge with principles is that it's really hard to see how they apply in a specific context. The idea behind a principle may make perfect sense, and so may how the software process should work in theory. But the messy world of software reality isn't so kind.

Concrete, messy, real-world experiences are awesome to learn from because they give us insight into how to take abstract truths (principles) and map them to reality. From these examples we can distill patterns that give us ideas on how to carry solutions from one problem to another. It's far too easy to make a really bad decision if you over-simplify a very complex world. Real software is just hard.

For anyone who reads this, please share your stories. Whether they end in victory or not, they're the guts of our learning. By sharing them we broaden all of our experiences and improve all of our abilities to tackle the problems we face.

So here's one of my stories: a messy, real-world project in which I personally fought my way to agility.

My Project

A semiconductor factory SPC (statistical process control) system responsible for reading and aggregating the data coming off the tools, detecting problems, and shutting down whatever was causing them. It had a high-volume, highly variable incoming data stream, user-defined analysis programs, and near real-time charts. A program would usually gather a mix of historical and current data, then do a bunch of math on the results to make a decision. If we took too long to make that decision, the tool would time out and shut down. The system was deployed in a 24/7 environment with only one downtime per year.

Starting point (for me, 2005)

Scary reality. :) 500k lines of code, ~10 years old. The web server was home-grown, the UI flow control was done with exceptions (seriously!), and the core SPC engine had so many layers of band-aid hack fixes that it was next to impossible to make a change without some unintended side effect on another use case. There were 1200 formal manual test cases that all had to be run and made green for every release.

Half of the team was in Austin, half in India, and one person in Germany, with customers in both Austin and Germany, each with different data formats, problems, and strategies. Interestingly, the team had started with 2-week iterations, but as the transaction cost of delivery went up with the growing test burden, the batches kept getting larger to compensate. When I got there, they were doing ~4 months of development followed by a couple of months of test/fix chaos.

They had recently been having major performance problems and didn't know how to solve them, so the test/fix chaos was already at 2 months and counting, with no end in sight. I had come from an XP shop building financial transaction/banking software, but they really hired me for my Oracle performance wizardry. When I got there, it was even more tragic: they had also been having major quality problems, and they had just rolled back the production release. Again. Our customers were literally scared to install our software. We were sitting at the tipping point of being completely unable to release, with critical defects that we couldn't diagnose.

End point (3 years later)

We were consistently delivering releases every ~2 months with high quality, stable performance, and predictable behavior. We deployed 2 more installations, one of them at a totally different type of processing facility. Happy customers. They even threw a party for me when I went to Germany!

Going from A to B - Was it Magic?

It was pain.  A lot of pain.  A lot of mistakes.  A lot of learning and hard work.

We accidentally shut down every tool in the fab (twice). We accidentally had a sprint whose test-and-fix churn dragged on for a year before we saw light at the end of the tunnel (way scary). We threw away a massive investment in QTP test automation and started over.

We were aiming to do Scrum. We all read a Scrum book, and Bryan (still my awesome boss) was our manager and Scrum Master at the time.

It took a long and hard journey to accomplish real change. 
