Thursday, May 26, 2011

ToC in Software

I always seem to find myself writing in the middle of the night.  I get an idea in my head and can't sleep.  I don't want to forget, and so I can't let it go and just rest... so here I am again, letting the blog remember so I don't have to.

I just got done listening to The Goal as an audiobook during my morning commute.  Besides being quite entertaining... I got a lot of good stuff out of it.  Over the last week or so I've been thinking about how to use the new insights to better understand the process of producing software.  And while I was lying in bed, I discovered something new.

I haven't thought all this through yet, but the patterns make sense to me.

So in The Goal, they have a furnace as one of their bottlenecks.  There's a pile of parts sitting in front of it that need processing, some of which aren't for current customer orders and are just building stock.  Some of the parts in the pile have quality issues.  A couple of things they do to increase throughput... move quality inspection to before the bottleneck, and prioritize bottleneck work that makes money over work that doesn't.

My first thought was that this sounds a lot like backlog management and planning.  Prioritize and prepare the work ahead of time so that when it's time to start executing on work, we don't clog up the capacity with unimportant work, and we maximize the time spent in development.  So I started thinking, does this process just assume that development is the bottleneck?  If it isn't, and we are upstream of the real bottleneck, won't we just cause chaos downstream?

Software has some major differences that drastically affect the complexity of the system.  In terms of constraints... all of the historical outputs of the system are also inputs to the system and greatly affect capacity.  You can consume capacity in order to increase capacity.  The work itself flowing through the system is highly variable, uncertain, and linked with several dependent events.  The work flowing through the system is interdependent with other work flowing through the system.  Work is highly negotiable - you can cut corners, optimize for write over read (throughput over future capacity), reduce the scope of the work, and assume more risk to save time.  You can bend the work to your will, to an extent...

It's quite feasible that your bottlenecks could change, given all of these variables... but I wonder how much they really do?  We often bend the work in the system to make it fit.  We often keep bending it the same ways, too.  In the book, they talk about a couple of cases where it seems like the bottlenecks are drifting, but in fact they really aren't.  The problem is actually the flow of material.

It made me think of a common problem with QA.  QA is starved for work, waiting until developers complete something so that they can test.  Then all the work gets done at about the same time, QA suddenly doesn't have enough capacity, and we miss our deadline.  Did the bottleneck just change from development to QA?  It sure feels like it... but maybe this is just the same kind of flow problem?  A bottleneck is defined as any resource whose total capacity is equal to or less than the demand placed on it.  In the overall system, there is usually a constant queue in front of development, and the work demanded typically exceeds available capacity.
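Just to make that definition concrete for myself - completely made-up numbers, a back-of-the-napkin sketch and not anybody's real capacity:

# Toy numbers: over one iteration, the business demands 20 stories.
# Dev can finish 18 of them; QA could test 24 if work arrived evenly.
# By "capacity <= demand", dev is a bottleneck and QA isn't - even
# though QA is the one that feels slammed at the end of the iteration.

demand       = 20   # stories demanded this iteration
dev_capacity = 18   # stories development can actually finish
qa_capacity  = 24   # stories QA could test if the flow were even

for name, capacity in [("development", dev_capacity), ("QA", qa_capacity)]:
    verdict = "bottleneck" if capacity <= demand else "not a bottleneck"
    print(f"{name}: capacity {capacity} vs demand {demand} -> {verdict}")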

In an iteration, if we consider just the smaller subsystem, development capacity is typically filled and then the work flows downhill from there.  Development is no longer the bottleneck in the subsystem; they aren't (intentionally, anyway) given more work than their capacity allows.  Development only becomes the bottleneck if they unintentionally exceed their own capacity, which is quite easy to do as well.

But assuming development happens smoothly, we still slam a bunch of work through at the same time, and QA is suddenly buried.  If QA's total capacity is greater than the total amount of work that needs to be done, it is by definition not a bottleneck... and if that's the case, and we could fix the flow problem, the system could actually operate smoothly.  When I started to think about why all the work gets done at once... the obvious answer occurred to me... because we typically start it all at once.  It's built into the iterative process.  I wonder what would happen if you made no other change than staggering the work starts?
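Here's a toy sketch of that idea (again, made-up numbers and a deliberately dumb model: one QA stage with a fixed daily capacity).  Ten stories each need one day of QA, and QA can finish two per day.  The only difference between the two runs is when finished dev work lands in front of QA.

def qa_queue_depths(arrivals_per_day, qa_capacity_per_day):
    """Return the QA backlog at the end of each day until it drains."""
    depths = []
    queue = 0
    day = 0
    while day < len(arrivals_per_day) or queue > 0:
        if day < len(arrivals_per_day):
            queue += arrivals_per_day[day]
        queue = max(0, queue - qa_capacity_per_day)
        depths.append(queue)
        day += 1
    return depths

# All 10 stories finish development on the same day (end-of-iteration pile-up).
batched   = [0, 0, 0, 0, 10]
# The same 10 stories, but starts were staggered so 2 finish per day.
staggered = [2, 2, 2, 2, 2]

print("batched  :", qa_queue_depths(batched,   qa_capacity_per_day=2))
print("staggered:", qa_queue_depths(staggered, qa_capacity_per_day=2))

Same total work, same QA capacity - total capacity exceeds total demand either way, so QA isn't the bottleneck by the definition above - but the batched run leaves QA digging out of an eight-story pile for four extra days, which is exactly the deadline we miss.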

There are definitely reasons to have synchronization points, but as we get closer to continuous, could we synchronize only when we actually need to synchronize and otherwise just maintain a sustainable pace?

When any part of the system is slammed we tend to bend the work... and we almost always pay dearly for it, too.  There's nothing like a constant sense of urgency to trigger massive capacity sacrifice.
