Thursday, May 26, 2011

ToC in Software

I always seem to find myself writing in the middle of the night.  I get an idea in my head and can't sleep.  I don't want to forget, and so I can't let it go and just rest... so here I am again, letting the blog remember so I don't have to.

I just got done listening to The Goal as an audiobook during my morning commute.  Besides being quite entertaining... I got a lot of good stuff out of it.  Over the last week or so, I've been thinking about how to use the new insights to better understand the process of producing software.  And while I was lying in bed, I discovered something new.

I haven't thought all this through yet, but the patterns make sense to me.

So in The Goal, they have a furnace as one of their bottlenecks.  There's a pile of parts sitting in front of it that need processing, some of which aren't going to be used in current customer orders and are only there to build stock.  Some of the parts in the pile have quality issues.  A couple of things they do to increase throughput: move quality inspection to before the bottleneck, and prioritize bottleneck work that makes money over work that doesn't.

My first thought was that this sounds a lot like backlog management and planning.  Prioritize and prepare the work ahead of time so that when it's time to start executing on work, we don't clog up the capacity with unimportant work, and we maximize the time spent in development.  So I started thinking, does this process just assume that development is the bottleneck?  If it isn't, and we are upstream of the real bottleneck, won't we just cause chaos downstream?

Software has some major differences that drastically affect the complexity of the system.  In terms of constraints... all of the historical outputs of the system are also inputs to the system and greatly affect capacity.  You can consume capacity in order to increase capacity.  The work itself flowing through the system is highly variable, uncertain, and linked with several dependent events.  The work flowing through the system is interdependent with other work flowing through the system.  Work is highly negotiable - you can cut corners, optimize for write over read (throughput over future capacity), reduce the scope of the work, and assume more risk to save time.  You can bend the work to your will to an extent...

It's quite feasible that your bottlenecks could change, given all of these variables... but I wonder how much they really do? We often bend the work in the system to make it fit.  We often keep bending it the same ways too.  In the book, they talk about a couple of cases where it seems like the bottlenecks are drifting, but in fact they really aren't.  The problem was actually the flow of material.

It made me think of a common problem with QA.  QA is starved for work, waiting until developers complete something so that they can test.  Then all the work gets done at about the same time, QA suddenly doesn't have enough capacity, and we miss our deadline.  Did the bottleneck just change from development to QA? It sure feels like it... but maybe this is just the same kind of flow problem?  A bottleneck is defined as any resource whose total capacity is equal to or less than the demand placed on it.  In the overall system, there is usually a constant queue in front of development, and the work demanded typically exceeds available capacity.

In an iteration, if we consider just the smaller subsystem, development capacity is typically filled and then it flows downhill from there.  Development is no longer the bottleneck in the subsystem; they aren't (intentionally, anyway) given more work than their capacity allows. Development only becomes the bottleneck if they unintentionally exceed their own capacity, which is quite easy to do as well.

But assuming development happens smoothly, we still slam a bunch of work through at the same time, and QA is suddenly buried.  If QA's total capacity is greater than the total amount of work that needs to be done, it is by definition not a bottleneck... if that's the case, and we could fix the flow problem, the system could actually operate smoothly.  When I started to think about why all the work gets done at once... the obvious answer occurred to me... because we typically start it all at once.  It's built into the iterative process.  I wonder what would happen if you made no other changes than just staggered the work starts?
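
Just to toy with that question, here's a rough back-of-the-envelope model.  All of the numbers and names are invented, and it assumes dev capacity is never the limit and that a single tester works items in the order they finish - a sketch, not a claim about any real team:

    // Toy model: a handful of stories, each needing a few days of dev and then QA by one tester.
    public class StaggeredStarts {
        static final int ITEMS = 6, DEV_DAYS = 3, QA_DAYS = 2;

        public static void main(String[] args) {
            int[] batch = new int[ITEMS];                               // everything starts dev on day 0
            int[] staggered = new int[ITEMS];
            for (int i = 0; i < ITEMS; i++) staggered[i] = i * QA_DAYS; // pace the starts to QA's rate
            report("batch start    ", batch);
            report("staggered start", staggered);
        }

        static void report(String label, int[] devStarts) {
            int qaFree = 0, worstWait = 0, lastFinish = 0;
            for (int start : devStarts) {                 // assume devs never queue; focus on the hand-off
                int devDone = start + DEV_DAYS;
                int qaStart = Math.max(devDone, qaFree);  // the item waits if the tester is still busy
                worstWait = Math.max(worstWait, qaStart - devDone);
                qaFree = qaStart + QA_DAYS;
                lastFinish = qaFree;
            }
            System.out.printf("%s -> worst wait in front of QA: %d days, last item done: day %d%n",
                    label, worstWait, lastFinish);
        }
    }

Under those made-up numbers, the last item finishes on the same day either way, but the pile that builds up in front of QA goes from ten days of waiting to zero - the "starved, then buried" pattern just smooths out.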

There are definitely reasons to have synchronization points, but as we get closer to continuous flow, maybe we could synchronize only when we actually need to synchronize and otherwise just maintain a sustainable pace?

When any part of the system is slammed we tend to bend the work... and we almost always pay dearly for it too.  There's nothing like a constant sense of urgency to trigger massive capacity sacrifice.

Wednesday, May 4, 2011

What's in a variable name?


Good variable naming is more than giving things good names... it requires decomposing your problem into thought-sized chunks so that you can give things good names.  If you can't think of a good name for something, that should lead you back to evaluating your design.  Sometimes though, you really do need to invent a new term to communicate a concept in your system.
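
A tiny made-up example of what I mean (the domain and names are invented, just to illustrate): a boolean soup is hard to name because it's really several thoughts mashed together; once the thoughts are split apart, the names fall out on their own.

    // Hypothetical domain types, only here so the example stands alone.
    interface Order    { double getTotal(); }
    interface Customer { int getYearsActive(); boolean isBlacklisted(); }

    class DiscountPolicy {
        // The top-level question reads like a sentence...
        boolean qualifiesForLoyaltyDiscount(Order order, Customer customer) {
            return isLargeOrder(order) && isTrustedRepeatCustomer(customer);
        }

        // ...because each chunk is small enough to have an honest name.
        private boolean isLargeOrder(Order order) {
            return order.getTotal() > 100.0;
        }

        private boolean isTrustedRepeatCustomer(Customer customer) {
            return customer.getYearsActive() > 2 && !customer.isBlacklisted();
        }
    }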
 
You should be able to read code that calls out to several different methods and have a decent idea of what's going on. If you look at the implementation and are surprised at what it does, that's a problem. Surprises lead to misunderstandings -> mistakes -> defects.  There are general rules of thumb, e.g. only doing one thing, or not unexpectedly mutating input arguments.  But other kinds of surprises are much harder to just list.  We attach complex meanings to words as metaphors for concepts in everyday life, ideas in a specific domain, or common patterns and conventions that we use in code.  Words don't always mean the same thing to different people, or can have multiple meanings in different contexts.  On most projects, we end up making a glossary at some point of agreed-upon terms and meanings so that we can communicate better in speech, but also in code.  It's a massive help to have a common set of building-block ideas with the people you have to 'group-think' with in code.
 
Learning design patterns will help you pick up some general vocab, but other conventions are established more by tools.  For example, a method named getXXX() should generally return a property; if it does something other than that, it will generally surprise people.  Even if the method does essentially 'get' something, it's still surprising because it's an established convention, so sometimes you need to use a different word that means the same thing to convey a different meaning. :)  Sometimes a popular tool will pick unfortunate names for things and they'll be adopted into general vocabulary and cause confusion.
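
Here's a made-up sketch of the kind of surprise I mean (the names are invented for illustration) - the two methods below do exactly the same thing, but one of them is lying about it:

    import java.util.HashMap;
    import java.util.Map;

    class Account {
        final String userId;
        Account(String userId) { this.userId = userId; }
    }

    class AccountService {
        private final Map<String, Account> store = new HashMap<>();

        // Surprising: reads like a cheap property access, but it quietly creates and stores a record.
        Account getAccount(String userId) {
            return store.computeIfAbsent(userId, Account::new);
        }

        // Same behavior, but the name owns up to the side effect, so nobody gets surprised.
        Account findOrCreateAccount(String userId) {
            return store.computeIfAbsent(userId, Account::new);
        }
    }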
 
Most of the code out there wasn't written to be understood.  And yet we spend way more time trying to understand existing code than we ever spend writing it.  But it's not easy... even when people try to write understandable code, they often still fail miserably.  Per Scott, "thinking is hard."  :)  Breaking down a problem into thought-sized chunks so that you can think about one chunk and know you don't need to worry about the other chunks... is hard.  That's the kind of stuff you spend your career working to master, and when I say a developer is 'really good' that's usually what I'm referring to.  It's the ability that crosses languages, tools, and frameworks, and it's responsible for massive increases in productivity.

What's your unit testing philosophy?

A member of the Austin Software Mentorship group asked this question on the mailing list.  The kinds of questions being asked are remarkable to me, and from college students! Just wow.  I had recently been a part of a great discussion at Lean Software Austin, and had been thinking about this stuff a lot.  So I took some time to reply, and it helped me capture my thoughts too...


When you are writing tests, one concern is to know if the code you wrote or are writing works or not.  But by automating the tests as a permanent addition to the code base, you have another set of responsibilities.
 
What's going to happen when one of your tests breaks and someone else on your team (or someone who will be on your team later, or even you in a year or two) has to understand what your test was intending, and whether that concern is still valid or needs to be modified or moved elsewhere?  If someone were to modify your code, what might they be likely to misunderstand and get wrong? What conditions are not obvious from the design that might trip someone up? Can you change the code/design to make it obvious and clear? If you are delaying or unable to make the design easier to understand, can you document the non-obviousness in a test that will fail in a way that informs the developer of the mistake they made?
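
Something like this, for instance (everything here - the class, the rule, the numbers - is invented just to illustrate the idea of a test carrying its intent):

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    // Hypothetical billing rule that a new developer might be tempted to "fix".
    class LateFeeCalculator {
        double feeFor(int daysLate, int priorOffenses) {
            if (priorOffenses == 0) return 0.0;   // first offenses are deliberately free
            return daysLate * 1.50;
        }
    }

    public class LateFeeCalculatorTest {
        @Test
        public void lateFeeIsWaivedForTheFirstOffense() {
            double fee = new LateFeeCalculator().feeFor(10, 0);
            // The assertion message carries the intent, so a red bar explains the mistake
            // instead of just reporting that a number changed.
            assertEquals("First offenses are deliberately free; the fee only applies to repeat offenses.",
                         0.0, fee, 0.001);
        }
    }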
 
More often than not, incomprehensible automated tests will end up getting dragged along on a project for years.  When they break and are either not understood or misunderstood, the bars just get turned green by flipping asserts.  The original intent and knowledge that went with the tests just get lost, and the tests just eat up time.  Even if the intent is clear, tests that break frequently when the app isn't broken, especially from non-functional changes, are really annoying.  How can you write tests that will generally break when the code is actually broken, and not just because the code changed? How can you change the code design so that your tests will be less brittle?  How can you simplify your design so that the things you are trying to test can no longer break?
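
A deliberately tiny, invented example of the difference - both tests pass today, but the first one will go red the moment someone rewords the greeting, even though nothing is actually broken:

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;
    import org.junit.Test;

    public class GreetingTest {

        static String greet(String name) {
            return "Hello, " + name + "!";
        }

        @Test
        public void brittle_pinsTheExactWording() {
            // Breaks on any cosmetic change, e.g. swapping "Hello" for "Hi".
            assertEquals("Hello, Ada!", greet("Ada"));
        }

        @Test
        public void resilient_checksTheBehaviorThatMatters() {
            // Only the contract we actually care about: the greeting addresses the person by name.
            assertTrue(greet("Ada").contains("Ada"));
        }
    }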
 
For example, if you end up with a massive amount of test setup to control all of the dependencies for your test, something is wrong with your design.  It should trigger your spidey senses.  How can you restructure your design so that you don't have to be concerned about so many things at the same time?  Let your tests help you improve your design.
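
As a made-up illustration: if testing one pricing rule means standing up a fake database, a stubbed tax service, a logged-in user, and a cart full of fixtures, the rule is probably tangled up in too many things.  Pull it out into its own small object and the test shrinks down to inputs and an answer (names and numbers invented):

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class ShippingCostTest {

        // The pricing rule extracted into a small object with no dependencies to control.
        static class ShippingCost {
            double forWeight(double kilograms) {
                return kilograms <= 1.0 ? 4.99 : 4.99 + (kilograms - 1.0) * 2.00;
            }
        }

        @Test
        public void heavierParcelsPayPerExtraKilogram() {
            assertEquals(8.99, new ShippingCost().forWeight(3.0), 0.001);
        }
    }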
 
If all you care about is "seeing if it works now", there's no point in checking the tests into source control.  Just delete them. Once you check them in, your tests begin a whole new life, and if you aren't aware of the implications of that, you can create quite an expensive long-term burden.