Graceful Degradation - 2014/12/12
I first learned about graceful degradation from a colleague. He prefaced his story by saying that good people learn from their mistakes, but the best people learn from other people's mistakes. This is a bit like the saying in aviation circles that a good landing is one you can walk away from, an excellent landing is when they can use the 'plane again ...
But I digress. He told me how he learned about graceful degradation, and the message was clear. He was telling me about his mistake so I could learn from it. I wish more people were willing to make their mistakes public - the software industry would certainly benefit if it could learn from everyone's mistakes.
His story goes, in broad outline (I forget some of the details), like this.
He made a comfortable living from programming electronic tills. At the time these were programmed in assembly language, and the best output you could hope for during debugging was to see whether it opened the till or not. It was difficult work because there were no simulators, no higher-level languages, no debuggers, no single-steppers, no help at all, really. You wrote your code in assembler, transferred it to the machine, then tested it to see if it worked. Sometimes it did.
My colleague was amazing at this work, which is why he could get contracts in it any time he wanted, and the money was good. So he worked about six months of the year, then did whatever he wanted the rest of the time.
Except at Christmas. That was when one of his biggest mistakes would come back to haunt him.
A large department store in a major capital city used tills that ran his program. It was one of the most sophisticated systems around. Each till would keep a list of items sold, and then a central computer would interrogate each till in turn and find out how much had been sold of each item. It would then keep track of how many remained in stock, and would project when a re-ordering would be necessary. Re-ordering was not automatic, but staff would be alerted when stock was low on any items and decisions could be made.
They loved the system. Buffer stock could be reduced, reducing the store's overheads, and making the entire store more responsive to consumer demand.
Except at Christmas.
You see, every Christmas the system, in essence, would simply stop. An entire floor of tills would stop responding, refusing to accept further purchases for an unpredictable length of time, and then suddenly start working again. There was no apparent reason, no apparent rhyme, and nothing they could do except call my colleague and get him to come in and "fix the problem."
But there was really nothing he could do either, even when he worked out what the problem was. And what was the problem?
Each till would keep its list of items sold, and then when asked, would dump the list to the central computer. Then it would go back to
Entire floors of tills would grind to a halt, waiting to download their data and restart. The floor would simply stop. Not good at the busiest shopping time of the year. Really not good.
So what should have been done?
Of course it would have been better if the system had simply been faster, but when overloaded, degrading gracefully is usually a better option than simply stopping. Think about it next time you worry about system capacity.
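The trade-off can be sketched in a few lines of Python. This is a hypothetical illustration, not the original till software: the `Till` class, its `capacity` limit and the `degrade` flag are all invented for the example, and it assumes the trouble is a sales log that fills up before the central computer gets round to polling it. With `degrade=False` a full till refuses sales until it is polled; with `degrade=True` it gives up the per-sale detail and keeps only per-item totals, so it can carry on selling.

```python
from collections import Counter


class Till:
    """Hypothetical till with a bounded sales log (all names are assumptions)."""

    def __init__(self, capacity, degrade):
        self.capacity = capacity    # how many detailed records fit in the log
        self.degrade = degrade      # True: degrade gracefully, False: halt
        self.log = []               # detailed records: (sequence number, item)
        self.summary = Counter()    # coarse per-item totals, used once the log is full
        self.sequence = 0

    def record_sale(self, item):
        """Return True if the sale was accepted, False if the till refused it."""
        if len(self.log) < self.capacity:
            self.sequence += 1
            self.log.append((self.sequence, item))
            return True
        if self.degrade:
            # Log is full: drop the per-sale detail but keep counting stock.
            self.summary[item] += 1
            return True
        # Halting policy: refuse the sale until the central computer polls.
        return False

    def poll(self):
        """Called by the central computer: hand over per-item totals and reset."""
        totals = Counter(item for _, item in self.log) + self.summary
        self.log.clear()
        self.summary.clear()
        return totals
```

Feeding five sales into two tills with `capacity=3`, the halting till accepts the first three and refuses the rest, while the degrading till accepts all five and still reports the right totals when polled. What it loses is the order of the last two sales, which is a much smaller price than a closed till.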
Is it better to halt and clear the backlog, or degrade gracefully and continue to serve your customers?