A dialogue concerning two chief IT executives [1] (possibly on Wall Street):
Simple CIO: Capacity planning? Pfft! You cannot sell prevention.
Sal Viati: Then, explain the multi-billion dollar diet-pill business?
[Photo credit: AP/Ben Margot]
A Bridge in Troubled Waters
Whether you're an IT manager, an application developer, or a software engineer, you're probably not involved in the
bridge-building business, but you assume that the engineers who are would never compromise your safety by cutting
corners on the architectural plan. That may be true for the engineers, but what about their management?
"I can understand people being worked up about safety and quality with the welds, ... but we're concerned
about being on schedule because we are racing against the next earthquake."
This quote [2] comes from a Caltrans executive manager who is responsible for the new $7 billion Bay Bridge under
construction between Oakland and San Francisco. An upper-deck section of the existing Bay Bridge collapsed during
the 1989 Loma Prieta earthquake. He is referring to his disagreement with, and firing of, the independent inspectors
who found cracked welds in some sections of the new bridge. But why would he present such a crazy defense of his
decision in public? Because he doesn't think it is crazy: the project is behind schedule.
Effectively, he is saying: let's increase the risk that the new bridge will fall, in order to avoid the significantly lower risk
that the current bridge might fall if and only if another Loma-Prieta-sized earthquake should occur before the new
bridge is completed. In other words, risky scheduling is better than safe planning. Or, as the title above says: it's OK
if the new bridge falls, as long as it falls on time.
Smashed Atom-Smasher
In the opening dialogue, Simple CIO points out to Sal that it's impossible to sell prevention. So, let's look at the attitude
of management after the $7 billion bridge falls or, since that thankfully hasn't happened, after a similarly priced
$10 billion atom-smasher fails.
The latest director of Europe's new atom-smasher—the Large Hadron Collider (LHC)—says he will be more cautious
than his predecessor, following the very public and expensive failure last September when a section of superconducting
magnets collapsed only days after the LHC was fired up.
"The LHC will be double checked by outside experts before any attempt is made to switch the machine
back on, probably in July. I want to be sure that everything works, so I'll also let an external group make
additional checks on the accelerator."
What a difference a good failure makes: more caution, double checking and independent inspectors are suddenly de rigueur.
Unlike the pre-failure Caltrans director, the post-failure LHC director has clearly been chastened. But why
does it so often take a catastrophe to force management to become wiser? And is it really their fault?
It's the Planning, Stupid!
Perhaps such catastrophes only happen in the world of physical engineering and not in the virtual world of IT systems?
Apparently not. Web 2.0 is the current rage in information technology. In 2008 we had a number of high-profile failures
there too, each ostensibly due to a lack of capacity planning:
- Twitter.com (a major conduit for many online businesses)
- Amazon Elastic Cloud services
- Cuil.com (the putative Google killer. Remember them?)
- Apple iStore (downloads crashed when 1 million iPhones were sold in a single weekend)
- Google Gmail
So far this year, we've had major outages at Twitter and Google. These have led IT pundit Larry Magid to think
of Google as a single point of failure for Web 2.0, and Silicon Valley luminary Rob Enderle to think private clouds
will dominate public clouds because of these capacity management and reliability issues.
My personal favorite, however, was the launch of the long-awaited semantic search-engine called WolframAlpha.
After months of building up expectations, on launch day (May 15, 2009) Stephen Wolfram sheepishly confessed to the LA Times:
"We ran into a small snag (last night). One of our tests was to use one cluster to simulate traffic and run
it against the other cluster. We found that the throughput degraded horribly."
For me, this raises the question: Why were you only doing that level of load testing the night before launch day? Psst!
It's called capacity planning.
Why is capacity planning still such an oxymoron in the IT industry? No doubt, the management at all these web
sites were prepared to spend serious money on any number of servers to support their applications. So, the capacity
part of capacity planning seems to be understood. It's the planning part that remains unrecognized. Or, to paraphrase
Bill Clinton's immortal line from the 1992 presidential election: it's the PLANNING, stupid!
Brisk Management vs. Risk Management
Why has the planning part of capacity planning not been grokked in the IT world? The simple answer is the one Simple
CIO gave in the opening dialogue: You can't sell prevention. But that's too simple-minded. Obviously, you can sell
prevention. As Sal Viati rejoined, just look at the fitness business or dietary-supplements business. They're huge! This
suggests that capacity planning would be an easier sell if there were a perceived personal benefit or reward. And there's
the rub.
It turns out that management is caught in a kind of catch-22 situation. Managers, almost by definition, don't believe
the risk of failure is high for their project. That, by the way, is the same strategy Wall Street adopted with credit default
swaps and we know where that led. This "won't happen on my watch" attitude is pervasive because it contains a
statement about risk—perceived risk. But there's a big difference between perceived risk and managed risk.
Managers are employed to look after projects or product schedules. Capacity planning is generally viewed as
something additional that stretches schedules, thereby making projects take longer. Taking longer likely means missing
the market window. As with the Caltrans director, a manager who allows the schedule to stretch will be viewed as having
let his project get away from him, and that means he will have failed as a manager. Most managers would therefore
rather have their project be seen as a failure than have themselves be seen as a failure. That's the insane catch-22
logic we are dealing with.
Guerrilla mantra 1.7: Management will let a project fail; as long as it fails on time! [3]
From another standpoint, this brisk approach to risk management appears justified because, as we all know, time is money.
If the project or product is delayed, revenue will be lost and, what's more, probably lost to a competitor!
Although not a completely false statement, it is false economics. Even if your product reaches the marketplace on
time, according to schedule, if it fails due to a lack of capacity planning, that failure will be exposed and it will be
exposed in the public marketplace, not the test lab. That public failure, in turn, can lead to product aversion on the
part of both existing and potential customers and ultimately those lost sales show up as lost revenue—the very thing
adhering to the schedule was supposed to avoid! The management catch-22.
Back to the Future
Is there anything that can be done to remedy brisk management or bad risk management? It seems to me that corporate
executives could start by being less deferential to the short-term demands of Wall Street. The deleterious impact of
Wall Street on capacity planning arises from the required quarterly reporting period. That alone has tended to produce
three-month planning horizons, a real oxymoron! Wall Street itself has proven that this strategy simply does not work.
Just like a bridge completed under high-risk scheduling, that strategy has collapsed.
Moreover, in the ensuing economic recession, perhaps more than ever before, companies everywhere are going
to have to become more globally competitive by providing goods and services that are more robust over the long
haul. We're already seeing the impact of the Wall Street collapse on the U.S. automobile industry. Their executive
management became infatuated with short-term capital gains at the expense of preparing for the inevitable, long-term
demand for fuel-efficient vehicles. The excuse that U.S. customers did not want to purchase fuel-efficient
vehicles during the past decade doesn't hold up. Toyota knew that too, but they still invested in their Prius, and now they're
ahead of the game. That's called foresight, something Wall Street punishes.
It's clear that we need a new kind of corporate leadership. I'm reminded of the annoyingly redundant corporate
phrase, "Going forward..." Who ever says: "Going backward, we will..."? But maybe they should. Going forward is
only invoked after some corporate catastrophe becomes publicly known. The point of such a phrase, of course, is to
avoid dealing with the failed consequences of bad risk management. The implicit verbal directive is to keep one's eyes
averted from the corpse of the catastrophe, a tactic long used by Wall Street. But it's now clear to everyone that Wall
Street risk mismanagement has not only failed, but failed globally. It doesn't get any bigger than that. So, maybe it
really is time to go backward; back to the sanity of doing things the right way instead of the expedient way.
The good news in the U.S.A. is that America has a fine tradition of doing things the right way, but we have to go
back in time. Consider the order of responsibilities in the Johnson & Johnson Credo of 1943:
- Customers
- Employees
- Community
- Stockholders
Notice that stockholders are last on this list; certainly not the order Wall Street likes to see today (although possibly it
was acceptable back then). It's corporate cultures like Johnson & Johnson's that made American companies great in the
past. TeamQuest itself is another American company that does it right because they are privately held and therefore
not subject to the whims of Wall Street. Capacity planning, rightly viewed, helps to provide customers with a robust product,
not simply an expedient product.
Just like the false economy of credit default swaps, the omission of capacity planning in IT has proven to be a
false methodology. For example, the commonly held idea that it's cheaper to over-engineer the hardware architecture
to ensure adequate capacity is patently false. Here's the simple counter-example. If performance testing is skipped
in order to meet the release schedule (and who knows if that's really valid?), and the deployed application ends up
running single-threaded with lousy performance, a boat-load of the cheapest servers from China won't improve that.
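The single-threaded counter-example above can be illustrated with a small calculation. The sketch below uses Amdahl's law, which is brought in here purely as an illustration and is not taken from the original text. The function name `amdahl_speedup` and the sample serial fractions are hypothetical. It assumes the work splits into a serial fraction that cannot be spread across servers and a remainder that parallelizes perfectly.

```python
def amdahl_speedup(serial_fraction: float, servers: int) -> float:
    """Ideal speedup on `servers` machines under Amdahl's law.

    serial_fraction is the share of the work that cannot be parallelized
    (1.0 means the application is effectively single-threaded).
    Both the function name and this idealized model are illustrative
    assumptions, not something taken from the article.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if servers < 1:
        raise ValueError("servers must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / servers)


if __name__ == "__main__":
    # Hypothetical serial fractions: fully serial, half serial, 5% serial.
    for serial_fraction in (1.0, 0.5, 0.05):
        row = ", ".join(
            f"{n} servers: {amdahl_speedup(serial_fraction, n):.2f}x"
            for n in (1, 8, 64)
        )
        print(f"serial fraction {serial_fraction:.2f} -> {row}")
```

With a serial fraction of 1.0 the speedup stays at 1x however many servers are added, which is the point of the counter-example: only performance testing before release would reveal that the bottleneck is in the application and not in the hardware.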
This, and other simple-minded falsehoods, should be challenged and avoided as part of a revived corporate IT
culture. Referring back to the opening conversation between our two whimsical CIOs, it is possible to sell prevention
if it is considered important enough. Most people implicitly believe their health is important. They don't need to be
sold on that. The financial health of a company, on the other hand, is determined by the people running it, i.e., executive
management, and capacity planning should be an explicit part of corporate IT fitness from the top down.
The bottom line is not really new. The sagacity of looking beyond the end of your nose is a truism, but incredibly
that truth has been lost in the irrational exuberance [4] of false Wall Street economics. A robust economy and IT
customer satisfaction both come from foresight, not just eyesight. In fact, it's the second word in capacity planning.
Neil J. Gunther, M.Sc., Ph.D., is an internationally known computer
performance and IT researcher who founded Performance
Dynamics in 1994. Dr. Gunther was awarded Best Technical
Paper at CMG'96 and received the prestigious A.A. Michelson
Award at CMG'08. In 2009 he was elected Senior Member of
both ACM and IEEE. His latest thinking can be read on his blog
at perfdynamics.blogspot.com
Footnotes:
[1] With apologies to Galileo.
[2] San Francisco Chronicle, January 26, 2009. The cost could be closer to $10 billion on completion.
[3] N.J. Gunther, Guerrilla Capacity Planning (Springer, Heidelberg, 2007).
[4] Alan Greenspan, December 5, 1996. Former exuberant supporter of the Wall Street way.