Wednesday, May 19, 2010

Preventing network outages

Outages of major networks and websites have become so commonplace that there
are even sites dedicated to reporting them.

For example, the recent YouTube outage caused a flurry of tweets. This
interesting posting describes a technique that uses Google's timeline feature
to track outages of major sites. Using that technique, one can search for
"network outage" and track down a PlayStation 3 network outage that started on
February 28 and lasted for a couple of days.

Rarely does one get to know the specifics of a particular outage, such as
root cause and duration. When details are reported, one has to be skeptical
about the veracity of the report. Regardless of the cause, what is interesting
is how the cost of the downtime caused by outages compares with the cost of
preventing them.

The telephone companies worked out that trade-off a long time ago when they
instituted the "five nines" metric of uptime (99.999% availability). They
figured they could live with the cost of about 5 minutes of downtime a year,
and spent the money on the resources needed to get there.
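
The arithmetic behind that figure is simple enough to sketch. The snippet
below is only an illustration: the function name is made up for this post, and
it assumes a 365.25-day average year. It converts a number of nines into the
downtime allowed per year.

```python
# Assumption: an average year of 365.25 days (525,960 minutes).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(nines: int) -> float:
    """Downtime allowed per year at an availability of `nines` nines.

    For example, nines=5 means 99.999% availability, i.e. the service
    may be unavailable for a fraction 10**-5 of the year.
    """
    unavailability = 10.0 ** -nines
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    # Print the yearly downtime budget for two through five nines.
    for n in range(2, 6):
        minutes = downtime_minutes_per_year(n)
        print(f"{n} nines: {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, five nines works out to a little over 5 minutes a
year, and each nine you give up multiplies the downtime budget by ten.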

When you plan how much to spend on network reliability, first ask yourself
how much downtime you can live with, then allocate the resources in
manpower, equipment and software to achieve the desired reliability.

SNMP simulation software allows you to test your network management
procedures before network outages occur.
