Surviving a Provider Outage

Last year, EIG (the Endurance International Group) suffered a major outage at one of its facilities in Utah, affecting a number of customers on its Bluehost, HostGator and HostMonster brands, possibly among others. Today they have been down for close to 6 hours and counting, leaving everyone from families with a small shared hosting site to dedicated server customers running business sites and services offline, with no resolution in sight.

So how do you, as a website owner or a service provider relying on other providers such as Bluehost being online, go about keeping your business-critical server online and functioning? The key is forward planning.

Finagle’s Law of Dynamic Negatives states that “anything that can go wrong, will, at the worst possible moment.” Think of your computer blue-screening right before you hit save (or because you hit save) and taking all of your work since the last save with it. Or the power going out as the concert starts. You can imagine many, many more. Quite simply, if you rely on anyone else to help provide your critical systems, chances are one of them will fail at some point and leave you stranded for a period of time.

So, forward planning. You know that something is going to fail, so, much as you would with financial investments, you need to diversify your business service portfolio, as it were.

Start with Reliable. Choosing your provider is important; it’s a crucial balance that you should reconsider as your business needs and abilities change. The saying goes – Good, Cheap, Reliable: Pick Two. You might be able to get a good host for cheap, but it won’t be reliable. If you want a Good, Reliable host, you’re going to have to pay a little more. In any case, do your research and ask around – don’t just pick one with flashy ads on YouTube.

Consider a Disaster Recovery (or DR) environment. We know that no matter who you choose as your primary provider, they’re going to have downtime. It might be for maintenance (in which case they should let you know ahead of time) or it might be due to an unexpected failure of some kind. Some failures are relatively minor and only impact one customer (a part in your dedicated server fails) or a handful of customers (a switch or power distribution unit fails). Others are massive, like the core routers losing connectivity. Your business is critical, so it’s worth investing in an environment that your services fail over to when the primary is unavailable. It can be as complex as a full hardware and software replica of your production environment, and it may even share some of the load during regular hours. Or it might be as simple as a cheap virtual server. Everything might run a little slower, but it’s enough to help you ride out the storm and gives you somewhere to migrate your critical functions.
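To make the fail-over idea concrete, here is a minimal Python sketch of the decision a monitoring script might make. Everything in it is an assumption for illustration: the function names, the rule of three consecutive failed checks, and the idea of probing a health URL over HTTP. It only decides where traffic should go; actually redirecting it (DNS changes, load balancer updates and so on) depends on your setup and is left out.

```python
import urllib.error
import urllib.request
from typing import Sequence


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP errors, refused connections, timeouts and malformed URLs.
        return False


def choose_target(recent_checks: Sequence[bool], failure_threshold: int = 3) -> str:
    """Return "dr" once the last `failure_threshold` checks all failed, else "primary".

    Requiring several consecutive failures (an assumed policy) avoids
    failing over because of a single dropped request.
    """
    if len(recent_checks) < failure_threshold:
        return "primary"
    last_checks = recent_checks[-failure_threshold:]
    return "dr" if not any(last_checks) else "primary"
```

A script run every minute could append the result of `is_healthy(...)` for your primary site to a list and act when `choose_target(...)` returns `"dr"`. Run the check from somewhere that does not depend on the primary provider, or it will go down along with the thing it is watching.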

Backup, Backup, Backup. Even if you can’t afford a DR environment, keep backups of everything. If your service provider goes bankrupt and simply shuts down, or, as we saw with Volume Drive last year, just up and leaves their colocation provider and some servers just “go missing” – how will you move on? You need a backup of your system so that when you select a new provider, deploying your service again is a relatively painless process.
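As a rough illustration of the simplest possible backup, the sketch below packs a directory into a timestamped archive and writes a checksum next to it so you can verify the copy later. The function names, the file naming scheme and the choice of tar.gz with SHA-256 are assumptions for the example, not a recommendation of a specific tool. A real site would also need its database dumped first.

```python
import hashlib
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_backup(source_dir: str, dest_dir: str) -> Path:
    """Archive source_dir into dest_dir as <name>-<UTC timestamp>.tar.gz.

    A companion .sha256 file is written beside the archive.
    Returns the path of the archive.
    """
    source = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = dest / f"{source.name}-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the directory name.
        tar.add(source, arcname=source.name)

    checksum_file = archive.with_name(archive.name + ".sha256")
    checksum_file.write_text(f"{sha256_of(archive)}  {archive.name}\n")
    return archive
```

The archive only helps if it lives somewhere your provider’s outage can’t reach, so copy it to another provider or to your own hardware, and test restoring from it once in a while.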

Service credits don’t cover the cost of your lost revenue, and just because a provider offers a 99.999% guarantee doesn’t mean they’ll spread that 0.001% of downtime evenly across the calendar year; it can all land in a single outage. Downtime is a critical item that needs to be considered when planning your IT strategy. After all, anything that can fail, will. And probably at the worst possible time.
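To put those percentages into perspective, here is a small Python sketch that converts an availability guarantee into a downtime budget. The function name and the 365-day year are assumptions for illustration; check your provider’s terms for how they actually measure the period.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 365.0) -> float:
    """Minutes of downtime an availability guarantee permits over the period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - availability_percent) / 100.0


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

A 99.999% guarantee works out to a little over five minutes a year, so a single six-hour outage (360 minutes) uses up roughly 68 years’ worth of that budget in one go.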
