What would you do if your data centre blew up?

cjd

Business Member
  • Nov 23, 2005
    16,004
    3,436
    www.voipfone.co.uk
    This evening (last Saturday) at 4:55pm CDT in our H1 data center, electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding our electrical equipment room. Thankfully, no one was injured. In addition, no customer servers were damaged or lost.

    http://forums.theplanet.com/index.php?showtopic=90185

    7,400 customers went offline, 9,200 servers went dark and an estimated 30,000 domains went AWOL.

    It's still broken.

    What would you do? Do you have a plan?

    (Nice twist - the CEO is called Kevin Hazard)

     


    Subbynet

    Free Member
    Aug 1, 2005
    6,000
    1,101
    45
    Luton
    Wow - I heard about this maybe a week ago, at least a few days anyway. I thought it would have been fixed by now.

    It's also wise to keep hot-spare systems ready in case of failure. Fair enough, this advice isn't really for web hosts, as it's just not practical, but if you co-lo host or have a local server set-up, it's worth investing in another box and mirroring any changes to it, so that should the first one die you can grab the other and plug it straight in.
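
    A minimal sketch of the "mirroring any changes" step, assuming the live data and the hot spare are both reachable as ordinary directories (for example, the spare mounted over the network). The function name `mirror_tree` and the directory names are illustrative, not from any particular tool, and in practice a dedicated utility such as rsync would usually do this job.

```python
import shutil
from pathlib import Path
from typing import List


def mirror_tree(src: Path, dst: Path) -> List[str]:
    """Copy new or changed files from src to dst; return the relative paths copied."""
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel
        if target.exists():
            s, t = path.stat(), target.stat()
            # Skip files whose size matches and whose copy is at least as recent.
            if s.st_size == t.st_size and int(s.st_mtime) <= int(t.st_mtime):
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves the modification time
        copied.append(rel.as_posix())
    return copied
```

    Run on a schedule, this keeps the spare close to the live box. It is one-way and never deletes anything on the spare, so files removed from the live server will linger there.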
     

    KM-Tiger

    Free Member
    Aug 10, 2003
    10,346
    1
    2,893
    Bexley, Kent
    The only way of mitigating that risk is redundant servers in a different geographic location. Interestingly, my wife is involved in exactly that for a Govt Dept; they use three data locations and have automatic failover. As much as I'm allowed to know, it's complex and the testing scripts take the best part of a day to run.
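
    To make "automatic failover" across several locations concrete, here is a rough sketch of the core decision only: walk the sites in priority order and use the first one that passes a health check. The site names, `pick_active_site` and `tcp_check` are hypothetical, and a real set-up like the one described would involve far more (data replication, DNS or load-balancer changes, and the lengthy test scripts mentioned).

```python
import socket
from typing import Callable, Optional, Sequence


def tcp_check(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Treat a site as healthy if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_active_site(
    sites: Sequence[str], is_healthy: Callable[[str], bool]
) -> Optional[str]:
    """Return the first site, in priority order, that passes its health check."""
    for site in sites:
        try:
            if is_healthy(site):
                return site
        except Exception:
            # A check that raises is treated the same as a failed check.
            continue
    return None  # every location is down
```

    For example, `pick_active_site(["site-a", "site-b", "site-c"], tcp_check)` would return the first of three (hypothetical) hostnames that accepts a connection, or `None` if none do.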

    From a business perspective a tricky call, as the cost of implementation could outweigh potential losses. Or could they?
     

    Optegris

    Free Member
  • Business Listing
    From a business perspective a tricky call, as the cost of implementation could outweigh potential losses. Or could they?
    Put it this way. 24-48 hours of downtime after something as catastrophic as the data centre melting would only lose us a few customers, as they would realise it's something out of our control.

    Over a week waiting for the data centre to fix stuff, though, would most likely put me out of business, hence the remote backups.
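
    The cost-versus-loss question can be roughed out with simple arithmetic. The sketch below assumes a very crude model (expected yearly loss = outages per year x hours down x loss per hour), and every figure in the usage note is invented purely for illustration. It also ignores the point made above that a long enough outage can end the business outright, which no hourly rate captures.

```python
def expected_annual_loss(
    outages_per_year: float, hours_down: float, loss_per_hour: float
) -> float:
    """Crude expected yearly loss from outages of a given length."""
    return outages_per_year * hours_down * loss_per_hour


def dr_pays_for_itself(
    annual_dr_cost: float, loss_without_dr: float, loss_with_dr: float
) -> bool:
    """True if the yearly DR spend is less than the expected loss it avoids."""
    return annual_dr_cost < (loss_without_dr - loss_with_dr)
```

    With made-up numbers, say one major outage every ten years and £50 lost per hour: a week offline without remote backups gives `expected_annual_loss(0.1, 168, 50)`, while a 24-hour recovery gives `expected_annual_loss(0.1, 24, 50)`. Feeding both into `dr_pays_for_itself` alongside the yearly cost of the backup arrangement shows whether the spend is covered on these assumptions.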
     

    Stonelaughter

    With 2,500 servers across 2 datacentres and myriad remote server rooms, you'd think we'd have an effective disaster plan in place, wouldn't you?

    I leave it to your imagination what happened when the floods came last July - and how long it took to recover...
     

    Lewcy

    Free Member
    May 18, 2008
    40
    6
    http://forums.theplanet.com/index.php?showtopic=90185

    7,400 customers went offline, 9,200 servers went dark and an estimated 30,000 domains went AWOL.

    It's still broken.

    What would you do? Do you have a plan?

    (Nice twist - the CEO is called Kevin Hazard)


    The problem is bigger than that; estimates are that it affects anything up to 1,000,000 websites. The Planet are HUGE in terms of providing servers. Many hosts have their servers located there and sell space on them. The outage has left them struggling to keep clients and will probably see a lot of small businesses go out of business.

    As it stands now, they have restored power to everything but are still working through servers that haven't booted, etc.

    It is a nightmare scenario, not just for them but for the people that use them and haven't planned for such an outage.
     

    Bay-Bee

    Free Member
    Feb 18, 2008
    10
    1
    My site is one that has gone down, it went down on Sunday, came back Monday afternoon and now has gone again. I am not sure why it came back and now has gone again though.

    I don't know too much about it but I will be looking into things to cover myself if this happens again.
     

    Lewcy

    Free Member
    May 18, 2008
    40
    6
    My site is one that has gone down, it went down on Sunday, came back Monday afternoon and now has gone again. I am not sure why it came back and now has gone again though.

    I don't know too much about it but I will be looking into things to cover myself if this happens again.

    You will be on the bottom floor; the top floor is running stably from the on-site generator. The bottom floor had to be powered from a mobile generator, which worked for a bit then gave up (technical issues). They had to bring in a replacement, which is when your site went down again. It should, however, be back up now, as the replacement generator is up and running.
     
    Upvote 0
    they have restored power
    One indication that it is about a lot more than simply restoring backups. What else outside the servers was damaged in this incident? In any other incident, here or elsewhere, what non-server essential equipment and services could possibly be damaged or disrupted? How many of us would really, genuinely, be prepared to offer worthwhile SLAs when customers are so very price sensitive? (Read the small print.)

    Circumstances for each of the contributors here will vary widely and how much the customer is prepared to pay for service plays a huge part in the planning and provision equation.

    It actually underlines the funnier side of some of the threads in these forums. Experienced providers step back and laugh as the BS merchants vie with each other to make competing, ludicrous claims about what they offer. Their skills, knowledge and experience are exposed as they concurrently claim to be the cheapest providers and have the best, i.e. most expensive, kit.
     

    Tim R-T-C

    Free Member
    Mar 19, 2008
    548
    64
    The North
    For non-technical end users like me, who just use a website for promotion and contact details, this is a good reason to keep your domain name and hosting separate.

    In the event of a long downtime for your website you can repoint your domain to a different host - just get a cheap or even free one - with a page saying that the website is currently offline but to get in contact please call.... It will at least avoid looking like your company has just disappeared.
     
    Upvote 0
    Funny you should mention this. As part of our final preparations to launch, we ran a series of DRP tests that simulated exactly this scenario. Sadly, the test failed, but that's the point of testing. We learned that our hosting company had not fully tested all aspects of the procedure we are using, so we're now putting pressure on them to fix the problem. We'll be running another DRP test in a couple of weeks' time. As you point out, it's crucial to know your procedures work before going live.
     

    Future Freak

    I have several servers with The Planet - one of them is my t-shirt company website and the other is my part time music forums.

    Only futurefreak.co.uk went down at the weekend and finally came back properly on Wednesday.

    Despite this downtime - I only have excellent things to say about how The Planet handled this catastrophe. There was a steady stream of updates on what they were doing to fix the problem, which made the wait for it to be sorted a whole lot easier. Some UK hosting companies should learn a lesson on customer service from these guys.

    As for backups - after this scare - I now have a regular reminder to grab tar files of all the important data and download a copy to an external hard drive. Figuring out exactly what needs to be backed up is the tricky bit.
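
    For anyone wanting to script the "grab tar files" routine rather than rely on a reminder, here is a small sketch using Python's standard `tarfile` module. The function name `make_backup` and the `backup-<timestamp>.tar.gz` naming are my own assumptions; which directories to include is still the tricky bit and is left to the caller.

```python
import tarfile
from datetime import datetime
from pathlib import Path
from typing import Optional, Sequence


def make_backup(
    sources: Sequence[Path], dest_dir: Path, now: Optional[datetime] = None
) -> Path:
    """Write a timestamped .tar.gz of the given paths into dest_dir; return its path."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"backup-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for src in sources:
            # arcname keeps only the last path component, so the archive
            # unpacks as "site/..." rather than a full absolute path.
            tar.add(src, arcname=src.name)
    return archive
```

    Pointing `dest_dir` at the external drive, and listing both the web root and a database dump in `sources`, would cover the usual cases. A backup is only proven once it has been restored somewhere, so it is worth unpacking one occasionally.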
     
    Upvote 0
    Only futurefreak.co.uk went down at the weekend and finally came back properly on Wednesday.
    Which just goes to show that the level of service required depends on the type of website. We couldn't afford to be down for more than an hour, let alone three days. Our DRP strategy would allow us to recover from a complete blow-out at one geographic site, although that worst case scenario could mean an outage of 24 hours.
     
    Upvote 0

    housekeys

    Free Member
    Jan 12, 2008
    2
    0
    London
    Businesses who operate online must plan for the worst. Ask yourselves: if your current hosting provider got blown up, how would you get your business back online?

    You should draw up a list of what you need.

    For starters, you would need a backup of all your website pages, whether those are PHP, ASP, HTML or whatever. This includes all your images, JavaScript files, Perl scripts, etc., and you should have these stored somewhere other than with the hosting company.

    If your website is database driven, you would need a copy of the database and any usernames and passwords needed to access the database.

    A basic hosting account with another hosting company that offers the same technical capabilities as your main provider should be set up so that, in the event of a disaster, you can upload your website and database to this one.

    Finally, you need to be able to manage your own DNS records. This may be scary stuff, but if your hosting provider does get blown up, and they have managed DNS for you, having your website and database ready on another web server will mean nothing if people can't get to your DR website. DNS management can be as simple as having another domain name associated with your DR website, and forwarding requests for your normal site to the DR name.

    I appreciate not every business will be comfortable with setting this up, and it is not the only option (for instance, your hosting company may well have 2 data centres in different towns for DR). If not, the cost should be relatively low, perhaps less than £200 per year, and if your online business is your livelihood you should give serious consideration to finding out how you can do the above.

    At the very least, it is free to ask your current hosting company what their DR policy is.
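
    On the DNS point above, one simple thing worth being able to check after a switch-over is where the domain currently resolves. The sketch below assumes you know the IP addresses of your main and DR hosts; `classify_dns` is a hypothetical helper, and `example.com` and the addresses in the usage note are reserved documentation values, not real servers.

```python
import socket
from typing import Callable, Iterable


def classify_dns(
    domain: str,
    primary_ips: Iterable[str],
    dr_ips: Iterable[str],
    resolve: Callable = socket.gethostbyname_ex,
) -> str:
    """Report whether a domain resolves to the main host, the DR host, or neither."""
    try:
        # gethostbyname_ex returns (hostname, aliases, list_of_ip_addresses)
        _, _, addresses = resolve(domain)
    except OSError:
        return "unresolved"  # lookup failed, e.g. the DNS provider is down too
    found = set(addresses)
    if found & set(primary_ips):
        return "primary"
    if found & set(dr_ips):
        return "dr"
    return "unknown"
```

    For instance, `classify_dns("example.com", ["192.0.2.10"], ["198.51.100.20"])` would return `"dr"` once the records have been repointed and the change has propagated to the resolver you are using. Cached records can take a while to expire, so a "primary" answer shortly after a change is not necessarily a fault.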
     

    betterlanguages

    I think that the level of solution needed clearly varies with sector and company size. For our business, online information, e-mail and local files are all business critical. However, as a small business we don't have the capacity to have multiple-site backup on dedicated hosting. We mitigate risk by having remote backup of all data on separate shared hosting (with a decent provider). We also use larger providers for web hosting, and keep copies of everything. It doesn't cost the earth but is pretty secure. (How did we ever survive with just 1 PC in the old days?)

    There are other aspects than just data, though. What do you do if you lose your phone line, or internet connection, or both? Despite promises of 24-hour repair, we lost our BT lines last summer when some fool hit a main cable (allegedly). We were without both for over a week. Big hassle?

    Not really. We have a cable provider with a totally separate network, so we just rebalanced the balanced router, transferred one phone line to the other (a simple call to BT sorted this), and carried on working. We got an 8 meg download speed instead of around 18, but didn't really notice much difference.

    So the moral is, don't only think about data, think about means of communication including both broadband and telephone. Of course in the event of a major nuclear attack......but then you probably wouldn't need the connectivity anyway.

    Cheers

    Mike
    www.betterlanguages.com
    website translation
     

    DickM

    Free Member
    Oct 3, 2007
    408
    51
    Essex
    http://forums.theplanet.com/index.php?showtopic=90185

    7,400 customers went offline, 9,200 servers went dark and an estimated 30,000 domains went AWOL.

    It's still broken.

    What would you do? Do you have a plan?

    (Nice twist - the CEO is called Kevin Hazard)


    Most global corporations' IT operations environments are coded DEV, QA, PROD, DR.
    The DR being Disaster Recovery.
    I can recall that DSG (Dixons, Currys & PC World) suffered a "wipe-out" @ their Hemel Hempstead HQ on December 11, 2005, when it was blown away by the local fuel depot explosion .......... caused by a careless smoker???
    DSG's remote DR unit swung into action, and HQ's service to their many retail outlets was restored without any disruption to trading.

    My Company's one man band attempt @ DR is an investment in a 500GB external disk for £70, which I back up my PC to when company records or accounts are updated (about twice annually!). I also have a couple of other family computers which I can connect the ext disk to, if mine falls over.

    johnfranks999

    Free Member
    Jun 7, 2008
    1
    0
    Run, don't walk, to your local library and get the book "I.T. Wars: Managing the Business-Technology Weave in the New Millennium." It has a complete Disaster Recovery Plan - scalable to any organization. The book actually rebrands DR as DAPR - Disaster Awareness, Preparedness, and Recovery - with an emphasis on prevention - but it's very robust overall. I urge every business person and IT person, management or staff, to get hold of a copy of this book. Our CEO has read it. Our project managers are on their second reading. Our vendors are required to read it (they can borrow our copies if they don't want to purchase it). Any agencies that wish to partner with us: We ask that they read it. Do yourself a favor and read this book - then ask your boss to read it - then ask your staff and co-workers to read it. Read an interview with the author at businessforum dot com - look in the book reviews section.
     

    DuaneJackson

    Free Member
    Jul 14, 2005
    8,642
    1,100
    Brighton / London
    We must have got lucky. We have one of our streaming media servers for www.streaming-services.co.uk at The Planet and I wasn't even aware there was a problem until I read this.

    If this happened where the main KashFlow server is located, at Blue Square, then we could be back up and running with zero data loss within a couple of minutes. It's cost us a few quid but I sleep a lot easier at night.
     

    Stephen Berry

    Free Member
    Jan 3, 2007
    1,758
    284
    Surrey, UK.
    Lots of good comments so far - I won't repeat any.

    If you want an example of a company which got it right - look up Northgate Information Solutions who were virtually next door to the Buncefield disaster. Had the disaster occurred during working hours they would have had multiple fatalities.

    As it was, they had made all of the plans previous posters have commented on and were up and running so rapidly that no-one missed a salary payment (they pay a huge slice of the public and private sector - including most of the NHS) and (regrettably) the 'average speed' cameras on the motorways were working again straight away - they run the techie side of those!
     