Saturday, 16 April 2011

A fun day in Budapest

Contentedly enjoying some chat along with a rather excellent Hungarian dessert wine at the BirdLife partner conference in Budapest, I idly checked the BirdGuides news page on my iPhone: it wouldn't load but I just put that down to a hotel wifi connection overloaded with BirdLife partners from 43 countries skyping home. It was a short while later that I stole a quick glance at my email, thereby sealing my fate for the next 30 or so hours. Not only was our web server down, but, chillingly, the duty engineer at our ISP claimed not to be able to find a bootable operating system on it. Website crises habitually fall into two categories: short-term panics that are usually quickly resolved by unflappable engineers, or unfolding disasters of epic proportions. A dreadful sinking feeling told me this was the latter, as I made my apologies and scurried back to my hotel room.

And so it was to be.

Initially our Master Boot Record appeared to be mashed: not a very nice thing to happen to anyone, but fixing it is a routine procedure and most patients go on to live a full and happy life. I suggested to the helpful engineer at our ISP that he might copy off a few vital files for me before patching the MBR "just in case anything went wrong". A short while later he called me back and cheerfully told me he couldn't read any files on the disk, but "not to worry", we'd get them off the most recent backup. Having a long and well-founded distrust of backups, I found it difficult to share his optimism. Hours pass; I can't sleep. Eventually I call to see what the story is: less cheery this time, he tells me that yes, they have the backup, but the vital website configuration files are not present. You mean they're missing?!! Yes, but "not to worry", he has passed our case on to the escalation team. Meantime they have worked hard to reinstall the OS and "even upgraded it to the latest version". This is a bit like the fire chief telling you that, sorry, your house has burned down and no, they couldn't save your filing cabinet with the house plans, but don't worry, we've cleared the site and, hey, we even upgraded your septic tank so you're good to go!

So with little sleep and regretting the glass or two of wine I had so recently savoured, I'm faced with having to rebuild the BirdGuides website configuration more-or-less from scratch. Fortunately Hungarian coffee is just as potent as their dessert wine, and the BirdGuides "escalation team" swings into action. I hang a Do Not Disturb sign on the hotel room door. A moment of inspired paranoia a while back prompted me to make a few backups of my own: I remember these, dig them out and get to work. But by lunchtime I'm tired and starving and no longer fit to be left in charge of any server, let alone the BirdGuides server. I'm persuaded to take a break, so I wander down to the hotel restaurant and ask if they do a light lunch. The kind waitress tells me she has just the ticket - the Hungarian Business Lunch Special: four courses topped and tailed with an aperitif and coffee. Very, very tempting - but absolutely lethal in my condition, so I have to disappoint her and ask for a sandwich instead, and a very strong coffee. She brings me an aperitif anyhow - I must look like I need it.

Late afternoon, and birdguides.com is emerging from its near-death experience. The bird news team is busily catching up with entering the day's news and we're almost ready to go live. Dave and Fiona are frantically testing and making sure all those obscure parts of the website are still functional, all the while blogging and tweeting to keep you all informed. Meantime I'm wrestling with a recalcitrant security certificate. At last the lock symbol appears, and you can confidently buy all those goodies from BirdGuides in the firm assurance that your security is protected by a company called GoDaddy.com. It's time to throw the switch.

Within seconds the traffic on birdguides.com kicks in, and grows, and grows, and grows. It's heartwarming and also rather scary to watch. Will the server cope? Have we forgotten anything? To imagine that so many people have been sitting (im)patiently at their browsers just waiting for this moment! It made my day. I wander down to the restaurant again, but sadly the Hungarian Business Lunch Special is off the menu.

So, as politicians promise after all good disasters, lessons will be learned. We clearly need to improve our resilience. But it is worth saying that our data - your data - was unaffected: it lives on a dedicated RAID array of hard drives and on a separate dedicated database server.

Thanks are due to Adrian at our ISP who toiled through the night to help us, and to my colleagues Fiona and Dave, without whose help and support the whole process would have taken so much longer.

I'm off to the Hortobágyi for a few days and this time I won't be checking my email...

1 comment:

NicoleB said...

Glad you got it all back to life and can now enjoy beautiful Hungary!
I just 'lost' all my data due to my own stupidity.
Smaller pain, less data, but I can imagine :)