Dreamhost Server Crash Analysis

Posted 2009. Filed under: Blogging, Domain Hosting, SEO, Site News, WordPress

Our Dreamhost server crashed, and our site has been offline for much of the last 2 days. This gives us another opportunity to share what happened and some tips we learnt from the experience.

Internal server errors had been occurring on our server for the last few weeks, and along the way we learnt more about memory-intensive plugins and the need to upgrade from PHP4 to PHP5. Though most of the server errors were fixed, the relief was short-lived, as our shared Dreamhost server crashed.

The web server, bulger, is having issues with its RAID, and attempts to move the drives over to a new server have also failed. We’re currently bringing up a new server and restoring it from backups. More information to come!

Update: Further hardware disaster with this machine has extended downtime, unfortunately. We are still working on it and we hope to have it back up soon. We are very sorry for the inconvenience.

Update 06:35 PDT: The machine is back up now but is still restoring from backups, and some configurations are still running on the server. Websites should be up for the most part, but if they’re not, don’t fret, as we’re still working on it! We’ll post more updates as they come in!

Update 8:11 PM PDT: The server is still restoring from backups but sites should be online. You will notice a bit of instability while the backups are being restored, but this is only temporary and will subside once everything is restored. We’ll post the all clear once everything is good to go!

There has been no update for over 24 hours, and it seems it is not all clear yet. This time we could not maintain traffic as we had before: no cached pages were functional and the entire site was down. There was no FTP access, and even the homepage notice could not be posted.

[Image: server crash]

Thankfully, maintaining a status blog helped, as an alternative Blogger blog could send out updates. Using Feedburner as the primary feed URL lets us redirect the feed to the status blog if required. Those who follow us on Twitter could stay connected through the recent status updates from our blog. It also helped that we use Google Apps to run our email, so email communication was not affected. All these tools helped us maintain contact and keep some services running.

I had a few email discussions with Dreamhost support over the last 2 days, and here is an email from yesterday highlighting what the problem was.

Thanks for writing in! I am *VERY* sorry for the inconvenience! The server that you are on, bulger, has been having a few problems as of yesterday. I’ll go ahead and explain what happened. Basically, bulger was having issues with its RAID setup and the server was going up and down intermittently. Our network admins went ahead and decided that we needed to fail the existing hardware over to new hardware in order to restore functionality to the server! The new hardware has been installed and the server is running BUT the I/O from the rsync of backup data is causing problems for people on the server. I don’t have an ETA for when the rsync will be done but we’ll get this squared away as soon as possible! I really do apologize for the inconvenience and I totally understand how frustrating a server being up and down is. That is why our network admin team is working hard to bring it up now.

And another support email from this morning…

We’re very sorry for the problems that you are having with your site. The server that it was hosted on was having some hardware issues, and we tried to do a quick move to get it set up on new hardware with little downtime. Unfortunately, things didn’t go as quickly as planned and it is still having issues. We do have the hardware issue all straightened out, but we are still in the process of copying data over from our backup servers. This is causing the network of the server to become extremely saturated and is causing the server load to wildly fluctuate. At this point, we don’t see anything wrong with the server that is making your sites unavailable other than extreme load that these backups are causing on the server. Unfortunately, there’s not a whole lot we can do with this server until after the data transfer is complete. All we can ask at this time is that you keep an eye on this status post to see when the issue is resolved.

Thankfully, site traffic has been creeping back over the last few hours, and FTP and the WordPress admin have started working.

[Image: server online]

However, Yang let me know about the Internet Supervision tool, which shows that the site is still loading very slowly from locations across the world, taking over 40 seconds.

[Image: slow-loading]
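For readers who want a rough check of their own, the load-time measurement described above can be approximated with a few lines of Python. This is only a sketch under stated assumptions: it times a single fetch of the page HTML from one machine, whereas a tool like Internet Supervision tests from several locations. The function names and the 10-second "slow" cutoff are hypothetical and not taken from any tool mentioned in this post.

```python
import time
import urllib.request
from typing import Callable

# Hypothetical cutoff for calling a page "slow"; not a figure from the post.
SLOW_THRESHOLD_SECONDS = 10.0


def fetch_page(url: str, timeout: float = 60.0) -> bytes:
    """Download the page body using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def measure_load_time(url: str, fetch: Callable[[str], bytes] = fetch_page) -> float:
    """Return the wall-clock seconds taken to fetch the page once."""
    start = time.perf_counter()
    fetch(url)
    return time.perf_counter() - start


def classify(seconds: float, threshold: float = SLOW_THRESHOLD_SECONDS) -> str:
    """Label a measured load time as 'slow' or 'ok' against the cutoff."""
    return "slow" if seconds > threshold else "ok"
```

Calling `classify(measure_load_time(url))` with your own site address gives a quick local reading. The `fetch` parameter exists so the timing logic can be exercised without touching the network.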

As I considered the Dreamhost PS private server upgrade, many helpful Twitter users (like @nirmaltv, @manikarthik, @sumesh, @rishi, @keithdsouza, @shivaranjan, @denharsh) pitched in to suggest alternative VPS hosting solutions such as Slicehost, Doreo, Linode and Mediatemple. It was also great to see some hosting companies, like @linode and @spiralhosting, offering their services to us on Twitter.

I hope Dreamhost fixes the server issues soon and posts an update to affected customers. As someone mentions in the comments, “Information aids patience; silent downtime breeds discontent.”


6 comments on “Dreamhost Server Crash Analysis”

  1. mookie says:

    i recently went though a nightmare with dreamhost (on and off bouncing of mail over a four week period) and finally got fed up and left. i second the votes for linode. i moved my stuff over to linode and have been happy ever since. good luck.

  2. Yang Yang says:

    Thanks for the link!

    DreamHost has been a rather decent host up until recently. I’m becoming curious why they have lost their mind business-wise. Any web hosting company nowadays, especially ones as successful as them should be profiting incredibly well. Isn’t it obvious now that they should invest more in hardware and technical staff to better the experience of the clients and ensure for the future?

    At least they should take good care of guys like you. For instance, although there are all kinds of negative reviews about media temple grid service on the Web, the big guys who are mostly leaders in the web design / development industry such as zeldman.com and simplebits.com are all very well hosted by them, and who are thus more than willing to recommend media temple to their readers.

    Hope they are now beginning to actively seek solutions in regards to their customers like us.

  3. ThePicky says:

    The common dreamhost shared hosting nightmare errors are “500 Internal Server Error”, “Service Temporarily Unavailable”, “Premature end of script headers: index.php” and “Site slow”

    Because of this, from dreamhost shared hosting, I moved my 3 sites to mediatemple ( Media Temple Grid-Server has GPU usage issue). Now with eleven2 without any issues.

  4. Harsh Agrawal says:

    Good to see your blog is back. I highly recommend going for a VPS; this way you can minimize the downtime and can always check which script is creating the issue.

  5. Zeke says:

    Don’t tell me this blog was on a shared server all the while?

  6. cc says:

    DO NOT GET MEDIATEMPLE!!! our sites have been down for 26 hours now! they are way overpriced and downtime occurs all the time, especially at lunch, which is crucial for an e-commerce site like ours. They are horrible, and i just switched over to AN Hosting. Hope they don’t suck as bad, but who knows. Sounds like you should try vps if your site is busy e-commerce.
