The site was down for 36 hours over the weekend because our web hosting provider, Dreamhost, disabled all sites on the account for overuse of CPU minutes. After multiple open support tickets went unanswered, I knew I had to fix it myself, and here is how I proceeded.
Why was the Site Down?
They had sent me an email about overuse of CPU minutes, which meant this site was using far more server resources, and straining the MySQL database more, than it should, compromising the other sites hosted on the same shared hosting server. So hosting technical support followed the right course of action: better to disable one site than to crash the server and take many other sites down. As a result, all pages were showing 403 Forbidden errors. Here is the email excerpt –
A normal user utilizes under 75 CP, a heavy user utilizes 75-100 CP, and normally problematic users utilize 100-150 CP. You, are utilizing well over double CP minutes as a problematic user would.
It looks like your site(s) have outgrown the shared hosting environment You should look into upgrading to a Private Server ) ASAP. Or if possibly find out what is making your code over utilize so much resources ) and put this to a stop before re-enabling your sites.
Please respond to me in regards to this issue, so I can verify that you’re taking a pro-active approach to solving this problem. If you re-enable your sites without writing back, you may be in danger of forcing me to disabling your hosting account.
I thought it was a StumbleUpon traffic spike for our domain management article or 9rules entry, but the traffic stats showed that was not the case; the cause was elsewhere.
Searching for Alternative Web Hosting
The long downtime also got me thinking about alternative hosting solutions, such as upgrading to Dreamhost PS (the Dreamhost private servers, where I would have to pay around $30 per month extra for a 300MB / 300MHz private server with lots of additional advantages, but it's on a waiting list of a few weeks). Readers also suggested I look at Mediatemple Grid Servers, Liquidweb VPS and Slicehost VPS hosting.
But since QOT had easily survived the BBC effect, something else had to be causing the server load to increase. I love Dreamhost hosting, which has provided a good experience over 2 years, and I would not abandon it for a little downtime.
How Does Tech Support Disable Websites?
Over the years, I have learnt some of the strategies that web hosting services have adopted to disable this site.
– Rename the Domain folder – They simply rename the domain folder via FTP to something else like domainname.com.old and your site goes offline. It's simple to rename it back via FTP and get it online.
– Block via .htaccess – A simple tweak in .htaccess file can change permissions needed to access the server and can disable the site. I had the opportunity to explore this option when DH blocked Googlebot. Simply edit your .htaccess in any text editor (after you enable viewing hidden files in your FTP client – I use Filezilla) and you are done.
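Both strategies leave visible traces, so they can be checked from a shell before digging further. The following is only a minimal sketch: the function name check_soft_disable and the path you pass it are hypothetical, and it assumes a blocking rule would be written as a "deny from" line in the site's own .htaccess.

```shell
# Hypothetical helper: look for the two "soft disable" signs described above.
# $1 is the path where the domain folder normally lives (an assumption you supply).
check_soft_disable() {
    site_dir="$1"
    if [ ! -d "$site_dir" ]; then
        # The folder is gone: list renamed copies such as domainname.com.old
        echo "missing: $site_dir"
        ls -d "$site_dir".* 2>/dev/null || echo "no renamed copy found"
    elif [ -f "$site_dir/.htaccess" ] && grep -qi '^[[:space:]]*deny from' "$site_dir/.htaccess"; then
        # Show the deny rules with their line numbers
        echo "deny rules found in $site_dir/.htaccess"
        grep -ni '^[[:space:]]*deny from' "$site_dir/.htaccess"
    else
        echo "no obvious soft disable"
    fi
}
```

Run it with the usual location of the domain folder as the only argument. A "no obvious soft disable" result means neither of these two strategies was used, so the cause lies elsewhere.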
This time neither of these strategies applied, as the domain folders were OK and .htaccess had not been tampered with. However, a friendly DH tech support person, rlparker, helped me out in the DH forums and over PM gave me the most useful tech advice ever (with such immense clarity!), and thanks to that this site is back online again. Much of what I could attempt here is thanks to his guidance.
How to inform the readers?
We maintain a QOT Status blog on Blogspot, which serves as a communication medium when the site is down. I simply redirect this status blog's feed via the main QOT Feedburner feed and it reaches the 15000 feed readers instantly. I could also use Yahoo Pipes to mash up multiple feeds, like that of our tumblelog (which survives because it is hosted on Tumblr via a DNS modification), and get some alternate content in with the site updates. I could continue doing this for as long as the site was down.
How was the site disabled?
The directory permissions at the root were edited so that the site was inaccessible to users but still accessible to me, so that I could identify the causes and fix things. This was a very sensible move, because you can then check your logs, remove corrupted scripts, remove plugins, identify other causes of CPU overload and fix them yourself. After fixing all these issues, you can reactivate the sites yourself too.
How did I Get the Blog Online?
It was clear I needed to alter the directory permissions to let users in and get the site back online. But I first had to fix the cause of the CPU overload, as activating the site without any corrective measures would again mean a sudden burst in server load, possibly crashing it and sending other sites offline, which would have invited more severe disabling measures from Dreamhost.
This forced me to learn more about Shell and SSH, which was what I needed to fix this issue, and I wanted to do it right the first time. First I needed to change my Dreamhost user account to enable Shell access. In the Dreamhost dashboard, go to User > Manage users > Edit
Then I grabbed PuTTY, a free SSH, Telnet, rlogin, and raw TCP client to connect to the server and found some settings for PuTTY that work well with DreamHost. I read lots of stuff about using SSH and Unix commands, the Linux BASH command line, UNIX file permissions and this amazing chmod tutorial.
Next I needed to identify the causes of the heavy usage, so I went to ~/logs/yourdomain.com/http and typed
cat access.log| awk '{print $1}' | sort | uniq -c |sort -n
For the last 10000 hits only, use
tail -10000 access.log| awk '{print $1}' | sort | uniq -c |sort -n
This revealed the IPs hitting the domain the most, and we saw that thousands of hits were coming from 66.249.67.132. Typing host 66.249.67.132 identified it as googlebot.com.
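The two pipelines above can be folded into one small helper. This is a sketch rather than the exact commands I ran: the function name top_ips is made up for illustration, it assumes the client IP is the first whitespace-separated field of each log line (as in the commands above), and it sorts in descending order so the busiest IP comes first instead of last.

```shell
# Hypothetical helper: read access-log lines on stdin and print the
# N busiest client IPs (default 10) as "count ip", busiest first.
top_ips() {
    count="${1:-10}"
    awk '{print $1}' |   # keep only the first field, assumed to be the client IP
        sort |           # group identical IPs together
        uniq -c |        # prefix each IP with its number of hits
        sort -rn |       # highest hit count first
        head -n "$count"
}
```

For the whole log, run top_ips 5 < access.log; for the last 10000 hits, run tail -10000 access.log | top_ips 5. Each IP near the top can then be checked with host, as described above.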
So how do you block Googlebot? Googlebot sometimes behaves badly, and it can be blocked simply by adding these lines to the .htaccess in your domain's top folder
<Limit GET HEAD POST>
order deny,allow
deny from 66.249.67.132
</Limit>
You can deny just the prefix 66.249 to block all Googlebot IPs. This is a temporary measure, as in the end you do want Googlebot to index your site. You can also slow the Googlebot crawl rate to help your shared hosting. Now we could restore our websites. Go to the top “.” folder using PuTTY (cd ..), and chmod it appropriately so that the directory permission reads drwxr-xr-x; the sites went live instantly.
What Should Dreamhost Have Done?
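For reference, drwxr-xr-x corresponds to numeric mode 755: the owner can read, write and enter the directory, while everyone else can only read and enter it. A minimal sketch of the restore step follows; the function name restore_dir_perms is hypothetical, and which directory you pass depends on where your host changed the permissions.

```shell
# Hypothetical helper: set rwxr-xr-x (755) on a directory, then print
# its mode string so the result can be checked by eye.
restore_dir_perms() {
    dir="$1"
    # chmod 755 = owner rwx, group r-x, others r-x
    chmod 755 "$dir" && ls -ld "$dir" | awk '{print $1}'
}
```

The printed mode should begin with drwxr-xr-x (some systems append an extra character such as "." or "+"). Running ls -ld on the directory before changing anything is a good habit, so the previous permissions are on record.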
Instead of disabling the site, they could easily have identified Googlebot as the cause of the increased CPU usage (it's very simple via log analysis), informed the site owner and blocked Googlebot via .htaccess, which would have taken care of it all. And, of course, they could have responded faster to support tickets.
I had to read a lot about Shell, UNIX and SSH to attempt what I did, and I am now much wiser and well versed in what seemed cryptic a day ago. This was truly a learning experience.
NOTE: I am not an expert in technically managing hosting servers and site crashes. This was my personal experience in managing this event to bring the site online. The measures suggested above need expert knowledge and, if improperly used, can harm your site data and functioning irreversibly. I take no liability for your misadventures. It's always best to seek professional help before attempting any such actions.