I slowed down Googlebot's crawling a few months back. Googlebot, Google's search crawler, seemed to be crashing our shared hosting server repeatedly, and my hosting provider, Dreamhost, suggested that we either fix the issue if possible or upgrade to a higher hosting plan. Since private servers are expensive, and my traffic is not spiky (Digg-y), I decided to fix the Googlebot issue myself.
Slow Googlebot Crawling
They blocked Googlebot via .htaccess for me and indicated that this would definitely fix the issue. That is only a temporary solution, though, since you do not want to block Google from indexing your site forever. Still, Googlebot had to be controlled: shared hosting services host multiple domain names on the same server (that’s why it’s cheaper than a dedicated server), and they cannot afford a single customer repeatedly crashing multiple sites on the same server.
The Dreamhost wiki has a page about Googlebot behaving badly, with the reasons and possible solutions. On rare occasions the Google crawler might get caught in a loop and keep running unwanted code, stressing your server. An out-of-control script being hit by Googlebot can sometimes cause huge CPU usage on a shared hosting server and crash it!
Google Webmaster Tools is an excellent way to control Googlebot (short of blocking it totally). Here you can view graphs that illustrate the number of pages Googlebot crawled per day, the number of kilobytes Googlebot downloaded per day, and the average time spent downloading a page per day. You can also view the average, maximum, and minimum for this data over the 90-day period.
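If you also want a rough cross-check from your own server, the first two of those figures (pages crawled per day and kilobytes downloaded per day) can be approximated from the access log. The following is only a minimal sketch, not anything Google or Dreamhost provides: it assumes the log is in Apache's "combined" format, it identifies Googlebot purely by the User-Agent string (which can be spoofed), and the function name `googlebot_daily_stats` and the sample log lines are made up for illustration. Download time per page is not recorded in the combined format, so it is left out.

```python
import re
from collections import defaultdict

# Assumption: Apache "combined" log format, e.g.
# IP - - [day:time zone] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] "[^"]*" \d{3} '
    r'(?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_daily_stats(lines):
    """Return {day: (pages_crawled, kilobytes_downloaded)} for Googlebot hits."""
    pages = defaultdict(int)
    bytes_sent = defaultdict(int)
    for line in lines:
        match = LOG_PATTERN.match(line)
        # Skip lines that do not parse or that come from other clients.
        if match is None or "Googlebot" not in match.group("agent"):
            continue
        day = match.group("day")
        pages[day] += 1
        size = match.group("size")
        # Apache logs "-" when no response body was sent (e.g. 304 responses).
        bytes_sent[day] += 0 if size == "-" else int(size)
    return {day: (pages[day], bytes_sent[day] / 1024) for day in pages}


# Hypothetical sample lines, invented for this example only.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LOG = [
    f'66.249.66.1 - - [25/Nov/2007:10:00:01 +0000] "GET /a HTTP/1.1" 200 2048 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [25/Nov/2007:10:00:05 +0000] "GET /b HTTP/1.1" 200 1024 "-" "{GOOGLEBOT_UA}"',
    '10.0.0.5 - - [25/Nov/2007:10:01:00 +0000] "GET /a HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (Windows)"',
    f'66.249.66.1 - - [26/Nov/2007:09:00:00 +0000] "GET /c HTTP/1.1" 304 - "-" "{GOOGLEBOT_UA}"',
]

stats = googlebot_daily_stats(SAMPLE_LOG)
for day, (page_count, kilobytes) in stats.items():
    print(day, page_count, "pages,", kilobytes, "KB")
```

To run it against a real log, you would pass an open file object instead of `SAMPLE_LOG`; the path to that file depends on your host.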
Log in to Webmaster Tools, go to Tools, and click on “Set crawl rate”, an option which lets you see statistics about how often Google crawls your site, and optionally adjust that speed if desired. There are 3 options there; I selected the Slow option…
Once you slow Googlebot down, the change persists for 90 days unless you manually change the crawl rate again. Googlebot's speed will be back to normal in a little over 2 weeks…
I analyzed how much the Googlebot statistics have changed over the last 3 months, since late November 2007 when I slowed Google down…
If you look at the number of pages crawled per day, the number of kilobytes downloaded per day, or the time spent downloading a page, the charts show spiky and erratic behaviour before late November, but after that the crawl rate is more stable. Shared hosts like stable crawlers, since unstable traffic severely affects load balancing on servers. Has the site traffic dropped? No, in fact it is better than before, but that is possibly because more posts and content have been fed to Googlebot since then.
In a few weeks, the crawl rate will automatically switch back to normal, but now Google is recommending a faster crawl rate for our site.
SUMMARY: If Googlebot is causing your cheap web hosting provider to push you towards a more expensive hosting plan, you can slow down Googlebot and assure your hosting service of more stable bandwidth use, without affecting your traffic significantly.