Quick Online Tips

Slow Googlebot Crawling: Stop Crashing Shared Hosting Servers

February 16th, 2008

I slowed Googlebot's crawling down a few months back. Googlebot, Google's search crawler, was repeatedly crashing our shared hosting server, and my hosting provider Dreamhost asked us to fix the issue if possible or upgrade to a higher hosting plan. Since private servers are expensive, and my traffic is not spiky (Digg-y), I decided to fix the Googlebot issue myself.

Slow Googlebot Crawling

Dreamhost blocked Googlebot via .htaccess for me and indicated that this would definitely fix the issue. But that is only a temporary solution, since you do not want to block Google from indexing your site forever. Still, Googlebot had to be controlled: shared hosting services host multiple domain names on the same server (that’s why it’s cheaper than a dedicated server), and they cannot afford a single customer repeatedly crashing multiple sites on the same server.

The Dreamhost wiki has a page about Googlebot behaving badly, with the reasons and possible solutions:

On a very small percentage of customer sites (less than .01%), the Google crawler will get caught in a loop and it ends up hitting those sites pretty hard. Even if not in a loop, Googlebot will hit every page in your site, so any code that exists, will be run! And if that barely-used code makes things go haywire when accessed.. we have a problem! In these cases it can cause really poor server performance – or even crash your a server. It’s not uncommon for a crazy out-of-control script being hit by Googlebot to use 50% or more of both CPUs on a shared hosting server! When faced with a loady server, we track down the culprit by checking access.logs for activity.
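The log check described in the quote above can be approximated on your own site. The following is a minimal Python sketch, not Dreamhost's actual procedure. It assumes the access log uses the common Apache "combined" format, with the user agent as the last quoted field; the function name and the sample lines are made up for illustration. It matches on the user-agent string only, which can be spoofed.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, where the timestamp sits in
# square brackets and the user agent is the last quoted field on the line.
LINE_RE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):(?P<hour>\d{2}):[^\]]*\]'
    r'.*"(?P<agent>[^"]*)"\s*$'
)


def googlebot_hits_per_hour(lines):
    """Count requests per (day, hour) whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and "googlebot" in match.group("agent").lower():
            hits[(match.group("day"), match.group("hour"))] += 1
    return hits


# Made-up sample lines, for illustration only.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LINES = [
    f'66.249.66.1 - - [16/Feb/2008:10:01:02 -0800] "GET /a HTTP/1.1" 200 512 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [16/Feb/2008:10:01:03 -0800] "GET /b HTTP/1.1" 200 734 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [16/Feb/2008:11:15:40 -0800] "GET /c HTTP/1.1" 200 128 "-" "{GOOGLEBOT_UA}"',
    '10.0.0.7 - - [16/Feb/2008:10:05:00 -0800] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows; U)"',
]

if __name__ == "__main__":
    # Print the busiest hours first.
    for (day, hour), count in googlebot_hits_per_hour(SAMPLE_LINES).most_common():
        print(f"{day} {hour}:00  {count} Googlebot requests")
```

To run it against a real log, pass an open file object instead of SAMPLE_LINES, since the function accepts any iterable of lines. Hours with unusually high counts are the ones worth a closer look.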

Google Webmaster Tools is an excellent way to control Googlebot (short of blocking it totally). Here you can view graphs that illustrate the number of pages Googlebot crawled per day, the number of kilobytes Googlebot downloaded per day, and the average time spent downloading a page per day. You can also view the average, maximum, and minimum for this data over the 90-day period.

Log in to Webmaster Tools, go to Tools, and click on “Set crawl rate”, an option which lets you see statistics about how often Google crawls your site and optionally adjust that speed. There are 3 options there; I selected the Slow option…

Slow Googlebot

Once you slow Googlebot down, the change persists for 90 days unless you manually change the crawl rate again. In my case, Googlebot's speed will be back to normal in a little over 2 weeks…

Googlebot Normalizes

I analyzed how much the Googlebot statistics have changed over the last 3 months, since late November 2007, when I slowed Googlebot down…

Googlebot Statistics

Looking at the number of pages crawled per day, the number of kilobytes downloaded per day, or the time spent downloading a page, the charts show spiky and erratic behaviour before late November, after which the crawl rate is more stable. Shared hosts like stable crawlers since unstable traffic severely affects load balancing on servers. Has the site traffic reduced? No, in fact it is better than before, though that may be because more posts and content have been fed to Googlebot since then.
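One way to put a number on "spiky" versus "stable" is the coefficient of variation: the standard deviation of the daily crawl counts divided by their mean. The sketch below is only an illustration of that idea. The function name is hypothetical, and the two lists are made-up numbers, not this site's actual Webmaster Tools statistics.

```python
from statistics import mean, pstdev


def coefficient_of_variation(daily_counts):
    """Return stddev / mean for a list of daily counts (0.0 if the mean is 0)."""
    average = mean(daily_counts)
    if average == 0:
        return 0.0
    return pstdev(daily_counts) / average


# Made-up daily "pages crawled" figures, for illustration only.
SPIKY_DAYS = [200, 1500, 300, 2500, 150, 1800, 250]
STEADY_DAYS = [600, 650, 580, 620, 610, 590, 640]

if __name__ == "__main__":
    print("spiky: ", round(coefficient_of_variation(SPIKY_DAYS), 2))
    print("steady:", round(coefficient_of_variation(STEADY_DAYS), 2))
```

A lower value means the crawler's daily load is more even. Because the measure is relative to the mean, it stays comparable even if the average crawl volume itself changes between the two periods.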

In a few weeks, the crawl rate will automatically switch back to normal, but Google is now recommending a faster crawl rate for our site.

Faster Googlebot

Earlier, the Faster option was not enabled for our site and was grayed out. Maybe the change is because of our bouncing PageRank, or because Google now gives the site more weight, with new sitelinks on the horizon.

SUMMARY: If Googlebot is causing your cheap web hosting provider to push you toward a more expensive hosting plan, you can slow Googlebot down and assure your hosting service of more stable bandwidth use, without affecting your traffic significantly.




4 Responses to “Slow Googlebot Crawling: Stop Crashing Shared Hosting Servers”

  1. webmaster says:

    Thanks for the info.
    I am already using Google Analytics for my site, but
    how can I set the crawl rate on my own?

  2. Google crawls only 3 pages per day… WTF? I have optimized the site… I have a sitemap… I wasted my time with this stupid Googlebot.

  3. I too experienced the full force of Goog: 5000+ page requests a day, wtf. Sometimes 10 requests a second. And my site uses http and databases. Ate my limit up pretty quick. And to beat that, Goog has assigned me a special crawl rate that I cannot adjust unless I personally request it.
    @webmaster: you can sometimes set the crawl rate through the Google Webmaster Tools link on the main Google settings page, with your Gmail and AdWords and stuff.
