Quick Online Tips

Google Crawls HTML Forms to Index Deep, Invisible Web

April 14th, 2008

Google aims to crawl and index every page on the Internet. But millions of web pages remain hidden behind Flash, JavaScript, dynamically generated pages, unlinked pages, and password-protected pages. To reach them, Google has started crawling HTML forms to index the unexplored Deep Web (also called the Hidden Web or Invisible Web), going where no search engine has gone before…

Google elaborates on its new approach to crawling more of the web:

Specifically, when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made. If we ascertain that the web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page.
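
The URL-generation step Google describes can be illustrated with a rough sketch. This is not Googlebot's actual code: the function name `candidate_urls`, the `limit` parameter, the example form fields, and the `example.com` address are all assumptions made for illustration, and the sketch only covers forms submitted with GET.

```python
from itertools import islice, product
from urllib.parse import urlencode


def candidate_urls(action, fields, limit=5):
    """Build up to `limit` GET URLs from a form's candidate input values.

    `fields` maps each input name to the list of values to try
    (hypothetical structure, chosen for this sketch).
    """
    names = list(fields)
    # Every combination of one value per input, in a stable order.
    combos = product(*(fields[name] for name in names))
    urls = []
    # Only a small number of queries are tried, as the quote suggests.
    for combo in islice(combos, limit):
        query = urlencode(list(zip(names, combo)))
        urls.append(f"{action}?{query}")
    return urls


# Hypothetical form: one select menu and one text box.
fields = {
    "category": ["books", "music"],  # values taken from the <option> tags
    "q": ["crawler", "index"],       # words picked from the site's own text
}
for url in candidate_urls("http://example.com/search", fields, limit=3):
    print(url)
```

Each generated URL would then be fetched and kept only if the resulting page looks valid and adds content not already indexed. That filtering step is left out here.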

Google says only a small number of ‘particularly useful sites’ will be crawled this way, and that Googlebot will always follow robots.txt, nofollow, and noindex directives. So if you have content that you definitely do not want exposed on the web, be sure to block it with nofollow and noindex meta tags or robots.txt, or you may find it in Google search results.
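
To check whether a robots.txt rule actually covers a form's result pages, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rules, the `is_allowed` helper, and the `example.com` URLs are made up for illustration; a real crawler's matching may differ in details.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/private/search?q=report"))
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/public/page.html"))
```

With these rules, the first URL is reported as blocked and the second as allowed, so a form whose action lives under `/private/` would not be crawled by a bot that honors robots.txt.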

It is often claimed that the Deep or Invisible web is much larger than the indexed web, and now Google is determined to change that…




