How Google Web Crawler Works

Have you ever wanted to know how Google’s web crawlers see your web pages? Curious about what happens when Google requests a web page? If so, let’s learn about the crawling process of the Google search engine.

Since Google is a search engine for many different media types, it also uses different crawlers for different purposes.

For general web search, Google uses Googlebot, and you can rely on Googlebot honoring the directives you place in your robots.txt file. For example:

User-agent: Googlebot

This tells Googlebot that it can crawl your entire website (a User-agent entry with no Disallow rules permits everything). But what if you want to tell Google that certain parts of your website shouldn’t be crawled? Then you would use directives like these:

User-agent: Googlebot
Disallow: /foldernametonotcrawl/
Disallow: /filenametonotcrawlthankyoupage.html
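You can check rules like these locally before deploying them, using Python’s standard-library robots.txt parser. A minimal sketch using the example directives above (the folder and file names are the placeholders from the example, not real paths):

```python
from urllib import robotparser

# The example robots.txt rules from above.
rules = """
User-agent: Googlebot
Disallow: /foldernametonotcrawl/
Disallow: /filenametonotcrawlthankyoupage.html
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot may fetch ordinary pages, but not the disallowed paths.
print(rp.can_fetch("Googlebot", "/index.html"))                           # True
print(rp.can_fetch("Googlebot", "/foldernametonotcrawl/page.html"))       # False
print(rp.can_fetch("Googlebot", "/filenametonotcrawlthankyoupage.html"))  # False
```

This is handy for verifying that a new Disallow rule blocks exactly the paths you intend and nothing more.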

Name of Google’s Crawlers

Each crawler below is listed with its user agent token and the full user agent string as seen in website log files:

• Googlebot (Google Web search): Mozilla/5.0 (compatible; Googlebot/2.1;) or, rarely used, Googlebot/2.1
• Googlebot-Image: Googlebot-Image/1.0
• Googlebot Video (token Googlebot-Video): Googlebot-Video/1.0
• Googlebot-News: Googlebot-News
• Google Mobile, feature phone (token Googlebot-Mobile):
  • SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/ (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1;)
  • DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1;)
• Google Smartphone (token Googlebot):
  • Currently: Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12F70 Safari/600.1.4 (compatible; Googlebot/2.1;)
  • Beginning mid-April 2016: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1;)
• Mediapartners-Google: Mediapartners-Google, or [various mobile device types] (compatible; Mediapartners-Google/2.1;)
• AdsBot-Google (landing page quality check): AdsBot-Google

Understanding the Difference Between Google Crawling and Indexing Web Pages

You can use robots.txt directives to disallow Google access to certain parts of your website. However, if Google can still discover those URLs somehow, perhaps through your internal linking structure or through external backlinks, it may still index them even though you disallowed crawling in your robots.txt file.

If this has already occurred for some of your web pages, see the lesson on removing URLs from Google search engine results pages.

Knowing that, if you want to prevent Google from indexing certain web pages on your site, use this meta tag:

<meta name="Googlebot" content="noindex">

IMPORTANT: use the noindex directive only on web pages that you don’t want Google to index. For example, if the page I don’t want Google to index is named samplewebpage.html, then I would place the above code only on that page and on no others.
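The same noindex directive can also be delivered as an HTTP response header, X-Robots-Tag, which is useful for non-HTML files (such as PDFs) where a meta tag can’t be placed. A minimal sketch of the idea in Python (the helper name and paths are hypothetical, not from this course; in practice you would wire this into whatever server framework you use):

```python
# Pages we don't want indexed (hypothetical example paths).
NOINDEX_PATHS = {"/samplewebpage.html", "/filenametonotcrawlthankyoupage.html"}

def robots_headers(path: str) -> dict[str, str]:
    """Extra response headers to attach for a given request path."""
    if path in NOINDEX_PATHS:
        # Equivalent in effect to <meta name="robots" content="noindex">
        # for this page, but sent at the HTTP layer.
        return {"X-Robots-Tag": "noindex"}
    return {}

print(robots_headers("/samplewebpage.html"))  # {'X-Robots-Tag': 'noindex'}
print(robots_headers("/index.html"))          # {}
```

Note the same caveat as with the meta tag: for the header to be seen at all, the page must remain crawlable, so don’t also disallow it in robots.txt.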


At the end of the day, whether your website has a small number of pages or is a medium to large site, using robots.txt directives together with XML sitemaps and meta tags for indexation control will give you a better optimized website.
