How to Fix Blocked by robots.txt Errors

Search engines find information through crawling: they request (fetch) a URL and then analyze what they find at that URL.

robots.txt rules should only be used to control the search engine crawling process, not the indexing process. This means most Search Console "Blocked by robots.txt" errors arise from incorrect rules written by the website owner rather than from an incorrect website setup.
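
For example (a purely hypothetical rule; your own file will differ), a single overly broad Disallow line is often all it takes to trigger these reports:

User-agent: *
Disallow: /blog/

With the above in place, every URL whose path begins with /blog/ is reported as blocked, even if you fully intend those pages to be crawled and indexed.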

What to Do When the robots.txt File Is Auto-Generated?

Certain content management systems and hosted platforms such as Blogger, Google Sites, hosted WordPress sites (WordPress.com) and Wix may auto-generate the robots.txt file, and in those cases you cannot edit the actual robots.txt file. The main thing you can do to remedy Page indexing issues is to make certain the XML sitemap you have submitted to Google Search Console contains only the URLs you want Google to crawl and index.
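
As a rough sketch (the URLs here are hypothetical, and your CMS or SEO plugin will usually generate this file for you), a minimal XML sitemap that lists only the pages you want crawled and indexed looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/services/</loc></url>
  <url><loc>https://example.com/contact/</loc></url>
</urlset>

A sitemap does not block anything; it only tells Google which URLs you consider important, so leave out any URL you do not want crawled and indexed.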

Video Lesson for Fixing Blocked by robots.txt Errors

How to Fix URL blocked by robots.txt – Page Indexing Reports

When submitted pages have "Blocked by robots.txt" issues, you can test and verify which rules in the robots.txt file disallow Google's crawlers using the robots.txt tester tool in Search Console (a quick local cross-check is also sketched below).

(Screenshot: robots.txt Tester tool in Google Search Console)
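
If you prefer to check from your own machine, here is a minimal sketch using Python's standard urllib.robotparser module; the example.com URLs are placeholders for your own site and the URL flagged in Search Console. Note that Python's parser follows the original Robots Exclusion Protocol and may not match Google's wildcard handling exactly, so treat this as a rough cross-check rather than a definitive answer.

from urllib.robotparser import RobotFileParser

robots_url = "https://example.com/robots.txt"   # placeholder: your site's robots.txt
test_url = "https://example.com/blog/my-post/"  # placeholder: the URL flagged in Search Console

parser = RobotFileParser()
parser.set_url(robots_url)
parser.read()  # fetches and parses the live robots.txt

if parser.can_fetch("Googlebot", test_url):
    print("Allowed: no robots.txt rule blocks this URL for Googlebot")
else:
    print("Blocked: a Disallow rule in robots.txt matches this URL")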

Once you confirm which rules are blocking Google's access, it is just a matter of deleting that line (or lines) from the actual robots.txt file on the web server.

(Screenshot: Search Console Page indexing report for Blocked by robots.txt)
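
If, for example, a hypothetical Disallow: /blog/ rule were the culprit, the corrected file would simply drop that line and keep everything else (including any Sitemap reference) in place:

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml

An empty Disallow: line means nothing is blocked, so Google can recrawl the previously blocked URLs.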

About /robots.txt

The first file search engines like Google, Bing and other law-abiding web crawlers look for is the robots.txt file. Webmasters use the robots.txt file to give user-agents (web crawlers, bots) instructions as to which parts of a website (if any) they are allowed to crawl and access. This is done using Robots Exclusion Protocol directives placed in the robots.txt file.

This means website owners who do not want search engines to access their website can tell ALL (*) bots NOT to crawl their website by simply placing this in the robots.txt file:

User-agent: *
Disallow: /

Since most website owners want Google to crawl their website, this blanket disallow rule is not ideal and should NOT be used unless you are developing a website that is not yet ready to go live.

But what if there is a part of your website that you do NOT want Googlebot to crawl? Then you would use something like this:

User-agent: *
Disallow: /privatepage
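
Keep in mind that Disallow rules match by URL prefix. With the hypothetical rule above, all of the following example paths would be blocked from crawling:

/privatepage
/privatepage/
/privatepage/archive.html
/privatepage-backup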

The Reason Website Owners Get robots.txt Rules Messed Up

The most common reason Google Search Console Page indexing reports show "Blocked by robots.txt" issues is that the website owner thinks that by using robots.txt they can control which URLs are visible to, and indexable by, search engines.

When you do not want web pages indexed by Google, remove the robots.txt rules that block them and use a noindex directive instead, so that Googlebot can crawl the page and actually see the directive.
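
A minimal sketch of the noindex approach (either form works, and the page must stay crawlable so Googlebot can see the directive) is to place this in the page's HTML head:

<meta name="robots" content="noindex">

or to send it as an HTTP response header:

X-Robots-Tag: noindex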

By RankYa


7 comments

  1. Great tutorial about robots.txt file. Blocked by robots txt file issues are easily solved.

    Thanks for your great tutorial.

  2. I use WordPress most of the time and an SEO plugin like Rank Math handles the issue quite well! Nonetheless I still want to check that the robots.txt file has the XML sitemap link in it. I wouldn’t bother making changes to my robots.txt file unless I am confident enough to block a particular directory, because you may end up blocking a directory in a way that is harmful, like blocking JS files. And you never know unless you love checking GSC for technical errors.
    thanks for awesome info, Like your effort!

    1. Most website owners use the robots.txt file thinking the rules within it will block Google from indexing. But the truth is, robots.txt directives control the crawling process, not the indexing process. Meaning, a robots.txt file may include URL patterns in the hope of controlling Google's access, but if Google can still discover the URL (perhaps by following another link on the website, or a link from an external website) then Google may still index the URL. Hence, don’t even bother using the robots.txt file when unsure what it actually does. Learn more here: https://developers.google.com/search/docs/crawling-indexing/robots/intro

    1. Remove what you currently have in your website’s robots.txt file. Instead just use:

      User-agent: *
      Disallow:

      And LEAVE IN PLACE the XML sitemap links,
      because I think Disallow: /20* may be causing indexing issues with Google.
