How to Fix Blocked by robots.txt Errors


Search engines find information by crawling: they request (fetch) a URL and then analyze what they find there.

robots.txt rules should be used only to control how search engines crawl a site, not what they index. This means most Blocked by robots.txt errors in Search Console arise from incorrect rules written by the website owner rather than from an incorrect website setup.

What to Do When the robots.txt File Is Auto-Generated?

Certain content management systems and hosted platforms, such as Blogger, Google Sites, WordPress hosted sites, and Wix, may auto-generate the robots.txt file. In such cases, you cannot edit the robots.txt file directly. The only thing you can do to remedy Page indexing issues is make certain that the XML sitemap you've submitted to Google Search Console contains only the URLs you want Google to crawl and index.
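As a rough illustration of keeping the sitemap limited to crawlable URLs, the sketch below builds a sitemap that skips any URL the robots.txt rules disallow. The function name build_sitemap, the example.com URLs, and the sample rules are assumptions made for this example and do not come from any particular platform. Python's urllib.robotparser does plain prefix matching and does not understand wildcard patterns, so treat the result as an approximation of how Google reads the file.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, robots_txt, user_agent="Googlebot"):
    """Return sitemap XML containing only URLs that robots.txt allows."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        # Skip URLs the crawler is not allowed to fetch.
        if rules.can_fetch(user_agent, url):
            entry = ET.SubElement(urlset, "url")
            ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical auto-generated rules and candidate URLs.
ROBOTS_TXT = "User-agent: *\nDisallow: /search\n"
CANDIDATES = [
    "https://example.com/",
    "https://example.com/search?q=shoes",
    "https://example.com/contact",
]

print(build_sitemap(CANDIDATES, ROBOTS_TXT))
```

Only the home page and the contact page end up in the generated sitemap, because the /search URL is disallowed by the sample rules.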


How to Fix URL blocked by robots.txt – Page Indexing Reports

When submitted pages have Blocked by robots.txt issues, you can test and verify which rules in the robots.txt file disallow Google's crawlers using the robots.txt tester tool in Search Console.


Once you confirm which rules are blocking Google's access, it is just a matter of deleting that line (or lines) from the actual robots.txt file on the web server.
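If you want to look for the offending lines outside Search Console, a small script can point at candidate rules. This is a minimal sketch, not Google's matching logic: the helper find_blocking_rules is a hypothetical name, it only does simple prefix matching, it ignores Allow rules and wildcards, and it reports rules from every group that could apply to the crawler rather than only the most specific group.

```python
def find_blocking_rules(robots_txt, path, user_agent="googlebot"):
    """Return (line_number, rule) pairs for Disallow rules that prefix-match path."""
    matches = []
    group_agents = []       # user-agents named by the current group
    reading_agents = False  # True while consecutive User-agent lines are read

    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if not reading_agents:
                group_agents = []  # a new group starts here
                reading_agents = True
            group_agents.append(value.lower())
            continue

        reading_agents = False
        if field == "disallow" and value:
            applies = "*" in group_agents or any(
                agent and agent in user_agent.lower() for agent in group_agents
            )
            if applies and path.startswith(value):
                matches.append((number, raw.strip()))
    return matches


# Hypothetical robots.txt content used only for this example.
ROBOTS_TXT = (
    "User-agent: *\n"
    "Disallow: /privatepage\n"
    "Disallow: /tmp/\n"
    "\n"
    "User-agent: Bingbot\n"
    "Disallow: /\n"
)

print(find_blocking_rules(ROBOTS_TXT, "/privatepage/report.html"))
# [(2, 'Disallow: /privatepage')]
```

The line numbers in the result tell you which lines to review or delete in the robots.txt file on the server.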

About /robots.txt

The first file that search engines like Google and Bing, and other law-abiding web crawlers, look for is the robots.txt file. Webmasters use the robots.txt file to tell user-agents (web crawlers, bots) which parts of a website, if any, they are allowed to crawl. This is done using Robots Exclusion Protocol directives placed in the robots.txt file.

Website owners who do not want search engines to access their website can tell ALL (*) bots NOT to crawl it by placing this in the robots.txt file:

User-agent: *
Disallow: /

Since most website owners want Google to crawl their website, the above rule is not ideal and should NOT be used unless you are developing a website that is not ready to go live.

But what if there is a part of a website that you do NOT want Googlebot (or any other crawler) to crawl? Then you would do something like this:

User-agent: *
Disallow: /privatepage
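To see how these two rule sets behave, the sketch below feeds each one to Python's standard urllib.robotparser module and asks whether a crawler may fetch a few URLs. The helper name allowed and the example.com URLs are assumptions for illustration. Note that a Disallow value is matched as a path prefix, so /privatepage also blocks longer paths that start with the same characters.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_ONE = "User-agent: *\nDisallow: /privatepage\n"

# Disallow: / blocks every URL on the site.
print(allowed(BLOCK_ALL, "Googlebot", "https://example.com/"))                     # False

# Disallow: /privatepage blocks that path and anything that starts with it.
print(allowed(BLOCK_ONE, "Googlebot", "https://example.com/privatepage"))          # False
print(allowed(BLOCK_ONE, "Googlebot", "https://example.com/privatepage-archive"))  # False

# Everything else stays crawlable.
print(allowed(BLOCK_ONE, "Googlebot", "https://example.com/about"))                # True
```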

The Reason Website Owners Get robots.txt Rules Messed Up

The most common reason Blocked by robots.txt issues appear in Google Search Console Page indexing reports is that a website owner thinks robots.txt controls which URLs are visible to, or indexable by, search engines.

When you want web pages not indexed by Google, remove the robots.txt rule that blocks them and use a noindex directive instead. Google has to be able to crawl a page to see its noindex directive.
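The sketch below shows why the two mechanisms should not be combined on the same page: it checks whether a page carries a robots meta tag with noindex and whether robots.txt would stop the crawler from ever reading it. The names RobotsMetaParser and noindex_status, the sample HTML, and the example.com URL are assumptions for this example. Only the meta tag form of noindex is checked here, not the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Records whether a robots (or googlebot) meta tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").lower()
        if name in ("robots", "googlebot") and "noindex" in content:
            self.noindex = True


def noindex_status(robots_txt, url, html, user_agent="Googlebot"):
    """Describe whether a crawler can actually see the page's noindex tag."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    meta = RobotsMetaParser()
    meta.feed(html)

    if not meta.noindex:
        return "no noindex directive"
    if not rules.can_fetch(user_agent, url):
        return "noindex present but hidden by robots.txt"
    return "noindex visible to crawler"


PAGE = '<html><head><meta name="robots" content="noindex"></head><body>Private</body></html>'
URL = "https://example.com/privatepage"

print(noindex_status("User-agent: *\nDisallow: /privatepage\n", URL, PAGE))
print(noindex_status("User-agent: *\nDisallow:\n", URL, PAGE))
```

With the Disallow rule in place the tag is never read, so the first call reports the noindex as hidden. Once the rule is removed, the second call reports it as visible.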

By RankYa

