Search engines find information through crawling (which means they request / fetch a URL) to then analyze what they find on the URL.
robots.txt rules should only be used to control the search engine crawling process but not indexing process. This means, most Search Console blocked by robots.txt errors arise due to incorrect rules used by the website owner as opposed to incorrect website setup.
What to Do When robots.txt File is Auto Generated?
Certain Content Management Systems such as Blogger, Google Sites, WordPress Hosted Sites, WiX or Other Platforms may auto generate robots.txt file. In such cases, you can not update the actual robots.txt file if it is auto generated. In such scenarios, the only thing you can do to remedy Page indexing issues is make certain that XML sitemap you’ve submitted to Google search console contains only the URLs you want Google to crawl and index.
Video Lesson for Fixing Blocked by robots.txt Errors
How to Fix URL blocked by robots.txt – Page Indexing Reports
When submitted pages have blocked by robots.txt file issues. You can test and verify which rules in robots.txt file disallow Google crawlers using robots.txt tester tool in Search Console.
Once you confirm which rules are blocking Google’s access, then, it is just matter of deleting that line (or lines) of rules in the actual robots.txt file on the web server.
The first file search engines like Google or Bing or other law abiding web crawlers look for is the robots.txt file. Webmasters use robots.txt file to provide instructions to user-agents (web crawlers, bots) as to what (if any) part of a website they are allowed to crawl and access. This is done using Robots Exclusion Protocol directives placed in robots.txt file.
This ensures website owners who do not want search engines to access their website can tell ALL (*) bots to NOT crawl their website by simply placing this in the robots.txt file.
Since most website owners want to allow Google to crawl their website, using the above disallowing rule is not ideal and should NOT be used unless you are developing a website that is not ready to go live.
But what if there is part of a website that you do NOT want Googlebot to crawl? Then you would do something like this:
The Reason Website Owners Get robots.txt Rules Messed Up
The most common reason Google Search Console Page indexing reports Blocked by robots.txt issues arise is because a website owner thinks that by using robots.txt they can make control which URLs are visible / indexable to search engines.
When you want web pages not indexed by Google, remove the robots.txt file and use a noindex directive.