SOLVED: Google Search Console Reports: Crawl Blocked by Robots.txt

If you notice your web traffic decrease substantially in a short period of time, you should see what Google thinks:

  1. surf to Google Search Console (formerly named Google Webmaster Tools)
  2. click URL INSPECTION
  3. in the INSPECT ANY URL bar type in your root domain (e.g. https://www.URTech.ca )

If you see the following, you have a serious problem:

Indexed, though blocked by robots.txt
Crawled as Googlebot desktop
Crawl allowed? No: blocked by robots.txt

Page fetch Failed: Blocked by robots.txt

HOW TO CHECK YOUR ROBOTS.TXT FILE?

A robots.txt file is a plain text file in the root of your site that tells robots (e.g. Google’s search bot) what they should be looking at and what they should not be looking at.  In fact most sites do not need a robots file anymore because:

  1. Robots.txt is only a SUGGESTION to bots.
    • Malicious bots will ignore it
  2. Google, Yahoo, Microsoft and other bots already know what to index and what to avoid on most websites
    • For instance, GoogleBot is smart enough to ignore WordPress readme files and the WP-ADMIN folder by default without Robots.txt telling it to skip them

If you want to see your Robots.txt file from your browser right now, just surf to <your domain>/robots.txt.  For instance, if you want to see URTech’s robots file, just surf to https://www.urtech.ca/robots.txt .  As you can see, it is wide open for everyone and every bot to read… or ignore.

If you see something like:

Disallow: /wp-admin/

your Robots.txt file is telling GoogleBot it may crawl the entire site EXCEPT the items in the WP-ADMIN folder.

If you see something like:

Disallow: /

which disallows everything from the root of the site down, bots will not crawl any of your site and your traffic will likely grind to a halt.  You need to correct this immediately.  (A bare "Disallow:" with nothing after it is harmless; it blocks nothing.)
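If you are not sure how a particular set of rules will be read, you can evaluate them locally before touching the live file.  The sketch below uses Python's standard urllib.robotparser module; the sample rule sets, the example.com URLs and the helper name can_googlebot_fetch are made up for illustration, and Python's parser is not guaranteed to match GoogleBot's behaviour in every edge case.

```python
from urllib.robotparser import RobotFileParser


def can_googlebot_fetch(robots_lines, url):
    """Parse robots.txt lines locally and ask whether Googlebot may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch("Googlebot", url)


# Hypothetical rule sets to compare; replace with the lines from your own file.
SAMPLES = {
    "skip wp-admin only": ["User-agent: *", "Disallow: /wp-admin/"],
    "empty Disallow": ["User-agent: *", "Disallow:"],
    "block everything": ["User-agent: *", "Disallow: /"],
}

for label, lines in SAMPLES.items():
    home = can_googlebot_fetch(lines, "https://www.example.com/")
    admin = can_googlebot_fetch(lines, "https://www.example.com/wp-admin/options.php")
    print(f"{label}: home page allowed={home}, wp-admin allowed={admin}")
```

Nothing here touches the network, so it is safe to run against a copy of your file as many times as you like.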

HOW TO EDIT A ROBOTS.TXT FILE:

Robots.txt is just a plain text file and can be easily edited through your web hosting company’s file manager or using an FTP product like FileZilla to view (and edit!) the files on the server that is hosting your site.

If you are not sure how to do this, just call your web hosting company and they will walk you through the easy steps in just a few minutes.

I MODIFIED MY ROBOTS.TXT FILE BUT IT IS NOT SHOWING THE CHANGES

If you delete (a good first step) or you modify your robots.txt file but find that when you surf to <your domain>/robots.txt it has not updated, you have a problem.

It is possible that the file is just cached, so you should clear your browser cache, clear your website’s cache (e.g. you may be using a performance accelerator like WPSuperCache) and possibly even your Content Delivery Network’s cache (e.g. we use CloudFlare to replicate our site globally and provide additional security, but most sites don’t use any).

If caching is not your problem, your .HTACCESS file is likely sending requests for Robots.txt to a different location, and that usually means you have been hacked.
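One way to take your browser out of the picture is to request the file from a script.  The following is only a sketch using Python's standard library: the function names, the "nocache" query parameter and the 10 second timeout are our own choices, not anything Google or your host requires.  Note that was_redirected will also flag harmless redirects (for example from a non-www domain to the www one), so treat a True result as something to look into rather than proof of a hack.

```python
import time
import urllib.parse
import urllib.request


def cache_busting_url(domain):
    """Build the robots.txt URL with a throwaway query string to dodge caches."""
    query = urllib.parse.urlencode({"nocache": int(time.time())})
    return f"https://{domain}/robots.txt?{query}"


def was_redirected(requested_url, final_url):
    """True if the server ended up answering from a different host or path."""
    requested = urllib.parse.urlsplit(requested_url)
    final = urllib.parse.urlsplit(final_url)
    return (requested.netloc, requested.path) != (final.netloc, final.path)


def fetch_robots(domain, user_agent="Mozilla/5.0"):
    """Fetch robots.txt and return (final URL after any redirects, body text)."""
    url = cache_busting_url(domain)
    headers = {"User-Agent": user_agent, "Cache-Control": "no-cache"}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.geturl(), body
```

Calling fetch_robots once with the default user agent and once with user_agent="Googlebot", then comparing the two bodies, can also show whether your server is answering search engines differently from ordinary visitors.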

MOST LIKELY YOU HAVE BEEN HACKED; NOW WHAT?

WHAT IS AN HTACCESS FILE?

If your website is like most, it will be hosted on a Linux server running Apache Web Server.   The .HTACCESS file contains your site’s core configuration.  We can explain more, but if you really care, THIS article from WordPress explains htaccess very well.

When someone goes to your site, before it does anything else, Apache will read your .HTACCESS file, and that file is likely what has been hacked.

HOW TO VIEW AND EDIT MY HTACCESS FILE

Like Robots.txt, your .HTACCESS is a plain text file that can be easily viewed (and edited!) through your web hosting company’s file manager or using an FTP product like FileZilla to reach the files on the server that is hosting your site.

If you are not sure how to do this, just call your web hosting company and they will walk you through the easy steps in just a few minutes.

HOW TO TELL IF YOUR .HTACCESS FILE IS HACKED

If you open your .HTACCESS and find “Rewrite” instructions like the following, you are likely hacked:

RewriteEngine On

RewriteBase /
RewriteCond %{HTTP_USER_AGENT} (google|yahoo|msn|aol|bing) [OR]
RewriteCond %{HTTP_REFERER} (google|yahoo|msn|aol|bing)
RewriteCond %{HTTP_HOST} urtech\.ca$
RewriteRule . check-caveat.php [L,S=10000]

This is almost plain English.  You can see that if Google, Yahoo, MSN, AOL or Bing are crawling your site or sending traffic to it, the request is rewritten to check-caveat.php instead of the page that was asked for, and that is bad.
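If your .HTACCESS is long, a quick scan can point you at the lines worth reading first.  The sketch below is a rough heuristic in Python, not a malware scanner: the function name find_suspicious_conditions is made up, the list of search engine names is simply copied from the example above, and a clean result does not prove the file is safe.

```python
import re

# Search engine names copied from the hacked example; extend as needed.
SEARCH_ENGINE_PATTERN = re.compile(r"google|yahoo|msn|aol|bing", re.IGNORECASE)

# RewriteCond lines that test the visitor's user agent or referrer.
CONDITION_PATTERN = re.compile(
    r"^\s*RewriteCond\s+%\{HTTP_(USER_AGENT|REFERER)\}\s+(\S+)", re.IGNORECASE
)


def find_suspicious_conditions(htaccess_text):
    """Return (line_number, line) pairs for RewriteCond lines keyed on search engines."""
    hits = []
    for number, line in enumerate(htaccess_text.splitlines(), start=1):
        match = CONDITION_PATTERN.match(line)
        if match and SEARCH_ENGINE_PATTERN.search(match.group(2)):
            hits.append((number, line.strip()))
    return hits


# The hacked rules from this article, used as sample input.
SAMPLE = r"""RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} (google|yahoo|msn|aol|bing) [OR]
RewriteCond %{HTTP_REFERER} (google|yahoo|msn|aol|bing)
RewriteCond %{HTTP_HOST} urtech\.ca$
RewriteRule . check-caveat.php [L,S=10000]
"""

for number, line in find_suspicious_conditions(SAMPLE):
    print(f"line {number}: {line}")
```

To check your own file, download a copy and pass its contents in place of SAMPLE.  The scan only reads text, so it cannot change anything on your server.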

The simple thing to do is just delete those instructions from your .HTACCESS, but that file contains a lot of cryptic commands that most people will not want to risk playing with.  The easier thing to do is replace the file with a backed-up version.  If you don’t have a backed-up version of that file, your webhost probably does.

In our case, we have plenty of backups, but we are hosted with GoDaddy so we just used their File Manager to restore an .HTACCESS file from a few days before we thought we were hacked.

We then resubmitted to Google via the GOOGLE SEARCH CONSOLE and bingo, our traffic returned and we were happy.

View Comments

  • Actually I do not agree that Google is smart enough not crawling wp-admin folders (btw you have a typo in folders ;-)). I often have the issue that google is crawling post which I am still working on. So I have some weird issues in google search console with very long urls in exactly that folder. Although I have tried to exclude it in the robots.txt Google doesn't honour it. Not sure why.

Published by
Ian Matthews
