Crawl Errors

By Shahid Maqbool
On Mar 31, 2023

What are Crawl Errors?

Crawl errors occur when a search engine bot, such as Googlebot, tries to access a page on your website but fails. These failures prevent search engines from indexing the affected pages.

Without indexing, your website pages will have no chance of ranking. 

Types of crawl errors

The legacy Crawl Errors report in Google Search Console divided crawl errors into two main categories.

Site Errors: These errors affect the entire website.

URL Errors: These errors affect a particular URL.

Site Errors

These high-level errors can affect your entire website and prevent search engines from accessing it, so pay attention to them.

There are three main types of site errors.

DNS Errors

DNS (Domain Name System) translates human-readable domain names into machine-readable IP addresses.

DNS errors occur when a search engine fails to connect with your domain due to a DNS timeout or lookup issue.

These errors are critical because they prevent Google from communicating with your website.

However, Google says that it may still be able to access your website despite DNS errors.

DNS issues must be addressed quickly, as DNS resolution is the first step in accessing your website.

If you see a crawl error in Google Search Console, it means Google has tried to access your website multiple times but failed.

There are two main types of DNS errors:

DNS timeout: This occurs when your DNS server fails to respond to Google within a specified time.

DNS lookup: This occurs when the DNS server fails to find your website and, as a result, prevents Google from accessing it.
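
The two DNS failure modes can be told apart with a small check of your own. The following is a minimal Python sketch, not Google's procedure: the function name check_dns, the 5-second default timeout, the returned labels, and the injectable resolver argument are all assumptions made for illustration.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def check_dns(hostname, timeout=5.0, resolver=socket.getaddrinfo):
    """Classify DNS resolution as 'ok', 'timeout', or 'lookup_failed'.

    The timeout value and the labels are illustrative assumptions.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    # Run the lookup in a worker thread so we can stop waiting after `timeout`.
    future = pool.submit(resolver, hostname, 80)
    try:
        future.result(timeout=timeout)
        return "ok"
    except FutureTimeout:
        # No answer arrived within the allowed time.
        return "timeout"
    except socket.gaierror:
        # The lookup itself failed (for example, the name does not exist).
        return "lookup_failed"
    finally:
        # Do not block on a lookup that is still running.
        pool.shutdown(wait=False)
```

Calling check_dns("example.com") from your own machine gives a rough idea of whether your domain resolves; it does not show what Google's resolvers see.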

How to fix DNS errors?

Before changing anything, check Google Search Console to see how Google crawls your website.

Use the URL Inspection tool, which replaced the older "Fetch as Google" option, to see how Google typically crawls your pages.

A basic fetch (the old "FETCH" option) is enough if you only want to check whether Google can reach your site.

Use the live test with the rendered page (the old "FETCH AND RENDER" option) to understand how Google sees your page. It can reveal a range of issues, from hacked pages to hidden elements.

The issue can also be at the DNS provider's end. Contact your DNS provider if Google cannot fetch and render your website.

Server Errors

Googlebot allocates a limited amount of time to load your website. If it cannot access the site within that time, it gives up.

Server errors occur when your server takes too long to respond and the request ends with a "Request timed out" message.

This happens when there is a flaw in your website's code or when the server cannot handle all the requests because of too many visitors.

There is a clear difference between server and DNS errors. In DNS errors, Google cannot reach the website, but in server errors, Google establishes a connection with your website but cannot load the pages.

How to fix server errors?

Just like DNS errors, server errors must be resolved immediately. If Google cannot connect to your page, it will give up after a specified time. You can use Google Search Console to diagnose this error.

Use the URL Inspection tool in Google Search Console (the successor to "Fetch as Google"). If it fetches your homepage without reporting any errors, Google can access your website.

There are several types of server errors, and the right fix depends on the type.

The most common types of server errors are:

  • Timeout: Occurs when the server takes too long to respond to search engine bots, and the connection times out.

  • Truncated Headers: Occurs when a bot connects but closes the connection because it did not receive the full server response headers.

  • Connection Reset: Occurs when Googlebot connects to your website and the server processes the request, but the connection is reset before the bot receives the result.

  • Truncated Response: Occurs when the connection ends before the bot receives a complete response.

  • Connection Refused: Occurs when a server refuses to connect with a bot.

  • Connect Failed: Occurs when a server is down or the bot is unable to connect to it.

  • Connect Timeout: Occurs when a bot fails to establish a connection within a specified time.

  • No Response: Occurs when the server does not respond quickly enough, and the connection ends before any response is sent.
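
When you reproduce these failures yourself, the exception raised by an HTTP client often hints at which category applies. The sketch below maps Python standard-library exceptions onto some of the labels in the list above. The mapping is an approximation chosen for illustration, not how Google classifies errors, and classify_fetch_error and fetch_status are hypothetical helper names.

```python
import http.client
import socket
import urllib.error
import urllib.request


def classify_fetch_error(exc):
    """Map an exception from an HTTP fetch to an approximate label."""
    # urllib wraps low-level errors in URLError; unwrap to the underlying cause.
    reason = getattr(exc, "reason", None)
    if isinstance(reason, BaseException):
        exc = reason
    # Order matters: RemoteDisconnected is a subclass of ConnectionResetError.
    if isinstance(exc, http.client.RemoteDisconnected):
        return "No Response"
    if isinstance(exc, http.client.IncompleteRead):
        return "Truncated Response"
    if isinstance(exc, ConnectionRefusedError):
        return "Connection Refused"
    if isinstance(exc, ConnectionResetError):
        return "Connection Reset"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "Timeout"
    return "Other"


def fetch_status(url, timeout=10):
    """Fetch url and return 'OK', 'HTTP <code>', or an error label."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
        return "OK"
    except urllib.error.HTTPError as exc:
        # The server responded, but with an error status such as 500 or 503.
        return f"HTTP {exc.code}"
    except Exception as exc:
        return classify_fetch_error(exc)
```

Running fetch_status against your homepage from a machine outside your own network gives a rough first diagnosis before you contact your host.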

Robots Failure

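A "Robots failure" occurs when Google cannot retrieve your robots.txt file, or when the file is not configured correctly.

Before crawling, Google (like most other bots) requests this file to learn which pages it may access. If the file is unreachable, or if its rules block pages you want crawled, those pages will not be crawled.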

So, it's crucial to ensure that your robots.txt file is set up correctly to avoid potential issues.

Usually, this file is added to the root directory of your website. The purpose of this file is to tell Google which pages it should avoid and which ones it can crawl.

If you want the search engine to crawl your entire website, uploading this file is not necessary.

It is better to have no robots.txt file at all than a poorly configured one.

To learn more about robots.txt and how to configure it properly, read our in-depth article on Robots.txt.

How to fix it?

If you have not configured your website's robots.txt file correctly, it will be hard for the bots to find your website pages and index them.

Carefully look at the directives of your robots.txt file. If it looks like this, it will stop all search engine bots from crawling your website.

User-agent: *
Disallow: /

If you want to allow all search engine bots to crawl your entire website, remove the / after Disallow, like this:

User-agent: *
Disallow:
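
To confirm what a given set of directives actually does, you can test it locally. The sketch below uses Python's standard urllib.robotparser module; the helper name is_crawl_allowed and the example.com URL are assumptions for illustration, and Python's parser may differ from Google's in edge cases.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two configurations discussed above.
BLOCK_ALL = "User-agent: *\nDisallow: /"
ALLOW_ALL = "User-agent: *\nDisallow:"

print(is_crawl_allowed(BLOCK_ALL, "https://example.com/page"))  # False
print(is_crawl_allowed(ALLOW_ALL, "https://example.com/page"))  # True
```

Note that the User-agent and Disallow lines must not be separated by a blank line, since a blank line ends the rule group.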

A mistake here can block your whole site, so if you are unsure about editing the file yourself, get help from a professional.

If you are still facing robots crawl issues even after configuring the robots.txt file properly, contact your website hosting company. The cause may be an invalid setting or server overload.

URL Errors

URL errors affect specific URLs or pages, not the entire website. Bots will face crawling issues if there are URL errors on a website. These errors are easy to fix with the help of Google Search Console.

Go to the "Coverage" section (labelled "Pages" in newer versions of Search Console), which lists all the URLs with errors, from most to least important.

There are the following main types of URL errors:

Soft 404

Soft 404 errors occur when a page that should return a 404 (Not Found) status returns a 200 (OK) status instead.

This can happen for the following reasons:

  • A page is empty or has very little content

  • Nonexistent pages redirect visitors to irrelevant pages

  • A page was deleted without returning the 404 response code

How to fix soft 404 errors?

If you want to avoid soft 404 errors, make sure there is enough content on your page.

If necessary, use 301 redirects to lead users to the relevant pages. Avoid redirecting many dead pages to your homepage; instead, return a 404 or redirect to an appropriate related page.

Ensure that the server returns a 404 or 410 status code instead of 200 for pages that have been removed.
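
A rough way to spot candidate soft 404s on your own site is to flag pages that return 200 but look empty or read like an error page. The sketch below is a simple heuristic, not how Google detects soft 404s; the phrase list, the 200-character threshold, and the name looks_like_soft_404 are assumptions for illustration.

```python
# Assumed phrases that suggest an error page; adjust for your own templates.
NOT_FOUND_PHRASES = ("page not found", "no longer exists", "no longer available")
# Assumed minimum amount of visible text, in characters.
MIN_TEXT_LENGTH = 200


def looks_like_soft_404(status_code, body_text):
    """Return True if a response has status 200 but resembles a missing page."""
    if status_code != 200:
        # A real 404 or 410 (or any other status) is not a soft 404.
        return False
    text = body_text.strip().lower()
    if len(text) < MIN_TEXT_LENGTH:
        # Empty or nearly empty page.
        return True
    return any(phrase in text for phrase in NOT_FOUND_PHRASES)
```

Pages flagged this way still need a manual look, since a short page can be perfectly valid.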

404 (Not found)

This error occurs when Googlebot requests a page on your website that does not exist.

According to Google, 404 errors do not harm your site unless they occur on important, valuable pages.

404 errors on less important pages will not affect your rankings and can be given a lower priority, but it is still good practice to fix them.

How to fix 404 errors?

The broken links behind these errors can be internal or external. If the broken link is on your own website, it is internal, and you can fix it with your developer's help.

If the broken link is on another website, it is external, and you can fix it with a 301 redirect.

Use Google Search Console to see which internal and external links point to pages that return 404.

If a page is dead, bring it back live. If you do not want the page live, 301 redirect it to a new, live page.

If an external error is due to a misspelled link, you can contact the owner of the linking site and ask them to correct it. Also make sure that your pages are published and not in draft mode.
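
Broken links on your own pages can also be found by scanning a page's HTML and checking each link's status. The sketch below is one possible approach, assuming you supply a get_status callable that returns the HTTP status code for a URL; get_status, LinkCollector, and find_broken_links are hypothetical names for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def find_broken_links(page_url, html, get_status):
    """Return the absolute URLs linked from the page that respond with 404."""
    collector = LinkCollector()
    collector.feed(html)
    broken = []
    for href in collector.links:
        # Resolve relative links against the page's own URL.
        link = urljoin(page_url, href)
        if not link.startswith(("http://", "https://")):
            continue  # skip mailto:, tel:, and similar schemes
        if get_status(link) == 404:
            broken.append(link)
    return broken
```

In practice get_status would issue a real HTTP request; passing it in keeps the link-extraction logic easy to test without network access.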

Access Denied

These errors occur when bots are unable to crawl a website page. This can occur due to the following reasons:

  • The page requires authorisation, meaning a user cannot visit it without providing login details.

  • Your website hosting service has blocked Google and other bots.

How to fix Access Denied errors?

These errors are less common but can still harm your website's ranking if the wrong URLs are blocked. Before fixing the errors, you must know what is causing them.

Remove the login requirement from pages you want Google to crawl.

Check your robots.txt file for the URLs you want Google to crawl, and make sure you have not blocked them from crawling.

Use Google Search Console to see how Google sees your website.

Not Followed

Not-followed errors occur when Google cannot fully follow a URL to its destination. This may be due to the following:

  • JavaScript, Flash, Cookies, Frames, Session IDs, and other forms of active content blocking the bots

  • Redirect chains and loops

  • Broken or improper redirects

How to fix it?

If "Not Followed" errors are on low-priority URLs, you may ignore them. But if they exist for high-priority links, you should immediately address them.

Google suggests using a text browser called Lynx to check your website. If your website contains JavaScript, Flash, or other active content that stops you from viewing it in this browser, bots may not be able to crawl and access it either.

Use the URL Inspection tool in Google Search Console (the successor to "Fetch and Render") to check how Google sees your website.

To resolve issues with parameter crawling, review how Google and other search engines handle URL parameters.

To be safe, keep your URL parameters short and use them sparingly.

In case of redirect errors, use the correct HTTP status codes and ensure the redirects point to the correct pages.

Make sure your site architecture uses static text links that point directly to the destination pages, rather than relying solely on redirects.
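
Redirect chains and loops are easy to check once you have your redirects in a simple source-to-target map. The sketch below is a minimal illustration; the function name trace_redirects, the dictionary format, and the max_hops limit of 5 are assumptions, not values taken from Google.

```python
def trace_redirects(start_url, redirects, max_hops=5):
    """Follow start_url through a {source: target} redirect map.

    Returns (final_url, hops, problem), where problem is None,
    'loop', or 'chain_too_long'.
    """
    seen = {start_url}
    url = start_url
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen:
            # We have come back to a URL already visited.
            return url, hops, "loop"
        if hops > max_hops:
            return url, hops, "chain_too_long"
        seen.add(url)
    return url, hops, None


print(trace_redirects("/a", {"/a": "/b", "/b": "/c"}))  # ('/c', 2, None)
print(trace_redirects("/x", {"/x": "/y", "/y": "/x"}))  # ('/x', 2, 'loop')
```

Any URL that takes more than one hop is a candidate for updating your internal links so they point straight at the final page.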

Server Errors & DNS Errors

Server and DNS errors can also occur at the URL level. Like site errors, they are listed in Google Search Console, under the URL errors report.

In this case, a server or DNS error prevents Google from accessing one particular URL rather than the entire site.

How to fix it?

Fix your server connectivity problem if the Google bot cannot access your link.

Check your DNS setup if Google cannot connect with a URL due to a DNS timeout or lookup issue.

Some specific URL errors

Three types of URL errors apply only to certain kinds of sites:

Mobile Specific Errors

If you have a non-responsive website, you will most likely face these errors. They can also occur due to faulty redirects, especially when you have a separate domain for mobile. To fix them, double-check your redirects and site structure.

Malware Errors

These errors occur when Google finds malicious activity or software on your web page.

Such software is used to steal sensitive information or login credentials. To fix this, remove the malware from your site.

Google News Errors

If your site is on Google News, you may encounter these errors. They are caused by formatting issues, such as a lack of proper headings or content structure, or occur when Google thinks your article does not fit its News section.

Make sure to check what exactly is causing these errors on your news site.

Conclusion

Crawl errors can occur for many reasons, like errors in CMS, hosting, domains, database, etc.

Some errors require immediate actions to fix, as they can affect the indexing and ranking of your website. 

If you want to avoid these errors, check your website regularly using Google Search Console and other diagnostic tools such as Screaming Frog, Netpeak Spider, and Sitebulb.
