What are Crawl Directives?
Crawl directives are instructions you can give to search engines to control how they interact with your website.
These directives allow you to do several things:
Tell a search engine not to crawl a specific page on your website
Instruct a search engine not to use a page in its index after it has been crawled
Control whether search engines should follow or ignore (nofollow) the links on a particular page
Set various other "minor" directives, such as how long search engines should wait before crawling a page again or whether certain types of content should be ignored
Types of crawl directives
Crawl directives can be categorized into two types:
Robots meta directives (also known as meta tags)
Robots.txt file directives
Robots meta directives
Robots meta directives, also known as "meta tags," are snippets of code that tell web crawlers how to handle a web page's content.
These directives are used to provide guidance to search engine bots on how to interact with a website's pages.
Unlike robots.txt file directives, which crawlers may choose to ignore, robots meta directives give explicit instructions on how a page should be indexed and how its links should be treated.
There are two types of robots meta directives:
Meta robots tags
The meta robots tag, also referred to as "meta robots" or "robots tag," is an HTML code element that is typically placed in the <head> section of a web page.
It provides instructions to web robots or search engine crawlers on how to interact with the web page.
It is commonly used to control the indexing and crawling behaviour of search engine bots, which can affect how the web page appears in search engine results.
Example:
Let's say you have a web page that contains sensitive information, such as a login page or a page with confidential data that you don't want to be indexed by search engines.
You can use the meta robots tag to prevent search engine bots from indexing that page.
Here's an example of how the meta robots tag might be used in the HTML code of such a page:
<!DOCTYPE html>
<html>
  <head>
    <meta name="robots" content="noindex, nofollow">
    <title>Login Page</title>
    <!-- other head elements go here -->
  </head>
  <body>
    <!-- login page content goes here -->
  </body>
</html>
In this example, the meta robots tag is included in the <head> section of the web page with the content attribute set to "noindex, nofollow".
This tells search engine bots not to index the page and not to follow any links on it. As a result, the page is kept out of search engine results, and crawlers do not follow the links it contains.
X-robots tags
The X-Robots-Tag is part of the HTTP header response that is sent when a URL is requested, and it can be used to control indexing for an entire page or specific elements on that page.
The X-Robots-Tag is more complex to implement than the relatively simple meta robots tag, but it offers more flexibility and functionality.
There are certain situations where using the X-Robots-Tag is recommended. The two most common scenarios are:
when you want to control how non-HTML files (such as documents, PDFs, or videos) are indexed
when you want to apply directives site-wide instead of on a page level
For example, if you want to block a specific image or video from being crawled, you can easily do so using the X-Robots-Tag header in the HTTP response.
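For instance, here is a minimal sketch of how this could be set up on an Apache server (assuming the mod_headers module is enabled; the PDF file pattern is just an illustration):

# Apache configuration: attach an X-Robots-Tag header to every PDF the server delivers
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

With a rule like this in place, every matching file is served with the X-Robots-Tag header, so crawlers that honour it will keep those files out of their index without you having to touch each file individually.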
Another advantage of using the X-Robots-Tag is that it allows you to combine multiple tags within an HTTP response or use a comma-separated list of directives to specify instructions to search engine bots.
For instance, if you don't want a certain page to be cached and want it to be unavailable after a certain date, you can use a combination of "noarchive" and "unavailable_after" tags in the X-Robots-Tag to convey these instructions to search engine bots.
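As an illustrative sketch, the HTTP response for such a page might include a header like the one below (the date is just a placeholder):

HTTP/1.1 200 OK
Content-Type: text/html
X-Robots-Tag: noarchive, unavailable_after: 25 Jun 2025 15:00:00 PST

Here "noarchive" asks search engines not to show a cached copy of the page, and "unavailable_after" asks them to stop showing it in results after the given date.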
Both types of directives can use the same parameters, such as "noindex" and "nofollow" to instruct crawlers.
The difference lies in how these parameters are communicated to the crawlers: meta robots directives are embedded within the page's HTML code, while the X-Robots-Tag is sent as an HTTP header by the web server.
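To illustrate, both of the following sketches convey the same noindex, nofollow instruction; only the delivery mechanism differs:

In the page's HTML:
<meta name="robots" content="noindex, nofollow">

In the HTTP response sent by the server:
X-Robots-Tag: noindex, nofollow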
Robots.txt directives
Robots.txt directives guide search engine robots or crawlers as they navigate a website, controlling which parts of the site they are allowed or disallowed to access and crawl.
These directives apply to entire sections or directories of a website, controlling crawler access at a broader level.
Unlike meta robots tags, which sit in the HTML of individual pages, the robots.txt file containing these directives is placed in the root directory of the website.
While robots.txt directives can block crawling, they don't prevent a page from being indexed if it is linked from other indexed pages. Also, not all crawlers obey the rules set in the robots.txt file.
To put it simply, the robots.txt file provides guidance to search engine bots on how to crawl your website.
It enables you to specify which sections of your site are open for crawling and which ones are not.
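For example, here is a minimal robots.txt sketch (the /private/ and /tmp/ directories are hypothetical) that blocks two sections of a site for all crawlers while leaving everything else open:

User-agent: *
Disallow: /private/
Disallow: /tmp/

The asterisk means the rules apply to every crawler; any path not matched by a Disallow line remains open for crawling.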
Note: If a web page is blocked from crawling through the robots.txt file, search engines won't be able to discover any indexing or serving rules specified in robots meta tags or X-Robots-Tag HTTP headers. So, if you want search engines to follow indexing or serving rules, you cannot block the URLs containing those rules from crawling in the robots.txt file.
An overview of the differences between crawl directives
Feature | robots.txt | Meta Robots Tag | X-Robots-Tag |
---|---|---|---|
Purpose | To provide directives for web crawlers on how to interact with a website or specific pages for crawling | To control the indexing and following of links on a specific webpage | To control the indexing and following of links on a specific webpage or non-HTML files (like PDFs) |
Location | In the website's root directory as a separate text file | In the HTML head section of each individual webpage | In the HTTP header of each individual webpage or file |
Syntax | User-agent:, Disallow:, and Allow: lines | <meta name="robots" content="directive1,directive2"> | X-Robots-Tag: directive1, directive2 |
Control granularity | Website or directory level | Page level | Page level, including non-HTML files |
Blocking web crawlers | Yes, by specifying user agents and disallowed paths | Yes, by using "noindex" and "nofollow" directives | Yes, by using "noindex" and "nofollow" directives |
Crawl-delay directive | Yes, for some web crawlers | No | No |
Supported by all major search engines | Yes | Yes | Yes |
Can apply indexing directives to non-HTML files | No | No | Yes |
Takeaway
Crawl directives are a crucial aspect of website optimization. By using robots meta directives and robots.txt file directives, website owners can control how search engines interact with their sites.
It is essential to use crawl directives effectively to ensure that search engines crawl and index your website correctly.