Sitemap Index

By Shahid Maqbool
On Jun 5, 2023

What is a Sitemap Index?

A sitemap index is a way to organize and provide access to multiple sitemap files, making it easier for search engines to discover and crawl all the pages of a big website.

A sitemap index is like a table of contents for all the individual sitemaps of a website. When a website has a large number of pages, a single sitemap file may not be enough to list all the URLs.

In such cases, the website owner can create multiple sitemap files and then include links to these individual sitemaps in a sitemap index file.

Explanation

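A sitemap is like a map that lists all the pages on a website to help search engines find them easily. However, some websites have too many pages to fit into a single sitemap file because search engines limit the size of sitemaps: under the Sitemap protocol, a single file can hold at most 50,000 URLs and must be no larger than 50MB uncompressed.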

When a website has a really large number of pages, it needs to split its sitemap into multiple smaller sitemaps, each containing a portion of the total pages. This way, each individual sitemap file stays within the size limit set by search engines.

Now, to tell the search engines about all these separate sitemap files, the website creates a "sitemap index" file. This index file simply contains a list of links pointing to all the individual sitemap files.

By submitting the sitemap index file, the website gives search engines a master list of all its sitemaps, which they can then use to crawl and index its pages.

Example 

Here is an example of a sitemap index file in XML format that lists two sitemaps:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.yourwebsite.com/sitemap1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.yourwebsite.com/sitemap2.xml</loc>
  </sitemap>
</sitemapindex>

In this example:

<?xml version="1.0" encoding="UTF-8"?>

This line declares that the file is an XML document with version 1.0 and encoded using UTF-8.

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

This line defines the root element of the XML document, which is "sitemapindex." It also includes a namespace declaration, indicating that the file adheres to the XML schema defined by the "http://www.sitemaps.org/schemas/sitemap/0.9" namespace.

<sitemap>
  <loc>https://www.yourwebsite.com/sitemap1.xml</loc>
</sitemap>

This section represents the first entry in the sitemap index. It contains a "sitemap" element, which wraps the location ("loc") of the first sitemap file, "https://www.yourwebsite.com/sitemap1.xml".

<sitemap>
  <loc>https://www.yourwebsite.com/sitemap2.xml</loc>
</sitemap>

Similarly, this section represents the second entry in the sitemap index. It contains another "sitemap" element with the location of the second sitemap file, "https://www.yourwebsite.com/sitemap2.xml".

</sitemapindex>

This closing tag marks the end of the sitemap index file.
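To see how a program reads this structure, here is a minimal Python sketch that parses the example index and pulls out each sitemap location. The function name list_sitemaps is made up for illustration; the parsing itself uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Namespace from the example above; every element in the index belongs to it.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemaps(index_xml: str) -> list[str]:
    """Return the <loc> value of every <sitemap> entry in a sitemap index.

    `list_sitemaps` is an illustrative helper, not part of any library.
    """
    # Parse from bytes so the XML declaration's encoding is honored.
    root = ET.fromstring(index_xml.encode("utf-8"))
    return [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if loc.text
    ]


EXAMPLE_INDEX = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.yourwebsite.com/sitemap1.xml</loc></sitemap>
  <sitemap><loc>https://www.yourwebsite.com/sitemap2.xml</loc></sitemap>
</sitemapindex>"""

sitemap_urls = list_sitemaps(EXAMPLE_INDEX)
print(sitemap_urls)
```

A crawler does something similar: it reads the index, collects the "loc" values, and then fetches each sitemap in turn.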

Why choose the sitemap index

If your website is small with only a few pages, you probably won't need a sitemap index. You can just list all the page URLs in a single sitemap file.

However, if your website is very large, with tens of thousands or millions of pages, a single sitemap file may not be enough. This is because a sitemap file is limited to 50,000 URLs and 50MB uncompressed.

In this case, you'll need to split up your massive list of URLs into multiple smaller sitemap files, each one within the allowed size limit.
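As a rough sketch of the splitting step in Python, the helper below cuts a list of page URLs into groups that each fit the protocol's cap of 50,000 URLs per sitemap. The function name chunk_urls and the sample page URLs are assumptions for illustration, and the sketch checks only the URL count, not the file size in bytes.

```python
from typing import Iterator

# Per-file URL cap from the Sitemap protocol. The separate 50MB size cap
# is not checked in this sketch.
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> Iterator[list[str]]:
    """Yield consecutive slices of `urls`, each holding at most `size` items."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# Hypothetical site with 120,000 pages.
pages = [f"https://www.yourwebsite.com/page-{i}" for i in range(120_000)]
chunks = list(chunk_urls(pages))
chunk_sizes = [len(chunk) for chunk in chunks]
print(chunk_sizes)  # [50000, 50000, 20000]
```

Each group would then be written to its own sitemap file, and the sitemap index would list those files.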

Apart from size limits, using multiple Sitemaps can also help you organize a website's URLs. There are two common cases:

Managing archived URLs separately

Websites often have some pages that don't change very often, like archived content or information that stays the same for a long time.

Other pages on the website get updated frequently with new content or information.

To make things easier to manage, you can create separate sitemap files for these two types of URLs.
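One possible way to make that split, sketched in Python: group URLs by their last-modified date relative to a cutoff. The function name split_by_freshness, the sample URLs, and the idea of using a date cutoff are all assumptions for illustration; a real site might separate archived content by section or content type instead.

```python
from datetime import date


def split_by_freshness(
    last_modified: dict[str, date], cutoff: date
) -> tuple[list[str], list[str]]:
    """Split URLs into (archived, active) lists around a cutoff date.

    URLs last modified before `cutoff` are treated as archived.
    """
    archived = [url for url, changed in last_modified.items() if changed < cutoff]
    active = [url for url, changed in last_modified.items() if changed >= cutoff]
    return archived, active


# Hypothetical pages with their last-modified dates.
pages = {
    "https://www.yourwebsite.com/archive/2019-report": date(2019, 3, 1),
    "https://www.yourwebsite.com/blog/latest-post": date(2023, 6, 1),
}
archived, active = split_by_freshness(pages, cutoff=date(2022, 1, 1))
print(archived)
print(active)
```

The archived list would go into one sitemap that rarely changes, and the active list into another that is regenerated often.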

Managing multiple sites in subfolders

If you have multiple websites organized within subfolders of the same domain, you can create a separate Sitemap for each website. Then, create a single Sitemap index file in the root directory that lists all the individual Sitemaps.

This approach works for websites hosted in subfolders of the same host. For example, you can create a Sitemap index file at www.yourwebsite.com/sitemap_index.xml that lists Sitemaps for different subfolders like www.yourwebsite.com/site1/sitemap.xml and www.yourwebsite.com/site2/sitemap.xml.

By default, it won't work across separate hosts. That includes subdomains like site1.yourwebsite.com or site2.yourwebsite.com as well as entirely different domains.

Required tags for sitemap index

The tags for a sitemap index file follow the same rules as a regular sitemap file. They are defined by the namespace "http://www.sitemaps.org/schemas/sitemap/0.9".

This namespace defines how to create sitemap index files that search engines can understand.

To make sure Google can read your sitemap index file correctly, you need to include these required tags:

<sitemapindex>: This is the main parent tag that contains all other tags in the sitemap index file.

<sitemap>: This tag is used for each individual sitemap listed in the index file. It goes inside the <sitemapindex> tag as a direct child tag.

<loc>: This tag provides the URL of each individual sitemap file. It goes inside the <sitemap> tag as the only child tag. You can have a maximum of 50,000 <loc> tags in a sitemap index file, meaning you can list up to 50,000 different sitemap files.
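The checks described above can be sketched in Python. This is a simplified illustration rather than a full schema validation: the function name check_index is made up, and it only looks at the root element, the presence of a "loc" in each entry, and the 50,000-entry cap.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_ENTRIES = 50_000  # maximum number of sitemaps one index may list


def check_index(index_xml: bytes) -> list[str]:
    """Return a list of problems found in a sitemap index (empty if none)."""
    root = ET.fromstring(index_xml)
    if root.tag != f"{{{NS}}}sitemapindex":
        return [f"unexpected root element: {root.tag}"]

    problems = []
    entries = root.findall(f"{{{NS}}}sitemap")
    if len(entries) > MAX_ENTRIES:
        problems.append(f"too many entries: {len(entries)}")
    for position, entry in enumerate(entries, start=1):
        loc = entry.find(f"{{{NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"entry {position} has no <loc>")
    return problems


good = b"""<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.yourwebsite.com/sitemap1.xml</loc></sitemap>
</sitemapindex>"""
bad = b"""<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.yourwebsite.com/sitemap1.xml</loc></sitemap>
  <sitemap></sitemap>
</sitemapindex>"""

print(check_index(good))
print(check_index(bad))
```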

There is one optional tag you can use to give Google more information:

<lastmod>: This tag shows the date and time that the specific sitemap file was last changed or updated. The date and time in the <lastmod> tag must follow a set format called the W3C Datetime format, for example 2023-06-05 or 2023-06-05T14:30:00+00:00.

Including the required tags in your sitemap index file is necessary for Google to understand it properly.

Adding the optional <lastmod> tag with the last modification date can also be helpful. The <lastmod> tag allows Google to better plan when to re-crawl and re-index the content from your sitemaps.
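If you generate the <lastmod> value in code, a small Python sketch like the one below produces a W3C Datetime string with a time zone offset. The function name w3c_datetime is an assumption for illustration; the formatting relies on the standard datetime.isoformat method.

```python
from datetime import datetime, timezone


def w3c_datetime(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ss+hh:mm."""
    if moment.tzinfo is None:
        # Without a time zone the offset part of the format would be missing.
        raise ValueError("expected a timezone-aware datetime")
    return moment.isoformat(timespec="seconds")


stamp = w3c_datetime(datetime(2023, 6, 5, 14, 30, tzinfo=timezone.utc))
print(stamp)  # 2023-06-05T14:30:00+00:00
```

To stamp a sitemap with the current time, you could pass datetime.now(timezone.utc) instead of a fixed date.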

Having your sitemaps properly crawled and indexed helps get your website's content included in Google's search results.

How to create a sitemap index

If your website is very large, even a sitemap index can reach its limits: a single index file can list at most 50,000 sitemaps and, like a regular sitemap, must be no larger than 50MB uncompressed. Very large files can also cause bandwidth issues when search engines try to download them.

A sitemap index can only list sitemaps, not other sitemap index files, so you cannot solve this by nesting one index inside another. Instead, split the index itself.

This means:

  1. Create multiple smaller sitemap index files.

  2. Have each sitemap index file list its own group of individual sitemaps.

  3. Submit each sitemap index file to search engines separately.

By splitting it up this way, you keep the file sizes manageable. This prevents bandwidth issues when search engines download your sitemap files.

To create a sitemap index file, follow these steps:

  • First, have the URLs of all your individual sitemap files ready.

  • Open a text editor or XML editor to create a new XML file. Give it a name like "sitemap-index.xml" and save it with the .xml extension.

  • At the top of the file, add the XML declaration line:

<?xml version="1.0" encoding="UTF-8"?>

  • Next, declare the sitemap namespace within the root element:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

This namespace declaration is required for the sitemap index to adhere to the appropriate schema.

  • For each individual sitemap file, add an entry within the <sitemap> element. Include the location of the sitemap file using the <loc> tag. Here is an example of how to structure an entry:

<sitemap>

  <loc>https://www.example.com/sitemap1.xml</loc>

</sitemap>

Repeat this structure for each sitemap file you want to include in the index.

  • Finally, close the sitemap index by adding the closing tag:

</sitemapindex>

  • Save the XML file with the appropriate name, such as "sitemap-index.xml".

Your sitemap index file is now created and ready to be submitted to search engines like Google.
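For sites with many sitemaps, writing the file by hand gets tedious. The steps above can be sketched in Python using the standard xml.etree.ElementTree module. The function name build_sitemap_index is made up for illustration, and the sketch assumes Python 3.9 or later because of ET.indent.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls: list[str]) -> bytes:
    """Build a sitemap index document listing the given sitemap URLs."""
    # Root element with the namespace declaration as its xmlns attribute.
    root = ET.Element("sitemapindex", xmlns=NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        # Special characters such as & are escaped during serialization.
        ET.SubElement(entry, "loc").text = url
    ET.indent(root)  # pretty-print; available from Python 3.9
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


index_bytes = build_sitemap_index([
    "https://www.example.com/sitemap1.xml",
    "https://www.example.com/sitemap2.xml",
])
print(index_bytes.decode("utf-8"))
```

To save the result, you could write index_bytes to a file named sitemap-index.xml, for example with pathlib.Path("sitemap-index.xml").write_bytes(index_bytes).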

How to compress the sitemap index

Compressing your sitemap index file can help reduce the amount of data that needs to be transferred when search engines download it from your server.

You can compress your sitemap index file using gzip compression. This is a common method that reduces the file size.

To compress the sitemap index file, you can follow these steps:

  • Make sure you have already created the sitemap index file in XML format before compressing it.

  • Before compressing the sitemap index file, you need a tool that can compress files. Most operating systems like Windows, macOS, or Linux come with built-in compression tools you can use. Or you can download third-party compression software.

  • Use the compression tool to compress the sitemap index file. The exact steps will depend on the compression tool you are using and your operating system. Here is an example using gzip:

gzip sitemap-index.xml

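This command compresses sitemap-index.xml using gzip and creates a compressed file named sitemap-index.xml.gz. By default, gzip replaces the original file with the compressed one. If your version of gzip supports the -k (keep) option, running gzip -k sitemap-index.xml keeps the original file as well.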

  • Once the compression is done, check that the compressed file was created successfully and confirm that its file size is now smaller than the original uncompressed file.

After completing the compression process, you have a gzip-compressed version of your sitemap index file, such as sitemap-index.xml.gz. This compressed file can now be submitted to search engines like Google.
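If you would rather compress the file from a script, the same result can be sketched in Python with the standard gzip module. The function name gzip_file is an assumption for illustration, and the demo writes a small sample index into a temporary directory so it does not touch real files.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gzip_file(source: Path) -> Path:
    """Write a gzip copy next to `source` and return its path.

    Unlike the plain gzip command, this keeps the original file.
    """
    target = source.with_name(source.name + ".gz")
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


# Demo with a sample index of 200 entries in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    index_path = Path(tmp) / "sitemap-index.xml"
    entry = "<sitemap><loc>https://www.example.com/sitemap{}.xml</loc></sitemap>\n"
    index_path.write_text(
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + "".join(entry.format(i) for i in range(1, 201))
        + "</sitemapindex>\n",
        encoding="utf-8",
    )
    compressed = gzip_file(index_path)
    original_size = index_path.stat().st_size
    compressed_size = compressed.stat().st_size
    # Decompressing must give back exactly the original bytes.
    round_trip_ok = gzip.decompress(compressed.read_bytes()) == index_path.read_bytes()

print(compressed.name, original_size, compressed_size, round_trip_ok)
```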

How to submit a sitemap index

To submit your sitemap index file to Google, sign into your Google Search Console account.

In the Search Console interface, open the Sitemaps report and enter the URL of your sitemap index file there, such as sitemap-index.xml or the compressed version sitemap-index.xml.gz if you compressed it. The file itself must already be uploaded to your website, because Search Console only takes its URL.

You do not need to submit the individual sitemaps separately, as Google will use the URLs listed in the index file to crawl and find those sitemaps.

After processing the submitted index file, Search Console reports any errors or issues found in the index file itself or in any of the individual sitemaps referenced in the index.

If you later update one of the individual sitemap files, you can modify the "lastmod" date for that sitemap's entry in the index file before re-submitting the updated index to Google Search Console.
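Updating the "lastmod" value can also be scripted. The Python sketch below finds the entry for a given sitemap URL and sets or adds its <lastmod> element. The function name set_lastmod and the sample values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize the sitemap namespace as the default one (no ns0: prefixes).
ET.register_namespace("", NS)


def set_lastmod(index_xml: bytes, sitemap_url: str, lastmod: str) -> bytes:
    """Return the index with <lastmod> set for the entry matching `sitemap_url`."""
    root = ET.fromstring(index_xml)
    for entry in root.findall(f"{{{NS}}}sitemap"):
        loc = entry.find(f"{{{NS}}}loc")
        if loc is not None and (loc.text or "").strip() == sitemap_url:
            node = entry.find(f"{{{NS}}}lastmod")
            if node is None:
                node = ET.SubElement(entry, f"{{{NS}}}lastmod")
            node.text = lastmod
            break
    else:
        raise KeyError(f"no entry for {sitemap_url}")
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


index_before = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.yourwebsite.com/sitemap1.xml</loc></sitemap>
  <sitemap><loc>https://www.yourwebsite.com/sitemap2.xml</loc></sitemap>
</sitemapindex>"""

index_after = set_lastmod(
    index_before, "https://www.yourwebsite.com/sitemap2.xml", "2023-06-05"
)
print(index_after.decode("utf-8"))
```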

Sitemap index best practices

Here are some best practices or key points to remember while creating and submitting sitemap index files:

Format

A sitemap index file uses the same XML format and follows the same rules as a regular sitemap file, as defined by the Sitemap Protocol standards.

This means all the formatting requirements that apply to sitemaps, such as proper XML structure, namespace declarations, and allowed tags, also apply to sitemap index files.

Hosting

The individual sitemap files listed in your sitemap index file should be hosted on the same website domain as the sitemap index file itself.

This makes it easy for search engines to access and crawl all the referenced sitemap files.

The only exception is if you have specifically set up cross-site submission capabilities, which allows sitemaps from other domains to be included in your index file.

FYI: Cross-site submission refers to the practice of hosting sitemaps referenced in a sitemap index file on different websites (domains) than the one hosting the sitemap index file itself. 

Directory location

All the individual sitemap files listed in your sitemap index file must be located in the same directory as the index file itself, or in a subdirectory below that directory within your website's folder structure. For example, if the sitemap index file is located at:

https://yourwebsite.com/public/sitemap_index.xml

The referenced sitemaps can only be in the same directory or in a subdirectory, like:

https://yourwebsite.com/public/shared/....
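The hosting and directory rules can be checked with a short Python sketch. The function name in_index_scope and the sample sitemap file names are assumptions for illustration, and the check ignores any cross-site submission setup you may have configured.

```python
import posixpath
from urllib.parse import urlsplit


def in_index_scope(index_url: str, sitemap_url: str) -> bool:
    """Check that a sitemap sits on the same host as the index,
    in the index's directory or somewhere below it."""
    index, sitemap = urlsplit(index_url), urlsplit(sitemap_url)
    same_host = (index.scheme, index.netloc.lower()) == (
        sitemap.scheme,
        sitemap.netloc.lower(),
    )
    if not same_host:
        return False
    # Directory of the index file, always ending in a slash so that
    # /public/ does not match /publicity/.
    base_dir = posixpath.dirname(index.path).rstrip("/") + "/"
    return sitemap.path.startswith(base_dir)


INDEX_URL = "https://yourwebsite.com/public/sitemap_index.xml"

# Hypothetical sitemap locations.
print(in_index_scope(INDEX_URL, "https://yourwebsite.com/public/shared/sitemap1.xml"))  # True
print(in_index_scope(INDEX_URL, "https://yourwebsite.com/sitemap1.xml"))  # False
print(in_index_scope(INDEX_URL, "https://site1.yourwebsite.com/public/sitemap1.xml"))  # False
```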

Submission limit

Google Search Console allows you to submit up to 500 sitemap index files for each website you have verified in your account.

This generous limit gives you the ability to comprehensively provide information about the structure and content across your entire website to search engines through sitemap index files.

Following best practices for properly creating and submitting these sitemap indexes helps ensure search engines like Google can efficiently crawl and index all the pages of your site listed in the sitemaps.

This can support your site's visibility in search results by reducing the chance that pages are missed during crawling, although a sitemap does not guarantee that every listed URL will be crawled or indexed.

Takeaway

A sitemap index file is an XML file that lists the URLs of all the individual sitemap files for a website.

It is used to manage larger websites that have too many pages to fit into a single sitemap file.

The sitemap index allows you to split up the full sitemap into multiple smaller sitemap files, with the index providing the links to each one.

Search engines like Google can read the sitemap index and use it to quickly and efficiently discover and crawl through each of the individual sitemap files listed in the index.

This ensures the search engine can comprehensively index all the content across the entire website by following the sitemap files organized in the index.
