What is a Sitemap Index?
A sitemap index is a file that lists the sitemaps for a website. Its main purpose is to work around the size limits of a single sitemap file.
Explanation
A sitemap is a file that contains a list of all the pages of a website, which helps search engines like Google understand the structure and content of the site.
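However, search engines such as Google impose size limits on sitemaps: under the sitemap protocol, a single sitemap can contain at most 50,000 URLs and must be no larger than 50MB uncompressed.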
If your sitemap is too large and exceeds the size limits set by search engines, you will need to split it up into smaller sitemaps. Each new sitemap should be below the size limit specified by the search engine.
This can usually be done by dividing your website's pages into logical sections or categories and creating a separate sitemap for each section or category.
Once you have split up your large sitemap into smaller sitemaps, you can create a sitemap index file. Submitting this one file to search engines is a way to submit multiple sitemaps at once.
Search engines will then use the sitemap index file to crawl and index the pages of your website more efficiently.
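As a rough illustration of the splitting step, here is a minimal Python sketch. It assumes the 50,000-URL limit from the sitemap protocol, and the page URLs and the split_urls helper are made up for this example. It only splits by count; splitting by section or category would need your own grouping rule.

```python
from typing import Iterator, List

# Assumed limit: the sitemap protocol caps one sitemap at 50,000 URLs.
MAX_URLS_PER_SITEMAP = 50_000


def split_urls(urls: List[str], limit: int = MAX_URLS_PER_SITEMAP) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `limit` URLs."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(urls), limit):
        yield urls[start:start + limit]


# Hypothetical site with 120,000 pages (placeholder URLs).
pages = [f"https://www.yourwebsite.com/page-{i}" for i in range(120_000)]
chunks = list(split_urls(pages))
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]
```

Each chunk would then be written to its own sitemap file, and those files are what the sitemap index lists.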
Example
Here is an example of a sitemap index file in XML format that lists two sitemaps:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.yourwebsite.com/sitemap1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.yourwebsite.com/sitemap2.xml</loc>
  </sitemap>
</sitemapindex>
In this example:
<?xml version="1.0" encoding="UTF-8"?>
This line declares that the file is an XML document with version 1.0 and encoded using UTF-8.
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
This line defines the root element of the XML document, which is "sitemapindex." It also includes a namespace declaration, indicating that the file adheres to the XML schema defined by the "http://www.sitemaps.org/schemas/sitemap/0.9" namespace.
<sitemap>
<loc>https://www.yourwebsite.com/sitemap1.xml</loc>
</sitemap>
This section represents the first entry in the sitemap index. It contains a "sitemap" element, which encapsulates the location ("loc") of the sitemap file, in this case "https://www.yourwebsite.com/sitemap1.xml".
<sitemap>
<loc>https://www.yourwebsite.com/sitemap2.xml</loc>
</sitemap>
Similarly, this section represents the second entry in the sitemap index. It contains another "sitemap" element with the location of the second sitemap file, "https://www.yourwebsite.com/sitemap2.xml".
</sitemapindex>
This closing tag marks the end of the sitemap index file.
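To see how a program might read this structure, here is a small Python sketch that uses the standard library's xml.etree.ElementTree to pull the "loc" values out of the example above. The list_sitemaps helper is a name invented for this illustration, and the sketch does no validation beyond parsing.

```python
import xml.etree.ElementTree as ET
from typing import List

# Prefix "sm" is arbitrary; it just maps to the sitemap namespace for lookups.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

INDEX_XML = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.yourwebsite.com/sitemap1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.yourwebsite.com/sitemap2.xml</loc>
  </sitemap>
</sitemapindex>
"""


def list_sitemaps(xml_text: str) -> List[str]:
    """Return the <loc> value of every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    locs = root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


for url in list_sitemaps(INDEX_XML):
    print(url)
```

Because the root element declares a default namespace, the lookups have to be namespace-qualified; a plain search for "sitemap/loc" would find nothing.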
Why choose the sitemap index
If your website is small and has a relatively low number of URLs, you typically won't require a Sitemap index file. In such cases, you can simply list all your URLs within a single Sitemap file.
However, if your website is larger and contains a significant number of URLs, you might find it beneficial or necessary to have multiple Sitemaps. This is where a Sitemap index file comes into play.
Using multiple Sitemaps can also help you organize a website's URLs. Two common cases are:
Managing archived URLs separately
If you have URLs that are archived or don't change frequently, you can store them in one Sitemap. Similarly, if you have URLs that change frequently, you can store them in another Sitemap.
This way, when you need to add new URLs to the Sitemap, you can work with smaller and more manageable files.
Managing multiple sites in subfolders
If you have multiple websites that are organized in subfolders, you can create a Sitemap for each site and then create a Sitemap index file in the root directory that lists them.
However, this method only works for subfolders, not for separate domains. For example, you can create a Sitemap index file at www.yourwebsite.com/sitemap_index.xml that lists Sitemaps for different subfolders like www.yourwebsite.com/site1/sitemap.xml and www.yourwebsite.com/site2/sitemap.xml.
But it won't work for separate hosts, which includes subdomains like site1.yourwebsite.com or site2.yourwebsite.com.
Required tags for sitemap index
The sitemap index tags - which are used to define a sitemap index file - are defined by the same namespace as traditional sitemaps, which is "http://www.sitemaps.org/schemas/sitemap/0.9".
This namespace provides the structure and rules for creating sitemap index files that can be understood by search engines.
To ensure that Google can properly use your sitemap index file, you must include the following required tags:
<sitemapindex>: This is the parent tag of the XML tree and contains all the other tags in the sitemap index file.
<sitemap>: This is the parent tag for each sitemap listed in the file. It is the only kind of direct child allowed in the <sitemapindex> tag, and it is repeated once per sitemap.
<loc>: This tag specifies the location (URL) of the sitemap. It is a required child of the <sitemap> tag. A sitemap index file can contain up to 50,000 <loc> tags, which means you can list up to 50,000 individual sitemaps in one sitemap index file.
In addition to the required tags, there is an optional tag that you can use to provide additional information to Google:
<lastmod>: This tag indicates the time that the corresponding sitemap file was last modified. The value for the <lastmod> tag must be in W3C Datetime format, which is a standardized format for representing date and time information.
Including the required tags and optionally providing additional information like the last modification date using the <lastmod> tag can help Google better schedule the crawling of your sitemaps and improve the indexing process of your website.
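As an illustration of the W3C Datetime format, the following Python sketch produces a value suitable for <lastmod> from a timezone-aware datetime. The w3c_datetime helper and the example date are invented for this illustration; W3C Datetime also accepts a date-only form such as 2024-01-15.

```python
from datetime import datetime, timezone


def w3c_datetime(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ss+hh:mm."""
    if moment.tzinfo is None:
        # W3C Datetime requires a time zone designator when a time is given.
        raise ValueError("expected a timezone-aware datetime")
    return moment.replace(microsecond=0).isoformat()


# Hypothetical modification time, in UTC.
example = datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)
print(w3c_datetime(example))         # 2024-01-15T09:30:00+00:00
print(example.date().isoformat())    # 2024-01-15 (date-only form)
```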
How to create a sitemap index
If you have a very large website, a single Sitemap index file may itself reach the protocol's limits of 50,000 listed sitemaps or 50MB uncompressed.
A Sitemap index file can only list sitemaps. It cannot list other Sitemap index files, so adding another level of index on top is not an option.
Instead, you can create several Sitemap index files, each containing links to a group of individual Sitemaps, and submit each index file separately.
This way, you can keep the file sizes of your Sitemap index files manageable and avoid overwhelming your bandwidth when search engines download them.
To create a sitemap index, you can follow these steps:
Before creating a sitemap index, you need to have the URLs of the individual sitemap files ready.
Use a text editor or an XML editor to create a new XML file. You can name it "sitemap-index.xml" or choose any other appropriate name, but make sure to save it with the .xml extension.
At the beginning of the XML file, add the XML declaration line:
<?xml version="1.0" encoding="UTF-8"?>
Next, declare the sitemap namespace within the root element:
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
This namespace declaration is required for the sitemap index to adhere to the appropriate schema.
For each individual sitemap file, add a <sitemap> element inside <sitemapindex>. Include the location of the sitemap file using the <loc> tag. Here is an example of how to structure an entry:
<sitemap>
<loc>https://www.example.com/sitemap1.xml</loc>
</sitemap>
Repeat this structure for each sitemap file you want to include in the index.
Finally, close the sitemap index by adding the closing tag:
</sitemapindex>
Save the XML file with the appropriate name, such as "sitemap-index.xml".
Your sitemap index file is now created and ready to be submitted to search engines like Google.
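If you would rather generate the file than type it by hand, the steps above can be sketched in Python with the standard library's xml.etree.ElementTree. This is only one possible approach; the build_sitemap_index helper and the example.com URLs are placeholders invented for this illustration.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls: Iterable[str]) -> bytes:
    """Return a sitemap index document listing the given sitemap URLs."""
    # Writing xmlns as a plain attribute yields the default-namespace
    # declaration on the root element when the tree is serialized.
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        # ElementTree escapes characters such as "&" in the URL for us.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


# Placeholder sitemap URLs.
index_bytes = build_sitemap_index([
    "https://www.example.com/sitemap1.xml",
    "https://www.example.com/sitemap2.xml",
])
print(index_bytes.decode("utf-8"))
```

To save the result, you could write index_bytes to "sitemap-index.xml" in binary mode. The output is not pretty-printed, which does not matter to search engines.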
How to compress the sitemap index
Compressing your Sitemap index file can help you save bandwidth - which is the amount of data transferred from your server to users' devices when they access your website.
You can compress your Sitemap index file using gzip, which is a common compression method used for reducing the file size.
To compress the sitemap index file, you can follow these steps:
Before compressing the file, ensure that you have already created the sitemap index file in XML format.
To compress the sitemap index file, you need a compression tool that supports the chosen compression method. Many operating systems come with built-in compression tools, such as gzip for Linux or macOS, or you can use third-party compression software.
Use the compression tool to compress the sitemap index file. The specific command or process may vary depending on the tool and operating system you are using. Here is an example using gzip:
gzip sitemap-index.xml
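This command will compress the sitemap-index.xml file using gzip and create a compressed file named sitemap-index.xml.gz. By default, gzip replaces the original file with the compressed one; recent versions of gzip accept the -k option (gzip -k sitemap-index.xml) to keep the original as well.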
After the compression process is complete, you can verify the compressed file's existence and check its reduced file size.
Now, you have successfully compressed the sitemap index file using the chosen compression method. The compressed file, such as sitemap-index.xml.gz, can be used and submitted to search engines.
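If gzip is not available on your system, the same result can be sketched with Python's built-in gzip module. The gzip_keep_original helper is a name invented for this illustration, and the demonstration writes a placeholder index file into a temporary directory rather than touching a real site.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gzip_keep_original(source: Path) -> Path:
    """Write `<source>.gz` next to `source` and leave `source` in place."""
    target = source.with_name(source.name + ".gz")
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


# Demonstration in a throwaway directory with a placeholder index file.
with tempfile.TemporaryDirectory() as workdir:
    index_path = Path(workdir) / "sitemap-index.xml"
    entries = "".join(
        f"<sitemap><loc>https://www.example.com/sitemap{i}.xml</loc></sitemap>\n"
        for i in range(1, 501)
    )
    index_path.write_text(
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + entries
        + "</sitemapindex>\n",
        encoding="utf-8",
    )
    compressed = gzip_keep_original(index_path)
    original_size = index_path.stat().st_size
    compressed_size = compressed.stat().st_size
    # Decompressing should give back exactly the original bytes.
    round_trip = gzip.decompress(compressed.read_bytes()) == index_path.read_bytes()
    print(compressed.name, original_size, compressed_size)
```

Unlike the default behaviour of the gzip command, this sketch leaves the uncompressed file in place.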
How to submit a sitemap index
To submit your Sitemap index file, sign in to your Google Search Console account, open the Sitemaps report, and enter the URL of the Sitemap index file. You do not need to submit the individual sitemaps that are included in the index separately.
Once Google has processed your Sitemap index file, Search Console reports any errors found in the index file itself or in any of the individual sitemaps listed in it.
If you make changes to one of the individual sitemaps included in your Sitemap index file, you can update the "lastmod" date for that particular sitemap in your index.
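Updating the "lastmod" date can also be scripted. The following Python sketch sets or adds <lastmod> for one entry in an index; the set_lastmod helper, the sample index, and the date are all invented for this illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize the sitemap namespace as the default one (no "ns0:" prefixes).
ET.register_namespace("", NS)


def set_lastmod(index_xml: bytes, sitemap_url: str, lastmod: str) -> bytes:
    """Set <lastmod> on the entry whose <loc> equals `sitemap_url`."""
    root = ET.fromstring(index_xml)
    for entry in root.findall(f"{{{NS}}}sitemap"):
        loc = entry.find(f"{{{NS}}}loc")
        if loc is not None and (loc.text or "").strip() == sitemap_url:
            node = entry.find(f"{{{NS}}}lastmod")
            if node is None:
                node = ET.SubElement(entry, f"{{{NS}}}lastmod")
            node.text = lastmod
            break
    else:
        raise KeyError(f"{sitemap_url} is not listed in the index")
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


INDEX = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.yourwebsite.com/sitemap1.xml</loc></sitemap>
  <sitemap><loc>https://www.yourwebsite.com/sitemap2.xml</loc></sitemap>
</sitemapindex>
"""

# Hypothetical date, in the date-only W3C Datetime form.
updated = set_lastmod(INDEX, "https://www.yourwebsite.com/sitemap2.xml", "2024-01-15")
print(updated.decode("utf-8"))
```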
Sitemap index best practices
Here are some best practices or key points to remember while creating and submitting sitemap index files:
Format
The XML format of a sitemap index file follows the same format as a regular sitemap file and is defined by the Sitemap Protocol.
This means that the same requirements that apply to sitemaps also apply to sitemap index files.
Hosting
The sitemaps referenced in the sitemap index file must be hosted on the same website (same domain) as the sitemap index file unless you have set up a cross-site submission.
This ensures that search engines can easily access and crawl the sitemaps.
FYI: Cross-site submission refers to the practice of hosting sitemaps referenced in a sitemap index file on different websites (domains) than the one hosting the sitemap index file itself.
Directory location
The sitemaps referenced in the sitemap index file must be located in the same directory as the sitemap index file or in a lower directory in the site's hierarchy. For example, if the sitemap index file is located at:
https://yourwebsite.com/public/sitemap_index.xml
The referenced sitemaps can only be in the same directory or in a subdirectory, like:
https://yourwebsite.com/public/shared/....
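The hosting and directory rules can be checked before you publish an index. Here is a minimal Python sketch that compares URLs as plain strings; the is_within_index_scope helper and the sitemap file names are invented for this illustration, and it deliberately ignores cross-site submission and does not normalize paths containing "..".

```python
import posixpath
from urllib.parse import urlsplit


def is_within_index_scope(index_url: str, sitemap_url: str) -> bool:
    """True if the sitemap is on the same host and at or below the index's directory."""
    index, sitemap = urlsplit(index_url), urlsplit(sitemap_url)
    if (index.scheme, index.netloc) != (sitemap.scheme, sitemap.netloc):
        return False
    index_dir = posixpath.dirname(index.path)
    if not index_dir.endswith("/"):
        index_dir += "/"
    return sitemap.path.startswith(index_dir)


INDEX_URL = "https://yourwebsite.com/public/sitemap_index.xml"
candidates = [
    "https://yourwebsite.com/public/sitemap1.xml",         # same directory
    "https://yourwebsite.com/public/shared/sitemap2.xml",  # subdirectory
    "https://yourwebsite.com/private/sitemap3.xml",        # sibling directory
    "https://blog.yourwebsite.com/public/sitemap4.xml",    # different host
]
for url in candidates:
    print(url, is_within_index_scope(INDEX_URL, url))
```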
Submission limit
You can submit up to 500 sitemap index files for each site in your Search Console account. This allows you to provide comprehensive information about the structure and content of your website to search engines.
Following these best practices for creating and submitting sitemap index files can help ensure that search engines can effectively crawl and index your website, which may improve its visibility in search results.
Takeaway
A sitemap index file is an XML document that contains a list of links to individual sitemap files for a website.
It is used to overcome size limitations and manage larger websites. Search engines like Google can read the index and crawl each individual sitemap file listed within it, which helps them index the website's content comprehensively.