XML Sitemaps

By Shahid Maqbool
On Apr 26, 2023

What is an XML Sitemap?

An XML (Extensible Markup Language) sitemap is a file that lists the pages on a website that you want search engines to crawl, along with optional information about each page, such as when it was last updated and how important it is relative to other pages on the site.

It acts as a table of contents for your website: each entry gives the URL of an important page, plus other optional details.

An XML sitemap does not raise rankings by itself, but it can improve a website's visibility by giving search engine crawlers a well-defined list of the URLs you want them to discover.

With a sitemap in place, crawlers can quickly reach all the pages of your website that carry helpful content.

Why is XML Sitemap Required?

XML sitemaps provide search engine crawlers with a roadmap of all the pages on your site.

This helps them to discover and crawl all the pages more efficiently, which can lead to better indexing in search results.

An XML sitemap can help improve your website's visibility in search results, because only pages that have been crawled and indexed can appear for relevant queries. This can lead to more traffic to your pages.

XML Sitemaps help Google and other search engines find your website's helpful pages.

Sometimes, webpages are not internally linked adequately, so they are hard to find, but XML sitemaps allow crawlers to access those pages as well.

What Kind of Websites Need XML Sitemaps?

Adding a sitemap alone does not affect search rankings. Sitemaps are particularly important for websites that have a large number of pages.

Some websites update their pages regularly while others keep adding new ones – in both cases, this leads to an extensive collection of pages.

As a result, some web pages get lost in the crowd. An XML sitemap helps crawlers find and access them.

Websites with complex navigation structures can make it difficult for search engine crawlers to discover all the pages on the site.

An XML sitemap can provide a clear list of all the pages, regardless of their location in the navigation structure.

Additionally, if a website has pages that are not properly linked internally, such as landing pages for specific marketing campaigns, an XML sitemap can help ensure that those pages get crawled and indexed by search engines.

How Do XML Sitemaps Work?

Search engine crawlers learn where a sitemap lives when you submit it through their webmaster tools or reference it in your robots.txt file. They then read the sitemap to discover the pages on the site, along with additional information about each page.

In other words, just as a traveller follows a map, Google and other search engines use an XML sitemap as a map for accessing and crawling the pages on your website.

Example of XML Sitemap

Here is an example of an XML sitemap of the website shahidmaqbool.com.

When you open this sitemap, you will see a date next to each URL. It indicates when the content was last updated, which tells Google that new content may be available to crawl and index.

Each of the four rows is a separate sitemap. Opening one takes you to the list of URLs included in that particular sitemap.

A single XML sitemap can list at most 50,000 URLs and must not exceed 50MB (uncompressed).

If your website exceeds either limit, you need to split the sitemap into multiple sitemaps.

To do this, you can create a sitemap index file, which is a parent file containing a list of the smaller sitemap files.

In Google Search Console, you can submit up to 500 sitemap index files for each website, as well as additional sitemap files as needed to include all of your website's URLs.

Once you have created and configured these sitemap files, you can submit them to Google so that search engines can discover and crawl the pages on your website.

This process can help improve your website's visibility in search results and make it easier for users to find the content on your site.
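As a rough illustration of the splitting described above, the following Python sketch divides a URL list into files of at most 50,000 entries and builds a sitemap index that points at them. The domain example.com, the page URLs, and the sitemap-N.xml naming scheme are assumptions made for the example, not something the Sitemap protocol requires.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) listing child sitemaps."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    return ET.tostring(root, encoding="unicode")


# Hypothetical site with 120,000 pages.
urls = [f"https://example.com/page{i}" for i in range(120_000)]
chunks = chunk_urls(urls)

# Hypothetical naming scheme: sitemap-1.xml, sitemap-2.xml, ...
child_sitemaps = [
    f"https://example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(child_sitemaps)
print(len(chunks), "sitemap files are needed")
```

Each chunk would then be written out as its own sitemap file, and only the index file needs to be submitted.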

How Do You Read XML Sitemaps?

Website owners benefit from knowing how to read and interpret the various components of a sitemap.

There are several components of an XML sitemap. While not all XML sitemaps display the same information, some of the information is common among all.

Let us take a sample XML sitemap of the website seodebate.com and read its different components.

Some of the components/tags that you will commonly see in sitemaps are:

  • <urlset>

  • <loc>

  • <lastmod>

  • <changefreq>

  • <priority>

  • <url>

  • hreflang (an attribute of <xhtml:link>)

<urlset>

This tag is the root element that encloses every <url> entry in the sitemap. Its xmlns attribute states which version of the XML sitemap standard is used, for example http://www.sitemaps.org/schemas/sitemap/0.9 for the 0.9 standard, which is supported by major search engines like Google, Yahoo, and Bing.

<url>

The <url> tag is a required element in an XML sitemap and is used to list the URL and additional metadata for a page on a website.

The <url> tag should enclose all the child elements that provide information about a specific page, including the <loc> tag that specifies the URL of the page.

Here's an example of how the <url> tag might be used in an XML sitemap:

<url>
 <loc>https://seodebate.com/page1</loc>
 <lastmod>2023-02-01</lastmod>
 <changefreq>weekly</changefreq>
 <priority>0.8</priority>
</url>

In this example, the <url> tag lists the URL and additional metadata for a page on the "seodebate.com" website called "page1".

The child elements of the <url> tag provide information about when the page was last modified (<lastmod>), how frequently the page changes (<changefreq>), and the priority of the page relative to other pages on the site (<priority>).

It's important to note that each <url> tag should only list the URL and metadata for a single page on the website.

If a website has multiple pages, each page should have its own <url> tag within the larger <urlset> tag that encloses all the URLs on the site.
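To make the structure concrete, here is a minimal Python sketch that reads a sitemap and pulls out the child elements of each <url> entry. The sample document reuses the page1 entry from above; the second entry (page2) is a hypothetical addition to show what happens when the optional tags are missing.

```python
from xml.etree import ElementTree as ET

# Prefix-to-namespace mapping used when searching the parsed tree.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://seodebate.com/page1</loc>
    <lastmod>2023-02-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://seodebate.com/page2</loc>
  </url>
</urlset>"""


def read_sitemap(xml_text):
    """Return one dict per <url> entry; absent optional tags map to None."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append({
            "loc": url.findtext("sm:loc", namespaces=NS),
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            "changefreq": url.findtext("sm:changefreq", namespaces=NS),
            "priority": url.findtext("sm:priority", namespaces=NS),
        })
    return entries


entries = read_sitemap(SAMPLE)
for entry in entries:
    print(entry["loc"], entry["lastmod"])
```

The namespace mapping is needed because every element in a sitemap belongs to the namespace declared on <urlset>.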

<loc>

The <loc> tag is a required child element of the <url> tag in an XML sitemap and is used to specify the URL of a page on a website.

The <loc> tag should contain the full URL of the page, including the protocol (http or https) and the domain name.

Here's an example of how the <loc> tag might be used in an XML sitemap:

<url>
 <loc>https://seodebate.com/page1</loc>
 <lastmod>2023-02-01</lastmod>
 <changefreq>weekly</changefreq>
 <priority>0.8</priority>
</url>

In this example, the <loc> tag specifies the URL of a page on the "seodebate.com" website called "page1".

When search engines read this XML sitemap, they will use the URL specified in the <loc> tag to crawl and index the page.

It is important to note that each <loc> tag should contain exactly one URL.

If the same content is reachable at several URLs, list only the canonical one. Every distinct URL that you do include needs its own <url> tag with its own <loc> tag.
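A small Python sketch of the <loc> requirements might look like the following. The helper name is_valid_loc is hypothetical; the length check reflects the Sitemap protocol's rule that a <loc> value must be shorter than 2,048 characters.

```python
from urllib.parse import urlsplit

# The Sitemap protocol requires <loc> values shorter than 2,048 characters.
MAX_LOC_LENGTH = 2048


def is_valid_loc(value):
    """Check that a value is a full http(s) URL with a domain and a legal length."""
    parts = urlsplit(value)
    has_protocol = parts.scheme in ("http", "https")
    has_domain = bool(parts.netloc)
    return has_protocol and has_domain and len(value) < MAX_LOC_LENGTH


print(is_valid_loc("https://seodebate.com/page1"))  # full URL
print(is_valid_loc("/page1"))                       # relative path, no protocol or domain
```

This only checks the shape of the URL; it does not confirm that the page exists or is the canonical version.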

<lastmod>

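The <lastmod> tag is an optional child element of the <url> tag in an XML sitemap and is used to specify the date when a page on a website was last modified. The value should be in W3C Datetime format (a profile of ISO 8601): either a date alone, YYYY-MM-DD, or a date with time and time zone offset, such as 2023-02-01T12:30:00+00:00.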

Here's an example of how the <lastmod> tag might be used in an XML sitemap:

<url>
 <loc>https://seodebate.com/page1</loc>
 <lastmod>2023-02-01</lastmod>
 <changefreq>weekly</changefreq>
 <priority>0.8</priority>
</url>

In this example, the <lastmod> tag specifies that the "page1" URL was last modified on February 1st, 2023.

Search engines can use this information to decide when to recrawl the page, so that search results show its most up-to-date version.

It's important to note that the <lastmod> value should reflect the last significant change to the page's content, not the date on which the sitemap was generated.

If you cannot supply an accurate date, the <lastmod> tag should be omitted.

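Google states that it uses the <lastmod> value only if it is consistently and verifiably accurate, for example by comparing it to the last modification of the page.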

It means that Google may use the <lastmod> tag to determine the last time a page was modified, but only if the date provided in the sitemap is consistent with the actual last modification date of the page.

If the date provided in the sitemap is not accurate, Google will rely on other signals to determine the last modification time of the page.
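The Sitemap protocol accepts either a plain date or a full timestamp with a time zone offset for <lastmod>. The sketch below shows one way to produce both forms in Python; the helper names are hypothetical, and the dates are the example values used in this article.

```python
from datetime import date, datetime, timezone


def lastmod_date(d):
    """Format a date as YYYY-MM-DD."""
    return d.isoformat()


def lastmod_timestamp(dt):
    """Format a timezone-aware datetime with seconds and a UTC offset."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime so the offset is explicit")
    return dt.isoformat(timespec="seconds")


print(lastmod_date(date(2023, 2, 1)))
# 2023-02-01
print(lastmod_timestamp(datetime(2023, 2, 1, 12, 30, tzinfo=timezone.utc)))
# 2023-02-01T12:30:00+00:00
```

Rejecting naive datetimes is a design choice for this sketch: a timestamp without an offset is ambiguous about which time zone it refers to.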

<priority>

This optional tag tells Google or other search engines the priority of a URL relative to the other URLs on your site.

Simply put, it is a hint about which pages search engines should prioritise while allocating the crawl budget.

Its value ranges from 0.0 to 1.0, where 0.0 is the lowest priority and 1.0 is the highest. The protocol's default value is 0.5.

However, according to Gary Illyes (Chief of Sunshine and Happiness at Google), setting a priority level won’t give you an advantage.

<changefreq>

The <changefreq> tag is an optional child element of the <url> tag in an XML sitemap that indicates how often the content at a URL is likely to change. The values accepted by the protocol are always, hourly, daily, weekly, monthly, yearly, and never. However, Google says it ignores this value.

hreflang

hreflang is not a tag of its own. It is an attribute of the <xhtml:link> element, an optional child element of the <url> tag in an XML sitemap that is used to indicate the language and regional targeting of a page on a website.

It is used for international SEO purposes to help search engines serve the appropriate version of a page to users based on their location and language preferences.

Here is an example of how hreflang annotations might be used in an XML sitemap:

<url>
 <loc>https://seodebate.com/page1</loc>
 <lastmod>2023-02-01</lastmod>
 <changefreq>weekly</changefreq>
 <priority>0.8</priority>
 <xhtml:link rel="alternate" hreflang="en-us" href="https://seodebate.com/en/page1"/>
 <xhtml:link rel="alternate" hreflang="es-mx" href="https://seodebate.com/es/page1"/>
</url>

In this example, the two <xhtml:link> elements specify alternate versions of the "page1" URL: one for English-speaking users in the United States (hreflang="en-us", https://seodebate.com/en/page1) and one for Spanish-speaking users in Mexico (hreflang="es-mx", https://seodebate.com/es/page1).

This allows search engines to serve the appropriate version of the page to users based on their location and language preferences.

It's important to note that hreflang annotations should only be used for pages that have alternate versions in different languages or regions. If a page has only one version, they should be omitted.

Additionally, each <xhtml:link> element must carry the rel="alternate" attribute to indicate that the linked URL is an alternate version of the same page, and the <urlset> element must declare the xhtml namespace (xmlns:xhtml="http://www.w3.org/1999/xhtml"). Google also expects every language version to have its own <url> entry that lists all alternates, including itself.
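Writing these annotations by hand gets repetitive, so here is a hedged Python sketch that generates them. It assumes the layout Google documents for sitemap hreflang annotations: one <url> entry per language version, each listing every alternate including itself. The function name build_hreflang_sitemap is hypothetical, and the URLs are the example ones from above.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Make the serializer emit xmlns="..." and xmlns:xhtml="..." on the root.
# Note: register_namespace changes a process-wide setting.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_hreflang_sitemap(variants):
    """Build a sitemap from a dict mapping hreflang codes to URLs.

    Each URL gets its own <url> entry that lists every variant,
    including itself, as an xhtml:link alternate.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in variants.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for lang, href in variants.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                rel="alternate",
                hreflang=lang,
                href=href,
            )
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_hreflang_sitemap({
    "en-us": "https://seodebate.com/en/page1",
    "es-mx": "https://seodebate.com/es/page1",
})
print(sitemap_xml)
```

With two language versions, the output contains two <url> entries and four <xhtml:link> elements.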

Google Priorities on Sitemaps

Here are some key points about how Google handles XML sitemaps.

  • A single sitemap is limited to 50MB (uncompressed) or 50,000 URLs. If yours is larger, split it into multiple sitemaps. You can optionally list them in a sitemap index file and submit that single file.

  • The sitemap must be UTF-8 encoded. It can be hosted anywhere on your site, but unless you submit it through Search Console it only affects URLs in its own directory and below, so the site root is the recommended location.

  • All tag values in XML sitemaps must be entity escaped. For example, an ampersand in a URL is written as &amp;.

  • <priority> and <changefreq> values are ignored by Google, and the <lastmod> value is used only if it is consistently and verifiably accurate, such as by comparing it to the last modification of the page.
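Entity escaping is easy to get wrong by hand. A minimal Python sketch using the standard library's xml.sax.saxutils.escape is shown below; the helper name escape_loc and the example URL are assumptions for illustration.

```python
from xml.sax.saxutils import escape


def escape_loc(url):
    """Escape the five XML special characters so a URL is safe inside <loc>."""
    # escape() handles &, < and > by default; quotes are added explicitly.
    return escape(url, {'"': "&quot;", "'": "&apos;"})


print(escape_loc("https://example.com/search?q=shoes&page=2"))
# https://example.com/search?q=shoes&amp;page=2
```

Libraries that build XML for you, such as xml.etree.ElementTree, apply this escaping automatically when the document is serialized.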

How to Create an XML Sitemap?

You can use various online generators or plugins to create an XML sitemap for your website.

Here's a detailed guide on how to create an XML sitemap:

Determine the pages on your website

The first step in creating an XML sitemap is to determine which pages on your website you want to include.

This typically includes pages like your homepage, blog posts, product pages, and other important content.

Choose a tool or method for creating the sitemap

There are several ways to create an XML sitemap, including using online generators, WordPress plugins, or manually creating the file using a text editor. Here are some options:

Online sitemap generators

There are many free online tools available that can generate an XML sitemap for your website. Let's take the example of “XML Sitemaps generator”.

  • Go to the “XML Sitemaps” generator in your web browser.

  • Enter the URL of your website in the provided field. This is the web address of your website for which you want to generate a sitemap.

  • Choose the optional settings for your sitemap, such as the maximum number of pages to include, change frequency, and priority. You can adjust these settings based on the specific needs of your website.

  • Click on the "Start" button to start the sitemap generation process. The tool will then crawl your website and generate a sitemap XML file.

  • Once the sitemap is generated, you can download it by clicking on the "Download your XML Sitemap" button. This will download the sitemap file to your computer.

  • Upload the downloaded sitemap XML file to the root directory of your website using FTP or any other file transfer method.

  • Finally, submit the sitemap to search engines, such as Google, Bing, etc., to notify them of the existence and structure of your sitemap, which can help improve your website's visibility in search results.

Using a crawler

You can also generate a sitemap using a crawler.

Let's take the example of Screaming Frog to generate an XML sitemap:

  1. Download and install Screaming Frog SEO Spider on your computer.

  2. Open the tool and enter the URL of your website in the search bar.

  3. Wait for the tool to crawl your website and gather information about your pages.

  4. Once the crawl is complete, click on the "Sitemaps" tab.

  5. Select "XML Sitemap" from the dropdown menu.

  6. Click on the "Create Sitemap" button.

  7. The tool will generate an XML sitemap for your website. You can view it by clicking on the "View Sitemap" button.

  8. Save the sitemap file to your computer by clicking on the "Export" button.

  9. Upload the sitemap file, typically named "sitemap.xml", to your website's root directory.

Note: When generating a sitemap, you can choose any name for the file, but an XML sitemap must use the .xml extension, for example, "sitemap.xml" or "mywebsite.xml".

WordPress plugins

If your website is built on WordPress, there are several plugins available that can generate an XML sitemap automatically. Some popular options include Yoast SEO and All in One SEO Pack.

Let's take the example of Yoast SEO to generate an XML sitemap:

  1. Install and activate Yoast SEO Plugin on your WordPress website.

  2. Go to the Yoast SEO settings by clicking on "SEO" in the WordPress dashboard menu, then select "General".

  3. Click on the "Features" tab and scroll down to the "XML sitemaps" option. Make sure the toggle switch is set to "On".

  4. Click on the "XML Sitemaps" tab in the Yoast SEO settings.

  5. Configure the settings for your XML sitemap, including which post types and taxonomies should be included, whether to include images or videos and more.

  6. View the XML sitemap by clicking on the "See the XML sitemap" button.

  7. Copy the sitemap URL and submit it to search engines like Google and Bing through their webmaster tools.

  8. Test the sitemap using a tool like Google Search Console to ensure that there are no errors or issues.

Manual creation

If you prefer to create the XML sitemap manually, you can use a text editor like Notepad or Sublime Text to create the file.

You will need to follow specific formatting rules and include all the necessary elements.

Creating an XML sitemap manually requires some knowledge of XML and the correct formatting for the sitemap.

Here are the steps to create an XML sitemap manually:

  1. Open a plain text editor like Notepad.

  2. Begin the sitemap by typing <?xml version="1.0" encoding="UTF-8"?> on the first line. This specifies that the file is an XML file and sets the encoding to UTF-8.

  3. Add the <urlset> element, which will enclose all the <url> elements for each page on your site. The <urlset> element should include the xmlns attribute to specify the XML namespace. For example: <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">.

  4. For each page on your website that you want to include in the sitemap, add a <url> element. Each <url> element must include a <loc> element with the URL of the page. You can also add optional child elements like <lastmod>, <changefreq>, and <priority>.

  5. Close the <urlset> element by adding </urlset> on the last line of the file.

  6. Save the file with a .xml extension, such as sitemap.xml. You can change the name of the file - like file.xml or name.xml - but the extension must be .xml.

  7. Upload the sitemap to the root directory of your website using FTP or through your website's control panel.

Creating an XML sitemap manually can be time-consuming if you have many pages on your website.

However, it gives you complete control over the content and structure of the sitemap.
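If typing the file by hand becomes tedious, the same steps can be scripted. The following Python sketch mirrors the manual steps: it writes the XML declaration, opens <urlset> with the namespace, adds one <url> per page, and closes the document. The function name build_sitemap and the example.com pages are hypothetical.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap from (loc, lastmod) pairs; lastmod may be None.

    Returns UTF-8 encoded bytes that start with the XML declaration.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # escaped automatically on output
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical pages for illustration.
pages = [
    ("https://example.com/", "2023-04-26"),
    ("https://example.com/blog/?a=1&b=2", None),
]
sitemap_bytes = build_sitemap(pages)
print(sitemap_bytes.decode("utf-8"))

# To save the result for upload:
# with open("sitemap.xml", "wb") as f:
#     f.write(sitemap_bytes)
```

The xml_declaration argument of ET.tostring requires Python 3.8 or later.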

Submitting Sitemap

After testing the sitemap, you should submit it to search engines like Google and Bing so that they are aware of the pages you want crawled.

This can be done through the search engine's webmaster tools or by adding the sitemap URL to your robots.txt file.

Once it is submitted, search engines can crawl the pages listed in the XML sitemap, although listing a page does not guarantee that it will be indexed.

Submitting Through Google Search Console

To submit a sitemap in Google Search Console, follow these steps:

  1. Go to the Google Search Console website and log in with your Google account credentials.

  2. If you have multiple websites associated with your account, select the one for which you want to submit the sitemap.

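  3. In the left-hand menu, click on "Sitemaps" under the "Indexing" section (labelled "Index" in older versions of Search Console).

  4. Enter the URL of your sitemap in the "Add a new sitemap" field and click "Submit". Older versions of Search Console used an "Add/Test Sitemap" button in the top right corner of the page instead.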

  5. Google will now verify the sitemap and check for any errors or issues. This may take a few minutes or longer, depending on the size of your sitemap.

  6. Once the verification is complete, you can view the status of your sitemap in the "Submitted" section of the Sitemaps report. If there are any errors or warnings, Google will provide details on how to fix them.

Submitting Through Bing Webmaster Tools

  1. Sign in to your Bing Webmaster Tools account.

  2. Navigate to your website's dashboard.

  3. Click on the "Sitemaps" option on the left-hand menu.

  4. Click on the "Add Sitemap" button.

  5. Enter the URL of your sitemap in the "Submit a sitemap" field.

  6. Click on the "Submit" button.

  7. Bing will now begin processing your sitemap.

  8. Check the "Status" column to see if your sitemap has been successfully submitted and crawled.

  9. If there are any errors or warnings, click on the sitemap URL to see more details and address any issues.

  10. Once your sitemap has been successfully submitted and crawled, you can monitor its status and view any indexing or crawling issues on the "Sitemaps" page.

Adding Your XML Sitemap to the robots.txt File

Adding your XML sitemap to the robots.txt file is another common way to tell Google and other search engines where your sitemap is located. Here are the steps to add your XML sitemap to the robots.txt file:

  1. To access the robots.txt file, you need to log in to your website's server using FTP or a file manager in your web hosting control panel.

  2. Locate the robots.txt file which is usually located in the root directory of your website. If the file doesn't exist, you can create it using a plain text editor.

  3. In the robots.txt file, add a line that specifies the location of your XML sitemap. For example, if your sitemap is located at https://example.com/sitemap.xml, add this line to the robots.txt file: Sitemap: https://example.com/sitemap.xml

  4. Save the robots.txt file and upload it to the root directory of your website using FTP or a file manager.

  5. Once you have added the sitemap location to the robots.txt file, verify that it is working correctly by accessing the sitemap URL in your web browser.

Adding your XML sitemap to the robots.txt file can help search engines find and crawl all the pages on your website more efficiently.
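To check that the Sitemap line is being picked up, you can parse the file the way a crawler would. This Python sketch uses the standard library's urllib.robotparser, whose site_maps() method is available from Python 3.8. The robots.txt content here is a hypothetical example, including the Disallow rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """User-agent: *
Disallow: /wp-admin/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if none are declared.
sitemaps = parser.site_maps()
print(sitemaps)
# ['https://example.com/sitemap.xml']
```

To test a live site instead, call parser.set_url() with the address of its robots.txt file followed by parser.read(), which fetches the file over the network.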

How to Cross-submit Sitemaps For Multiple Sites?

According to Google, you have two options to submit sitemaps for multiple sites:

Single Sitemap for Multiple Websites: You can create a single sitemap that includes URLs from multiple websites, even if they have different domains.

You can then submit this sitemap or sitemap index file to Google Search Console for indexing.

Make sure you have verified ownership of all the sites included in the sitemap in Google Search Console.

Individual Sitemaps in a Single Location: Alternatively, you can create individual sitemaps for each site and store them in a single location, such as a subdomain or a subdirectory.

You can then reference each individual sitemap in the respective site's robots.txt file.

FAQs

How many URLs are in a sitemap?

XML sitemaps can have up to 50,000 URLs per file, with a file size limit of 50MB uncompressed. Many site owners nevertheless prefer smaller files (for example, up to 5,000 URLs) that each focus on a specific section of the website, which makes crawling and indexing problems easier to isolate.

Can you have two sitemaps?

Yes, you can have two or more sitemaps for a single website. In fact, it's a common practice to split sitemaps into multiple files if the website has a large number of URLs, or if the URLs can be categorized into separate groups such as product pages, blog posts, or images. To tie them together, you can use an XML sitemap index file.

What is an XML Sitemap Index file?

An XML Sitemap Index file is a special file that acts as a directory for multiple XML sitemap files on a website.

It contains links to individual sitemap files, which help search engines find and crawl all the pages on a website.

Each XML sitemap has a limit of 50,000 URLs, so if this limit is exceeded, separate XML sitemap files need to be created.

These sitemap files can be grouped together in an XML Sitemap Index file, which lists the location of each one.

In Google Search Console, you can submit up to 500 sitemap index files per site.

How often should you update your sitemap?

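You only need to submit your sitemap to the search engines once. If the sitemap is generated dynamically, for example by a plugin such as Yoast SEO, it updates automatically when you add, edit, or delete content, and search engines pick up the changes the next time they fetch it. A manually created sitemap file must be regenerated and re-uploaded whenever your pages change, but it does not need to be resubmitted.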

How often does Google read sitemaps?

Google does not provide a specific timeframe for how often it reads sitemaps. The frequency depends on various factors, so a newly submitted sitemap may be processed within a few days or may take longer.

Which pages should you add to your XML sitemap?

All the important pages (homepage, about us, product pages, contact us, blog posts) should be included in your sitemaps.

On the other hand, avoid adding thin, weak, or low-quality pages (thank-you pages, tag pages) to your sitemaps.

How to locate the sitemap of your website?

If you have created the sitemap of your website manually or by using a tool and uploaded it to the root directory as sitemap.xml, you can locate it at the following URL:

yourdomainname.com/sitemap.xml

Similarly, if you have created the sitemap of your WordPress website by using Yoast, you can locate it by using the following URL:

yourdomainname.com/sitemap_index.xml

Note: sitemap_index.xml shows a sitemap index file containing multiple sitemaps.
