
Sitemap Analysis Module

Module ID: sitemap | Type: Conditional | Profiles: full, seo, content

The Sitemap Analysis module fetches and parses XML sitemaps, checking structure, validity, URL coverage, lastmod dates, duplicates, and robots.txt references.


What It Checks

| Check | What It Looks For |
| --- | --- |
| Sitemap existence | sitemap.xml accessible at the root |
| Robots.txt reference | Sitemap: directive in robots.txt |
| XML validity | Well-formed XML with correct namespace |
| URL count | Number of URLs in the sitemap |
| Lastmod dates | Whether lastmod is present and recent |
| Stale dates | URLs with lastmod older than 1 year |
| Duplicate URLs | Same URL appearing multiple times |
| Sitemap index | Support for sitemap index files |
| URL format | Consistent protocol (HTTP vs HTTPS) |
| Gzip compression | Whether sitemap supports compression |
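
Several of the checks above (XML validity, URL count, lastmod presence, stale dates, duplicates, sitemap index detection, protocol consistency) can be derived from a single parse of the sitemap. The following is a minimal sketch and not the module's actual implementation: the function name `analyse_sitemap`, the keys of the returned dictionary, and the reading of "1 year" as 365 days are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def analyse_sitemap(xml_text, now=None):
    """Hypothetical helper: collect basic facts about a sitemap document."""
    now = now or datetime.now(timezone.utc)
    root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
    ns = {"sm": SITEMAP_NS}
    cutoff = now - timedelta(days=365)  # assumption: "1 year" means 365 days

    locs, missing_lastmod, stale = [], 0, 0
    for url in root.findall("sm:url", ns):
        locs.append((url.findtext("sm:loc", default="", namespaces=ns)).strip())
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=ns)
        try:
            # Only the date part is compared; time and offset are ignored.
            stamp = datetime.strptime(lastmod.strip()[:10], "%Y-%m-%d")
        except (AttributeError, ValueError):
            missing_lastmod += 1  # absent or unparseable lastmod
            continue
        if stamp.replace(tzinfo=timezone.utc) < cutoff:
            stale += 1

    counts = Counter(locs)
    return {
        "is_index": root.tag == f"{{{SITEMAP_NS}}}sitemapindex",
        "url_count": len(locs),
        "duplicates": sorted(u for u, n in counts.items() if n > 1),
        "missing_lastmod": missing_lastmod,
        "stale": stale,
        "schemes": {urlparse(u).scheme for u in locs if u},
    }
```

A result with more than one entry in `schemes` would correspond to the mixed-protocol condition, and a non-empty `duplicates` list to the duplicate-URL condition.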

Scoring Breakdown

| Criterion | Deduction | Condition |
| --- | --- | --- |
| No sitemap found | -25 | No sitemap.xml at root |
| No robots.txt reference | -10 | Sitemap not referenced in robots.txt |
| Invalid XML | -15 | XML parse errors |
| No lastmod dates | -10 | URLs without lastmod |
| Many stale dates | -5 | More than 50% of lastmod dates older than 1 year |
| Duplicate URLs | -5 | Same URL listed multiple times |
| Mixed protocols | -5 | Mix of HTTP and HTTPS URLs |
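
To make the arithmetic concrete, the sketch below applies the deductions from the table to a starting score. The starting value of 100, the floor at 0, the criterion key names, and the function name `sitemap_score` are assumptions for illustration; this page does not state the base score or whether deductions stack when no sitemap is found.

```python
# Deduction values taken from the scoring table; key names are hypothetical.
DEDUCTIONS = {
    "no_sitemap": 25,
    "no_robots_reference": 10,
    "invalid_xml": 15,
    "no_lastmod": 10,
    "many_stale_dates": 5,
    "duplicate_urls": 5,
    "mixed_protocols": 5,
}


def sitemap_score(failed, base=100):
    """Subtract each failed criterion once from an assumed base score."""
    failed = set(failed)  # a criterion is deducted at most once
    unknown = failed - DEDUCTIONS.keys()
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return max(0, base - sum(DEDUCTIONS[name] for name in failed))
```

Under these assumptions, a site whose sitemap is missing from robots.txt and lists duplicate URLs would lose 10 + 5 points, so `sitemap_score(["no_robots_reference", "duplicate_urls"])` evaluates to 85.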

Example Findings

P1 HIGH: No XML sitemap found
  No sitemap.xml was found at the site root. Search engines rely on
  sitemaps to discover and prioritise pages for crawling.
  Fix: Generate an XML sitemap and place it at /sitemap.xml.
       Most CMS platforms have sitemap plugins.
  Effort: Low

P2 MEDIUM: Sitemap not referenced in robots.txt
  The sitemap exists but robots.txt doesn't point to it. Add a
  Sitemap directive to help search engines find it faster.
  Fix: Add "Sitemap: https://example.com/sitemap.xml" to robots.txt.
  Effort: Low
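
The robots.txt finding depends on locating `Sitemap:` directives in the file. A minimal sketch of that lookup follows; the function name `sitemap_urls_from_robots` is hypothetical, and the parsing is deliberately simple (case-insensitive field name, `#` comments stripped) rather than a full robots.txt parser.

```python
def sitemap_urls_from_robots(robots_txt):
    """Hypothetical helper: return URLs from Sitemap: lines in robots.txt."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments; sitemap URLs are assumed to contain no '#'.
        line = line.split("#", 1)[0].strip()
        # Split at the first colon only, so the URL's own "https:" survives.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls
```

An empty result for a site that does have a sitemap would correspond to the P2 finding above.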

Notes

This module requires network access to fetch the sitemap and robots.txt. In offline mode, it checks only for sitemap references in the HTML (e.g. <link rel="sitemap">).
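
As an illustration of the offline check, the sketch below scans HTML for `<link rel="sitemap">` elements using Python's standard `html.parser`. The class and function names are hypothetical, and how the module itself performs this check is not described on this page.

```python
from html.parser import HTMLParser


class SitemapLinkFinder(HTMLParser):
    """Hypothetical helper: collect href values of <link rel="sitemap">."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rels = (attr.get("rel") or "").lower().split()
        if "sitemap" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_sitemap_links(html):
    finder = SitemapLinkFinder()
    finder.feed(html)
    return finder.hrefs
```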