Sitemap Analysis Module
Module ID: sitemap | Type: Conditional | Profiles: full, seo, content
The Sitemap Analysis module fetches and parses XML sitemaps, checking structure, validity, URL coverage, lastmod dates, duplicates, and robots.txt references.
What It Checks
| Check | What It Looks For |
|---|---|
| Sitemap existence | sitemap.xml accessible at the root |
| Robots.txt reference | Sitemap: directive in robots.txt |
| XML validity | Well-formed XML with correct namespace |
| URL count | Number of URLs in the sitemap |
| Lastmod dates | Whether lastmod is present and recent |
| Stale dates | URLs with lastmod older than 1 year |
| Duplicate URLs | Same URL appearing multiple times |
| Sitemap index | Support for sitemap index files |
| URL format | Consistent protocol (HTTP vs HTTPS) |
| Gzip compression | Whether the sitemap is served with gzip compression |
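Several of the checks above (URL count, lastmod presence, stale dates, duplicates, mixed protocols) can be illustrated with a short sketch. This is not the module's actual implementation: the function names `analyze_sitemap` and `parse_lastmod`, the returned dictionary keys, and the choice of 365 days for "1 year" are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(value):
    """Return an aware datetime, or None if the value cannot be parsed."""
    try:
        dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    # Date-only values parse as naive datetimes; treat them as UTC.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def analyze_sitemap(xml_text, now=None):
    """Hypothetical helper: summarise a <urlset> sitemap document."""
    now = now or datetime.now(timezone.utc)
    root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
    locs, missing_lastmod, stale = [], 0, 0
    for url_el in root.findall("sm:url", NS):
        loc = (url_el.findtext("sm:loc", default="", namespaces=NS) or "").strip()
        if not loc:
            continue
        locs.append(loc)
        raw = (url_el.findtext("sm:lastmod", default="", namespaces=NS) or "").strip()
        modified = parse_lastmod(raw) if raw else None
        if modified is None:
            # Absent or unparseable lastmod values are counted together here.
            missing_lastmod += 1
        elif now - modified > timedelta(days=365):  # assumed meaning of "1 year"
            stale += 1
    schemes = {urlparse(u).scheme.lower() for u in locs}
    return {
        "url_count": len(locs),
        "duplicates": sorted(u for u, n in Counter(locs).items() if n > 1),
        "mixed_protocols": {"http", "https"} <= schemes,
        "missing_lastmod": missing_lastmod,
        "stale": stale,
    }
```

Passing `now` explicitly keeps the stale-date check deterministic, which is convenient when testing against fixed sample sitemaps.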
Scoring Breakdown
| Criterion | Deduction | Condition |
|---|---|---|
| No sitemap found | -25 | No sitemap.xml at root |
| No robots.txt reference | -10 | Sitemap not referenced in robots.txt |
| Invalid XML | -15 | XML parse errors |
| No lastmod dates | -10 | URLs without lastmod |
| Many stale dates | -5 | More than 50% of lastmod dates older than 1 year |
| Duplicate URLs | -5 | Same URL listed multiple times |
| Mixed protocols | -5 | Mix of HTTP and HTTPS URLs |
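To show how the deductions in the table combine, here is a minimal sketch. The deduction values come from the table; the starting score of 100, the floor at 0, the criterion keys, and the function name `score_sitemap` are assumptions, since the source does not state how the final score is computed.

```python
# Deduction values taken from the scoring table; the keys are hypothetical.
DEDUCTIONS = {
    "no_sitemap": 25,
    "no_robots_reference": 10,
    "invalid_xml": 15,
    "no_lastmod": 10,
    "many_stale_dates": 5,
    "duplicate_urls": 5,
    "mixed_protocols": 5,
}


def score_sitemap(triggered, base=100):
    """Subtract each triggered deduction once from an assumed base score."""
    triggered = set(triggered)
    unknown = triggered - set(DEDUCTIONS)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    total = sum(DEDUCTIONS[name] for name in triggered)
    return max(0, base - total)  # the floor at 0 is an assumption
```

Under these assumptions, a site whose sitemap is not referenced in robots.txt and contains duplicate URLs would lose 15 points in total.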
Example Findings
P1 HIGH: No XML sitemap found
No sitemap.xml was found at the site root. Search engines rely on
sitemaps to discover and prioritise pages for crawling.
Fix: Generate an XML sitemap and place it at /sitemap.xml.
Most CMS platforms have sitemap plugins.
Effort: Low
P2 MEDIUM: Sitemap not referenced in robots.txt
The sitemap exists but robots.txt doesn't point to it. Add a
Sitemap directive to help search engines find it faster.
Fix: Add "Sitemap: https://example.com/sitemap.xml" to robots.txt.
Effort: Low
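The robots.txt finding depends on detecting a `Sitemap:` directive. A minimal sketch of that detection follows; the function name `find_sitemap_directives` is hypothetical, and treating the directive name as case-insensitive and stripping `#` comments are assumptions about how a checker might behave, not a description of this module's code.

```python
def find_sitemap_directives(robots_txt):
    """Return the URLs from all Sitemap: lines in a robots.txt body."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # Drop trailing comments (this would also cut a URL containing '#').
        line = line.split("#", 1)[0].strip()
        # Split on the first colon only, so the URL's own colons survive.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps
```

An empty result corresponds to the "Sitemap not referenced in robots.txt" finding above.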
Notes
This module requires network access to fetch the sitemap and robots.txt. In offline mode, it checks only for sitemap references in the HTML (e.g. <link rel="sitemap">).
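The offline check for sitemap references in HTML could look roughly like the sketch below, which uses only the standard library. The class name `SitemapLinkFinder` and the function `find_sitemap_links` are hypothetical, and the sketch assumes a reference counts only when the `<link>` element has `sitemap` among its `rel` tokens and a non-empty `href`.

```python
from html.parser import HTMLParser


class SitemapLinkFinder(HTMLParser):
    """Collect href values from <link rel="sitemap"> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "link":
            return
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "sitemap" in rel_tokens and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_sitemap_links(html_text):
    """Hypothetical helper: return sitemap hrefs found in an HTML document."""
    finder = SitemapLinkFinder()
    finder.feed(html_text)
    return finder.hrefs
```

The returned values are whatever appears in `href`, so relative paths such as `/sitemap.xml` would still need resolving against the page URL.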