Your Sitemap Is Not Getting You Indexed — Here's What Actually Does
Here's the question every SEO forum gets at least three times a week: "I submitted my sitemap to Google Search Console two weeks ago and nothing is indexed. What's wrong?"
The honest answer: probably nothing is wrong with your sitemap. A sitemap doesn't get you indexed. It tells Google where to look. Whether Google then chooses to actually visit, crawl, and index those pages depends on a completely separate set of signals — most of which have nothing to do with the sitemap file itself.
This matters because every week people waste time obsessing over sitemap formatting, priority values, and changefreq settings when the real indexing blockers are sitting somewhere else entirely.
What a Sitemap Actually Does (and Doesn't Do)
An XML sitemap is a hint file. You're telling Google: "These URLs exist on my site. Here's how important I think they are relative to each other, and here's how often the content changes."
That's it. Google can accept or reject the hint entirely. Submitting a sitemap doesn't guarantee:
- That your pages will be crawled
- That crawled pages will be indexed
- That indexed pages will rank
- That crawling will happen faster than without a sitemap
What a sitemap does genuinely help with:
- Discovery on new sites: A brand new site with zero inbound links gives Googlebot nothing to follow, so it cannot find the pages through normal link-following. A sitemap is effectively the only path.
- Large sites with weak internal linking: If your site has 50,000 pages and some sit three or four clicks deep from the homepage, Googlebot might not reach them on its own. A sitemap surfaces those URLs.
- Signaling content changes: Updating the lastmod date on changed pages can nudge Googlebot to revisit them sooner. (Though Google says it relies primarily on its own signals.)
- Orphaned pages: A page with no links pointing to it from your own site is effectively invisible to Googlebot without a sitemap.
The mental model that helps: Think of a sitemap like a restaurant menu slid under someone's door. The recipient now knows which dishes are available. Whether they actually show up and order depends on whether they trust the establishment.
Crawl Budget: The Thing Most People Ignore
Googlebot allocates a crawl budget to every site — a rough limit on how many pages it will crawl from your domain within a given period. For large sites, this matters enormously. For small sites (under a few thousand pages), it's rarely the bottleneck.
What wastes crawl budget:
- Redirect chains (page A → B → C → D)
- Infinite URL parameters (?sort=price&page=2&color=blue&session=abc123)
- Duplicate content without canonical tags
- Low-value paginated pages (page 47 of a category listing)
- Soft 404s — pages that return a 200 status but say "no results found"
- Pages in your sitemap that are blocked by robots.txt (contradictory signals)
If your sitemap includes 2,000 URLs but 600 of them are duplicate, low-value, or blocked — Googlebot wastes budget visiting them, which takes time away from your important pages.
A common mistake is adding every URL your CMS generates to the sitemap, including /tag/whatever, /category/xyz, paginated URLs like ?page=2, and filtered product pages. Only include URLs you genuinely want indexed and that have distinct, non-duplicate content.
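To make the curation step concrete, here is a minimal Python sketch of a filter you might run over a CMS export before writing the sitemap. The prefix list and the sample URLs are hypothetical, and the rule "any query string means a variation" is an assumption that will not fit every site.

```python
from urllib.parse import urlsplit

# Hypothetical path prefixes for CMS-generated archive pages.
EXCLUDED_PREFIXES = ("/tag/", "/category/")


def belongs_in_sitemap(url: str) -> bool:
    """Return True only for URLs with no query string and no archive prefix."""
    parts = urlsplit(url)
    if parts.query:  # ?page=2, ?sort=price, ?session=... are treated as variations
        return False
    return not parts.path.startswith(EXCLUDED_PREFIXES)


# Sample export; every URL here is made up for illustration.
candidates = [
    "https://yoursite.com/",
    "https://yoursite.com/blog/indexing-guide",
    "https://yoursite.com/tag/whatever",
    "https://yoursite.com/category/xyz",
    "https://yoursite.com/shop?page=2",
    "https://yoursite.com/shop?sort=price&color=blue",
]

curated = [url for url in candidates if belongs_in_sitemap(url)]
print(curated)
```

Only the homepage and the blog post survive the filter. A real site would likely need an allow-list for the few parameterized URLs that do carry distinct content.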
What Actually Gets Pages Indexed
Google decides whether to index a page based on a layered set of trust and quality signals. Here's how they stack up:
The pattern is consistent: Google indexes pages it trusts and finds valuable, not just pages you tell it exist. A sitemap is a discovery tool, not a trust-building tool.
New Sites: Why the Wait Is Normal
If your site launched in the last two months and you're waiting on indexing, a lot of what you're experiencing is normal and expected.
Google's crawl priority queue for new domains is conservative. A brand new domain has no track record. Googlebot will initially crawl only a handful of pages, assess the quality of what it finds, and then decide whether to invest more crawl budget. This assessment period can take weeks to months.
Things that help move this along:
- Get a real inbound link. A link from an already-indexed site — even a small one — tells Googlebot your site is real and connected to the web. This is the single most effective accelerant for new site indexing.
- Build internal links from the start. Every page should be reachable by following links from the homepage in three clicks or fewer.
- Have substantive content on day one. Launching a site with placeholder or thin content gets you deprioritized immediately. Googlebot's initial sample of your site will inform how aggressively it crawls going forward.
- Verify in Search Console and submit the sitemap there. This gives Google an explicit signal that you're the verified owner and you're actively managing the site.
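The three-click guideline above can be checked mechanically. The sketch below runs a breadth-first search over a hypothetical internal link graph (a dict mapping each page to the pages it links to) and reports pages deeper than three clicks, plus pages that cannot be reached from the homepage at all. How you obtain the link graph, whether from a crawler export or your CMS, is left as an assumption.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], start: str) -> dict[str, int]:
    """Breadth-first search: minimum number of clicks from start to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site structure, for illustration only.
site = {
    "/": ["/blog", "/products"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog/post-1/comments"],
    "/blog/post-1/comments": ["/blog/post-1/comments/page-2"],
    "/products": [],
    "/orphan": [],
}

depths = click_depths(site, "/")
too_deep = sorted(page for page, depth in depths.items() if depth > 3)
unreachable = sorted(set(site) - set(depths))

print("More than three clicks deep:", too_deep)
print("Not reachable from the homepage:", unreachable)
```

Pages in the second list are the orphans described earlier: nothing links to them, so a crawler following links from the homepage never sees them.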
Indexed ≠ Ranking
A common second-stage confusion: pages get indexed but don't show up in search results for the queries you want.
Indexing just means the page exists in Google's database. Whether it shows up for a specific query depends on:
- Relevance — does the page content actually address what the searcher is looking for?
- Competition — how many other pages are competing for the same query?
- Authority — how much does Google trust your domain and this specific page?
- UX signals — click-through rate, dwell time, bounce rate from search results
You can have a perfectly formed sitemap and perfect indexing and still never rank if the content isn't competitive. Sitemaps are at the very beginning of the funnel — they can't help you with anything downstream.
The Truth About Priority and changefreq
Google's John Mueller has said multiple times that Google mostly ignores the <priority> tag. It uses its own signals to determine which pages are most important. Setting every page to 1.0 doesn't help — it just makes the signal meaningless across your entire site.
The <changefreq> tag is similarly advisory. Google's crawlers decide recrawl frequency based on how much the page content actually changes between visits, not what you claim in the sitemap. That said, setting it correctly is still worth doing — it adds consistency to your technical SEO hygiene and some crawlers (Bing, smaller bots) may pay more attention to it.
Use realistic values:
| Page type | changefreq | priority |
|---|---|---|
| Homepage | weekly | 1.0 |
| Core landing / product pages | weekly | 0.8–0.9 |
| Blog/news (recent) | daily or weekly | 0.7–0.8 |
| Blog/news (older, stable) | monthly | 0.5–0.6 |
| About / contact / static pages | monthly or yearly | 0.4–0.5 |
| Legal / policy pages | yearly | 0.2–0.3 |
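As a rough illustration of how values from the table end up in a sitemap file, the snippet below builds a three-URL sitemap with Python's standard library. The URLs and lastmod dates are placeholders, and the element names follow the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemap from (loc, lastmod, changefreq, priority) tuples."""
    urlset = ET.Element("urlset", xmlns=NS)
    for loc, lastmod, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = priority
    ET.indent(urlset)  # pretty-print; requires Python 3.9+
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs and dates; the changefreq/priority values follow the table above.
entries = [
    ("https://yoursite.com/", "2024-01-15", "weekly", "1.0"),
    ("https://yoursite.com/products/widget", "2024-01-10", "weekly", "0.8"),
    ("https://yoursite.com/legal/privacy", "2023-06-01", "yearly", "0.2"),
]

xml_text = build_sitemap(entries)
print('<?xml version="1.0" encoding="UTF-8"?>')
print(xml_text)
```

Keeping lastmod truthful matters more than tuning the other two fields, since it is the one value described earlier as able to nudge a recrawl.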
What Not to Put in Your Sitemap
This is where most people go wrong. The natural instinct is to include everything — "more pages in the sitemap = more indexing." The opposite is often true. Including junk URLs signals to Googlebot that you're not curating your own site.
Exclude from your sitemap:
- Pages with <meta name="robots" content="noindex"> (this is a direct contradiction)
- Pages disallowed in robots.txt (again, contradictory)
- Session ID URLs (/page?session=abc123)
- Filtered/sorted URL variations (?sort=price&color=blue); use canonical tags on these instead
- Paginated pages beyond page 1 in most cases
- Thank-you pages, confirmation pages, checkout pages
- Duplicate content pages — pick the canonical version and only include that
- Pages that redirect — include the final destination URL only
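The last rule in the list, listing only final destinations, is easy to automate if you already have a map of your redirects. The sketch below assumes a hypothetical dict of source path to target path (for example, exported from your server config) and rewrites a sitemap URL list so every entry is a final destination, with duplicates removed.

```python
def final_destination(url, redirects, max_hops=10):
    """Follow the redirect map until reaching a URL that does not redirect."""
    seen = {url}
    hops = 0
    while url in redirects:
        hops += 1
        if hops > max_hops:
            raise ValueError("too many redirects")
        url = redirects[url]
        if url in seen:
            raise ValueError(f"redirect loop at {url}")
        seen.add(url)
    return url


# Hypothetical redirect map: /a -> /b -> /c -> /d is a three-hop chain.
redirects = {
    "/a": "/b",
    "/b": "/c",
    "/c": "/d",
    "/old-pricing": "/pricing",
}

sitemap_urls = ["/a", "/d", "/old-pricing", "/about"]

# dict.fromkeys removes duplicates while preserving the original order.
cleaned = list(dict.fromkeys(final_destination(u, redirects) for u in sitemap_urls))
print(cleaned)
```

Here /a and /d collapse into a single /d entry, which also removes the duplicate a naive export would have produced.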
The Robots.txt Interaction
Your sitemap and your robots.txt need to agree. Googlebot checks robots.txt before it fetches any URL, including the ones listed in your sitemap. If a URL appears in your sitemap but is blocked in robots.txt, Google sees a contradiction. It will not crawl the blocked page, and it may view the inconsistency as a signal of poor site management.
The classic mistake: adding the entire /admin/ directory to robots.txt to block it, then accidentally leaving some admin-adjacent URLs in the sitemap. Audit both files to make sure they're consistent.
You should also reference your sitemap in your robots.txt so any crawler, not just Google via Search Console, can find it automatically:
    User-agent: *
    Disallow: /admin/
    Sitemap: https://yoursite.com/sitemap.xml
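One way to audit the two files against each other is Python's built-in urllib.robotparser. The sketch below parses the robots.txt rules shown above from a string and flags any sitemap URL they block. The sitemap URL list is hypothetical, and Python's parser uses simple prefix matching, so treat it as an approximation of how Googlebot evaluates wildcard rules.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from memory, no network request

# Hypothetical sitemap contents, including one admin-adjacent URL left in by mistake.
sitemap_urls = [
    "https://yoursite.com/",
    "https://yoursite.com/pricing",
    "https://yoursite.com/admin/reports",
]

conflicts = [url for url in sitemap_urls if not parser.can_fetch("Googlebot", url)]
print("In sitemap but blocked by robots.txt:", conflicts)
print("Sitemaps declared in robots.txt:", parser.site_maps())  # Python 3.8+
```

To run this against a live site you could call set_url() and read() on the parser instead of parse(), at the cost of a network request.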
Using Search Console to Actually Diagnose Indexing
If your pages aren't indexing, Google Search Console tells you why — and most people don't look closely enough at what it's saying.
In Search Console, go to Indexing → Pages. You'll see a breakdown of why pages are not indexed. The reasons Google gives you:
- Crawled — currently not indexed — Google crawled the page but decided not to index it. Usually a content quality issue.
- Discovered — currently not indexed — Google knows the page exists (from your sitemap or links) but hasn't gotten around to crawling it yet. Often a crawl budget or new site issue.
- Duplicate without user-selected canonical — Google found a near-identical page elsewhere and is ignoring this one. Fix with canonical tags.
- Blocked by robots.txt — Self-explanatory. Remove the block.
- Excluded by noindex tag — You told Google not to index it. If that's wrong, remove the tag.
- Redirect — The URL redirects elsewhere. Update your sitemap to the destination URL.
Every one of these has a clear fix. The answer to most indexing problems is in this report — not in reformatting your sitemap file.
Realistic Timeline Expectations
People expect indexing to happen within days. Here's what's actually normal:
| Site situation | Typical indexing timeline |
|---|---|
| Established site, new page with strong internal links | Hours to 2 days |
| Established site, new page with weak internal links | 1–2 weeks |
| Established site, sitemap-only page (no internal links) | 2–4 weeks |
| New site (under 3 months), with real inbound links | 1–4 weeks |
| New site, no inbound links, sitemap submitted only | 2–6 months (or never) |
| Site with quality issues / thin content | Indefinite / won't index |
If you're on a new domain with no backlinks and you submitted a sitemap yesterday — wait. There's nothing wrong. Build some inbound links, write substantive content, and check back in a few weeks.
The Actual Checklist
If your pages aren't indexing, work through this in order:
1. Check Search Console → Indexing → Pages for the specific exclusion reason
2. Confirm the page returns a 200 status code (not a redirect or error)
3. Confirm no noindex tag exists on the page
4. Confirm the URL isn't blocked in robots.txt
5. Confirm the page has real, substantive content (not thin or duplicate)
6. Add at least 2–3 internal links pointing to the page from already-indexed pages
7. Get at least one external inbound link if it's a new site
8. Submit a clean sitemap (curated, no junk URLs) via Search Console
9. Wait. The process takes time.
Steps 1–7 will fix 95% of indexing problems. Step 8 is the finishing touch that most people mistake for the solution.
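The status-code and noindex checks from the checklist can be scripted. The sketch below is one possible shape for that: a function that takes a response you have already fetched (status code, headers as a plain dict, HTML body) and lists the blockers it finds. The function and class names are made up for illustration, and the X-Robots-Tag header check is an addition beyond the checklist, since noindex can also be sent as an HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Records whether any robots/googlebot meta tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name in ("robots", "googlebot") and "noindex" in content:
            self.noindex = True


def indexing_blockers(status, headers, html):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if status != 200:
        problems.append(f"status code is {status}, expected 200")
    # Assumes the header key is spelled exactly "X-Robots-Tag".
    if "noindex" in headers.get("X-Robots-Tag", "").lower():
        problems.append("X-Robots-Tag header contains noindex")
    finder = RobotsMetaFinder()
    finder.feed(html)
    if finder.noindex:
        problems.append("meta robots tag contains noindex")
    return problems


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
for problem in indexing_blockers(200, {}, page):
    print(problem)
```

This only covers the mechanical checks. Content quality and internal linking still need a human look, and Search Console remains the authority on why a specific URL was excluded.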