Your Sitemap Is Not Getting You Indexed — Here's What Actually Does

June 7, 2026 · 9 min read · SEO · Technical

Here's the question every SEO forum gets at least three times a week: "I submitted my sitemap to Google Search Console two weeks ago and nothing is indexed. What's wrong?"

The honest answer: probably nothing is wrong with your sitemap. A sitemap doesn't get you indexed. It tells Google where to look. Whether Google then chooses to actually visit, crawl, and index those pages depends on a completely separate set of signals — most of which have nothing to do with the sitemap file itself.

This matters because every week people waste time obsessing over sitemap formatting, priority values, and changefreq settings when the real indexing blockers are sitting somewhere else entirely.

What a Sitemap Actually Does (and Doesn't Do)

An XML sitemap is a hint file. You're telling Google: "These URLs exist on my site. Here's how important I think they are relative to each other, and here's how often the content changes."

That's it. Google can accept or reject the hint entirely. Submitting a sitemap doesn't guarantee:

That your pages will be crawled
That crawled pages will be indexed
That indexed pages will rank
That crawling will happen faster than without a sitemap

What a sitemap does genuinely help with:

Discovery on new sites: A brand new site with zero inbound links gives Googlebot no way to find its pages through normal link-following. Apart from manually requesting indexing in Search Console, a sitemap is effectively the only path.
Large sites with weak internal linking: If your site has 50,000 pages and some are buried four or more clicks deep from the homepage, Googlebot might not reach them through link-following alone. A sitemap surfaces those URLs.
Signaling content changes: Updating the lastmod date on changed pages can nudge Googlebot to revisit them sooner. (Though Google says it uses its own signals primarily.)
Orphaned pages: A page with no internal links pointing to it is effectively invisible to Googlebot unless an external site links to it or it appears in a sitemap.

The mental model that helps: Think of a sitemap like a restaurant menu slid under someone's door. The recipient now knows which dishes are available. Whether they actually come and visit depends on whether they trust the establishment.
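
To show how little a sitemap actually carries, here is a minimal Python sketch that builds one with only loc and lastmod entries. The function name build_sitemap and the example.com URLs are placeholders chosen for illustration, not part of any particular tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Return sitemap XML for (url, lastmod) pairs; lastmod is an ISO date string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages; replace with your own URLs and real modification dates.
xml_text = build_sitemap([
    ("https://example.com/", "2026-06-01"),
    ("https://example.com/pricing", "2026-05-20"),
])
print(xml_text)
```

Everything in that file is a claim you are making about your own site. Nothing in it can raise the quality or trust of the pages it lists.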

Crawl Budget: The Thing Most People Ignore

Googlebot allocates a crawl budget to every site — a rough limit on how many pages it will crawl from your domain within a given period. For large sites, this matters enormously. For small sites (under a few thousand pages), it's rarely the bottleneck.

What wastes crawl budget:

Redirect chains (page A → B → C → D)
Infinite URL parameters (?sort=price&page=2&color=blue&session=abc123)
Duplicate content without canonical tags
Low-value paginated pages (page 47 of a category listing)
Soft 404s — pages that return a 200 status but say "no results found"
Pages in your sitemap that are blocked by robots.txt (contradictory signals)

If your sitemap includes 2,000 URLs but 600 of them are duplicate, low-value, or blocked — Googlebot wastes budget visiting them, which takes time away from your important pages.
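
To make the redirect-chain item concrete, the sketch below counts hops through a table of known redirects. The redirects dictionary is made-up data (in practice it might come from a crawl export), and redirect_hops is an illustrative name.

```python
def redirect_hops(url, redirects, max_hops=10):
    """Follow url through the redirects mapping; return (final_url, hop_count)."""
    hops = 0
    seen = {url}
    while url in redirects and hops < max_hops:
        url = redirects[url]
        hops += 1
        if url in seen:  # redirect loop, stop following
            break
        seen.add(url)
    return url, hops

# Hypothetical redirect table matching the A -> B -> C -> D example above.
redirects = {"/a": "/b", "/b": "/c", "/c": "/d"}
final_url, hops = redirect_hops("/a", redirects)
print(final_url, hops)
```

Any sitemap entry that needs one or more hops is a candidate for replacement with its final destination URL.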

Common mistake

Adding every URL your CMS generates to the sitemap — including /tag/whatever, /category/xyz, paginated URLs like ?page=2, and filtered product pages. Only include URLs you genuinely want indexed and that have distinct, non-duplicate content.

What Actually Gets Pages Indexed

Google decides whether to index a page based on a layered set of trust and quality signals. Here's how they stack up:

Signal	What it means	Impact on indexing
Inbound links	Links from other crawled pages, both internal and external, are the primary discovery and trust signal.	Very high
Page quality signals	Is the content original, substantive, and useful? Thin content gets deprioritized or not indexed at all.	Very high
Site trust / authority	New domains with no backlinks are crawled cautiously. Google trusts established domains more.	High
Page speed & Core Web Vitals	Extremely slow pages may be deprioritized in crawl scheduling. Not a hard cutoff, but it matters.	Medium
Internal linking structure	Pages that are only reachable via the sitemap and have no internal links are treated as lower priority.	Medium
Sitemap submission	Helps with discovery, particularly for new or orphaned pages. Does not guarantee indexing.	Discovery aid

The pattern is consistent: Google indexes pages it trusts and finds valuable, not just pages you tell it exist. A sitemap is a discovery tool, not a trust-building tool.

New Sites: Why the Wait Is Normal

If your site launched in the last two months and you're waiting on indexing, a lot of what you're experiencing is normal and expected.

Google's crawl priority queue for new domains is conservative. A brand new domain has no track record. Googlebot will initially crawl only a handful of pages, assess the quality of what it finds, and then decide whether to invest more crawl budget. This assessment period can take weeks to months.

Things that help move this along:

Get a real inbound link. A link from an already-indexed site — even a small one — tells Googlebot your site is real and connected to the web. This is the single most effective accelerant for new site indexing.
Build internal links from the start. Every page should be reachable by following links from the homepage in three clicks or fewer.
Have substantive content on day one. Launching a site with placeholder or thin content gets you deprioritized immediately. Googlebot's initial sample of your site will inform how aggressively it crawls going forward.
Verify in Search Console and submit the sitemap there. This gives Google an explicit signal that you're the verified owner and you're actively managing the site.
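
The three-click guideline above can be checked with a breadth-first search over your internal link graph. This is a minimal sketch: the links dictionary is invented sample data, and click_depths is a hypothetical helper name. A real audit would build the graph from a crawl of your own site.

```python
from collections import deque

def click_depths(links, home="/"):
    """Breadth-first search over internal links; returns {page: clicks from home}."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "/": ["/blog", "/pricing"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog/post-1/appendix"],
    "/orphan": [],
}
depths = click_depths(links)
too_deep = sorted(p for p, d in depths.items() if d > 3)
unreachable = sorted(set(links) - set(depths))
print(too_deep, unreachable)
```

Pages in unreachable are orphans that only a sitemap or an external link can surface. Pages in too_deep are candidates for extra internal links.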

Indexed ≠ Ranking

A common second-stage confusion: pages get indexed but don't show up in search results for the queries you want.

Indexing just means the page exists in Google's database. Whether it shows up for a specific query depends on:

Relevance — does the page content actually address what the searcher is looking for?
Competition — how many other pages are competing for the same query?
Authority — how much does Google trust your domain and this specific page?
UX signals — click-through rate, dwell time, bounce rate from search results

You can have a perfectly formed sitemap and perfect indexing and still never rank if the content isn't competitive. Sitemaps are at the very beginning of the funnel — they can't help you with anything downstream.

The Truth About Priority and changefreq

Google's John Mueller has said multiple times that Google mostly ignores the <priority> tag. It uses its own signals to determine which pages are most important. Setting every page to 1.0 doesn't help — it just makes the signal meaningless across your entire site.

The <changefreq> tag is similarly advisory. Google's crawlers decide recrawl frequency based on how much the page content actually changes between visits, not what you claim in the sitemap. That said, setting it correctly is still worth doing — it adds consistency to your technical SEO hygiene and some crawlers (Bing, smaller bots) may pay more attention to it.

Use realistic values:

Page type	changefreq	priority
Homepage	weekly	1.0
Core landing / product pages	weekly	0.8–0.9
Blog/news (recent)	daily or weekly	0.7–0.8
Blog/news (older, stable)	monthly	0.5–0.6
About / contact / static pages	monthly or yearly	0.4–0.5
Legal / policy pages	yearly	0.2–0.3

What Not to Put in Your Sitemap

This is where most people go wrong. The natural instinct is to include everything — "more pages in the sitemap = more indexing." The opposite is often true. Including junk URLs signals to Googlebot that you're not curating your own site.

Exclude from your sitemap:

Pages with <meta name="robots" content="noindex"> — this is a direct contradiction
Pages disallowed in robots.txt — again, contradictory
Session ID URLs (/page?session=abc123)
Filtered/sorted URL variations (?sort=price&color=blue) — use canonical tags on these instead
Paginated pages beyond page 1 in most cases
Thank-you pages, confirmation pages, checkout pages
Duplicate content pages — pick the canonical version and only include that
Pages that redirect — include the final destination URL only
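
Part of this exclusion list can be automated from the URL alone. The sketch below is a rough heuristic under stated assumptions: the path prefixes in EXCLUDED_PREFIXES are hypothetical and will differ per site, and belongs_in_sitemap is an illustrative name. It cannot detect noindex tags, redirects, or duplicate content, which require fetching the page.

```python
from urllib.parse import urlsplit

# Hypothetical paths; adjust to whatever your CMS actually generates.
EXCLUDED_PREFIXES = ("/tag/", "/checkout", "/thank-you")

def belongs_in_sitemap(url):
    """Heuristic: keep parameter-free URLs that sit outside the excluded paths."""
    parts = urlsplit(url)
    if parts.query:  # session IDs, sort/filter parameters, ?page=2 and similar
        return False
    return not parts.path.startswith(EXCLUDED_PREFIXES)

candidates = [
    "https://example.com/products/blue-widget",
    "https://example.com/products?sort=price&color=blue",
    "https://example.com/page?session=abc123",
    "https://example.com/checkout/confirm",
]
sitemap_urls = [u for u in candidates if belongs_in_sitemap(u)]
print(sitemap_urls)
```

Dropping every URL with a query string is deliberately blunt. If some parameterized URLs on your site are canonical, the rule needs an allowlist.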

The Robots.txt Interaction

Your sitemap and your robots.txt need to agree. Googlebot checks robots.txt before fetching any URL, including URLs it discovered through your sitemap. If a URL appears in your sitemap but is blocked in robots.txt, Google sees a contradiction. It will not crawl the blocked page, and the inconsistency may read as a sign of poor site management.

The classic mistake: adding the entire /admin/ directory to robots.txt to block it, then accidentally leaving some admin-adjacent URLs in the sitemap. Audit both files to make sure they're consistent.

You should also reference your sitemap in your robots.txt so that any crawler, not just Google (which you told directly through Search Console), can find it automatically:

User-agent: *
Disallow: /admin/

Sitemap: https://yoursite.com/sitemap.xml
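
One way to audit the two files against each other is Python's standard urllib.robotparser module. The sketch below reuses the robots.txt shown above; blocked_sitemap_urls and the sample URL list are assumptions made for illustration. Note that the standard-library parser may not match Google's handling of wildcard rules, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://yoursite.com/sitemap.xml
"""

def blocked_sitemap_urls(robots_txt, sitemap_urls, user_agent="Googlebot"):
    """Return the sitemap URLs that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in sitemap_urls if not parser.can_fetch(user_agent, u)]

# Hypothetical sitemap contents, including one admin-adjacent URL left in by mistake.
sitemap_urls = [
    "https://yoursite.com/",
    "https://yoursite.com/admin/reports",
]
conflicts = blocked_sitemap_urls(ROBOTS_TXT, sitemap_urls)
print(conflicts)
```

Every URL in conflicts should either be removed from the sitemap or unblocked in robots.txt, depending on whether you want it indexed.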

Using Search Console to Actually Diagnose Indexing

If your pages aren't indexing, Google Search Console tells you why — and most people don't look closely enough at what it's saying.

In Search Console, go to Indexing → Pages. You'll see a breakdown of why pages are not indexed. The reasons Google gives you:

Crawled — currently not indexed — Google crawled the page but decided not to index it. Usually a content quality issue.
Discovered — currently not indexed — Google knows the page exists (from your sitemap or links) but hasn't gotten around to crawling it yet. Often a crawl budget or new site issue.
Duplicate without user-selected canonical — Google found a near-identical page elsewhere and is ignoring this one. Fix with canonical tags.
Blocked by robots.txt — Self-explanatory. Remove the block.
Excluded by noindex tag — You told Google not to index it. If that's wrong, remove the tag.
Redirect — The URL redirects elsewhere. Update your sitemap to the destination URL.

Every one of these has a clear fix. The answer to most indexing problems is in this report — not in reformatting your sitemap file.

Realistic Timeline Expectations

People expect indexing to happen within days. Here's what's actually normal (rough ranges, not guarantees):

Site situation	Typical indexing timeline
Established site, new page with strong internal links	Hours to 2 days
Established site, new page with weak internal links	1–2 weeks
Established site, sitemap-only page (no internal links)	2–4 weeks
New site (under 3 months), with real inbound links	1–4 weeks
New site, no inbound links, sitemap submitted only	2–6 months (or never)
Site with quality issues / thin content	Indefinite / won't index

If you're on a new domain with no backlinks and you submitted a sitemap yesterday — wait. There's nothing wrong. Build some inbound links, write substantive content, and check back in a few weeks.

The Actual Checklist

If your pages aren't indexing, work through this in order:

1. Check Search Console → Indexing → Pages for the specific exclusion reason
2. Confirm the page returns a 200 status code (not a redirect or error)
3. Confirm no noindex tag exists on the page
4. Confirm the URL isn't blocked in robots.txt
5. Confirm the page has real, substantive content (not thin or duplicate)
6. Add at least 2–3 internal links pointing to the page from already-indexed pages
7. Get at least one external inbound link if it's a new site
8. Submit a clean sitemap (curated, no junk URLs) via Search Console
9. Wait. The process takes time.

Steps 1–7 resolve the large majority of indexing problems. Step 8 is the finishing touch that most people mistake for the solution.
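
The status-code and noindex checks from the list lend themselves to a quick script. The sketch below assumes you have already fetched the page and have its status code and HTML in hand. RobotsMetaParser and preflight are hypothetical names, and the check only looks at robots meta tags, so a noindex sent through an X-Robots-Tag HTTP header would be missed.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))

def preflight(status_code, html):
    """Return a list of problems found by the status-code and noindex checks."""
    problems = []
    if status_code != 200:
        problems.append(f"status is {status_code}, expected 200")
    parser = RobotsMetaParser()
    parser.feed(html)
    if "noindex" in parser.directives:
        problems.append("page has a noindex robots meta tag")
    return problems

# Made-up page source with a leftover noindex tag.
page_html = '<html><head><meta name="ROBOTS" content="NOINDEX, follow"></head></html>'
problems = preflight(200, page_html)
print(problems)
```

An empty list means these two checks passed. It says nothing about content quality or links, which still need a human look.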

The Tool Empire

We build browser-based tools for SEO, content, and development workflows. No installs, no signup — just tools that work.
