Summary (TL;DR): Technical SEO is the foundation that determines whether search engines can even access your content. This guide covers the four pillars that matter most: crawlability (robots.txt), indexability (canonical URLs), discoverability (XML sitemaps), and link equity (anchor text). Each section includes a free tool to audit the issue in seconds.

You can write the best content on the internet and still rank nowhere if search engines can't crawl, index, or understand your pages. Technical SEO is the infrastructure layer that makes everything else work.
This guide covers the four technical SEO fundamentals that have the biggest impact on rankings: crawlability, indexability, discoverability, and link equity. Each one includes a specific thing to check and a free tool to audit it. For the content-level optimizations, see our on-page SEO checklist.
What is technical SEO and why should you care?
Technical SEO is the set of optimizations that help search engines access, crawl, render, and index your website. It has nothing to do with your content quality or keyword strategy. It's about whether Google can physically reach your pages and understand their structure.
Think of it this way: on-page SEO is the menu at a restaurant. Technical SEO is whether the restaurant has a door, an address, and shows up on the map. Without the door, nobody reads the menu.
According to a Botify crawl analysis, search engines fail to crawl 51% of pages on enterprise websites. For smaller sites the number is lower, but even a few blocked pages can mean missed rankings.
Crawlability: can search engines reach your pages?
Crawlability is about access. Before Google can rank a page, its crawler (Googlebot) needs permission and a path to reach it.
Check your robots.txt
Your robots.txt file sits at yoursite.com/robots.txt and tells crawlers which URLs they can and cannot access. A single misplaced Disallow rule can block an entire section of your site from ever appearing in search results.
Common mistakes:
- Disallow: / on its own blocks your entire site (sometimes left over from a staging environment)
- Blocking CSS and JavaScript files prevents Google from rendering your pages properly
- Blocking parameter URLs that are actually unique content pages
Generate a properly configured robots.txt with our free robots.txt generator. It walks you through the rules so you don't accidentally block important pages.
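If you'd rather sanity-check by hand, a safe baseline looks something like this (the disallowed paths are placeholders for whatever you genuinely want kept out of search):

```txt
# Applies to all crawlers
User-agent: *
# Placeholder paths -- block only what should never appear in search
Disallow: /admin/
Disallow: /cart/

# Note: no blanket "Disallow: /", and CSS/JS directories stay crawlable

# Tell crawlers where your sitemap lives
Sitemap: https://yoursite.com/sitemap.xml
```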
Fix crawl errors
Check Google Search Console's Crawl Stats report for server errors (5xx), redirect chains, and DNS failures. These prevent Googlebot from accessing your pages even when robots.txt allows it.
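Search Console is the source of truth here, but you can spot-check a handful of key URLs yourself. A minimal sketch (placeholder URLs, requests library assumed):

```python
"""Spot-check URLs for server errors and redirect chains --
the same issues GSC's Crawl Stats report surfaces."""
import requests

URLS = [  # placeholder URLs -- swap in your own key pages
    "https://yoursite.com/",
    "https://yoursite.com/pricing",
    "https://yoursite.com/blog/",
]

for url in URLS:
    resp = requests.get(url, allow_redirects=True, timeout=10)
    hops = len(resp.history)  # each entry is one 3xx hop Googlebot must follow
    if resp.status_code >= 500:
        print(f"5xx error: {url} -> {resp.status_code}")
    elif hops > 1:
        print(f"redirect chain ({hops} hops): {url} -> {resp.url}")
```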
Indexability: will search engines store your pages?
Just because Google can crawl a page doesn't mean it will index it. Indexability depends on canonical tags, meta robots directives, and content quality signals.
Set canonical URLs correctly
Duplicate content confuses search engines. When multiple URLs serve the same (or very similar) content, Google has to guess which version to index. The rel="canonical" tag removes the guesswork by declaring the preferred URL.
When you need canonicals:
- Product pages accessible via multiple category paths
- Pages with tracking parameters (UTMs, session IDs)
- HTTP vs HTTPS or www vs non-www versions
- Paginated content (each page in the series should usually carry a self-referencing canonical; Google advises against pointing every page back to page 1)
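Here's what the tag looks like in practice, using a hypothetical tracking-parameter URL:

```html
<!-- Served at https://yoursite.com/pricing?utm_source=newsletter -->
<head>
  <!-- Declares the clean URL as the version Google should index -->
  <link rel="canonical" href="https://yoursite.com/pricing">
</head>
```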
Audit your canonical tags with our free canonical URL checker. It flags missing canonicals, self-referencing issues, and mismatches between the canonical and the actual URL.
Check meta robots tags
A <meta name="robots" content="noindex"> tag tells search engines not to index a page. This is useful for thank-you pages, admin areas, and staging content, but disastrous when accidentally applied to important pages. Search your site's HTML for "noindex" and make sure it's only on pages you genuinely want excluded.
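The tag lives in the page's head; note that the same directive can also arrive as an HTTP response header, so check both:

```html
<!-- In <head>: keeps this page out of search results -->
<meta name="robots" content="noindex">

<!-- Equivalent directive delivered as an HTTP response header:
     X-Robots-Tag: noindex -->
```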
Discoverability: can search engines find all your pages?
Even with perfect crawlability and indexability, Google might miss pages that are buried deep in your site architecture or have no internal links pointing to them.
Submit and validate your XML sitemap
An XML sitemap is a roadmap for search engines. It lists every important URL on your site, along with metadata like last-modified dates and priority hints. Submit yours through Google Search Console to ensure all pages are discoverable.
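A minimal valid sitemap looks like this (the URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/pricing</loc>
    <lastmod>2025-01-15</lastmod>
  </url>
  <!-- one <url> entry per indexable page -->
</urlset>
```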
What to check:
- Does your sitemap include every indexable page? Compare the URL count against your site's actual page count.
- Are non-indexable pages excluded? Don't list pages with noindex tags, redirects, or canonical tags pointing elsewhere.
- Is the sitemap automatically updated when you publish new content?
- Is the sitemap referenced in your robots.txt?
Validate your sitemap for errors with our free XML sitemap validator.
Fix orphan pages
An orphan page has zero internal links pointing to it. Search engines discover pages by following links, so orphans are effectively hidden. Audit your internal linking structure and make sure every important page has at least 2-3 internal links from related content. Our comprehensive SEO guide covers internal linking strategy in depth.
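If your sitemap is accurate, you can approximate an orphan check by comparing it against the links your pages actually contain. A rough sketch (placeholder domain, requests library assumed), not a substitute for a full crawl:

```python
"""Rough orphan check: pages listed in the sitemap that no other
sitemap page links to. A sketch, not a full crawler."""
import re
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urlparse
import requests

SITE = "https://yoursite.com"  # placeholder domain

def sitemap_urls(sitemap_url):
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(requests.get(sitemap_url, timeout=10).content)
    return {el.text.strip() for el in root.findall(".//sm:loc", ns)}

def internal_links(page_url):
    html = requests.get(page_url, timeout=10).text
    hrefs = re.findall(r'href="([^"#]+)"', html)  # crude; misses single quotes
    links = {urljoin(page_url, h) for h in hrefs}
    return {u for u in links if urlparse(u).netloc == urlparse(SITE).netloc}

listed = sitemap_urls(f"{SITE}/sitemap.xml")
linked = set()
for page in listed:        # fetch every page the sitemap already knows about
    linked |= internal_links(page)

for url in sorted(listed - linked):  # in the sitemap, never linked internally
    print("possible orphan:", url)
```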
Link equity: is authority flowing to the right pages?
Link equity (sometimes called "link juice") is the ranking value passed from one page to another through hyperlinks. How you distribute that value across your site affects which pages rank.
Audit your anchor text
Anchor text, the clickable text of a hyperlink, signals to search engines what the linked page is about. If every internal link to your pricing page says "click here," Google learns nothing about that page's topic.
Best practices:
- Use descriptive anchor text that includes relevant keywords naturally
- Vary your anchor text: don't use the exact same phrase every time
- Avoid generic anchors like "read more," "click here," or "learn more" for important pages
- Match anchor text to the target page's primary topic
Check your anchor text distribution with our free anchor text analyzer.
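For a scripted look at a single page, this sketch tallies internal anchor texts and flags the generic ones (the page URL is a placeholder; the requests library is assumed):

```python
"""Tally anchor texts for internal links on one page -- a quick way
to spot 'click here' anchors. A minimal sketch."""
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import requests

PAGE = "https://yoursite.com/blog/some-post"  # placeholder URL
GENERIC = {"click here", "read more", "learn more", "here"}

class AnchorCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.href = None
        self.text = []
        self.anchors = []  # (anchor_text, target_url) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href:
            target = urljoin(PAGE, self.href)
            if urlparse(target).netloc == urlparse(PAGE).netloc:
                self.anchors.append(("".join(self.text).strip().lower(), target))
            self.href = None

parser = AnchorCollector()
parser.feed(requests.get(PAGE, timeout=10).text)

counts = Counter(text for text, _ in parser.anchors)
for text, n in counts.most_common():
    flag = "  <-- generic" if text in GENERIC else ""
    print(f"{n:>3}  {text}{flag}")
```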
Fix broken internal links
Broken internal links waste crawl budget and leak link equity into 404 pages. Run a crawl of your site (Screaming Frog's free version handles up to 500 URLs) and fix or redirect any broken links. For developer-focused SEO guidance, check out our SEO guide for developers.
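A single-page version of that check takes only a few lines; a sketch with a placeholder start URL:

```python
"""Flag broken internal links on one page. A sketch -- real audits
should crawl the whole site."""
import re
from urllib.parse import urljoin, urlparse
import requests

PAGE = "https://yoursite.com/"  # placeholder start page

html = requests.get(PAGE, timeout=10).text
links = {urljoin(PAGE, h) for h in re.findall(r'href="([^"#]+)"', html)}
internal = sorted(u for u in links if urlparse(u).netloc == urlparse(PAGE).netloc)

for url in internal:
    # Some servers reject HEAD; fall back to GET if you see 405s
    status = requests.head(url, allow_redirects=True, timeout=10).status_code
    if status >= 400:
        print(f"{status}  {url}")  # broken: fix the link or add a redirect
```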
A simple technical SEO audit checklist
| Area | Check | Free Tool |
|---|---|---|
| Crawlability | robots.txt not blocking important pages | Robots.txt Generator |
| Indexability | Canonical URLs set on all pages | Canonical URL Checker |
| Discoverability | XML sitemap valid and submitted | Sitemap Validator |
| Link equity | Descriptive, varied anchor text | Anchor Text Analyzer |
| Rendering | No blocked CSS/JS in robots.txt | Google Search Console |
| Speed | LCP under 2.5s, CLS under 0.1 | Google PageSpeed Insights |
| Mobile | Responsive, 48px+ tap targets | Lighthouse (Chrome DevTools) |
Where to start if you're overwhelmed
Don't try to fix everything at once. Prioritize by impact:
1. Check robots.txt first. If important pages are blocked, nothing else matters. Takes 2 minutes.
2. Submit your XML sitemap to Google Search Console if you haven't already. Ensures Google knows about all your pages.
3. Fix canonical tags on your highest-traffic pages. Prevents ranking dilution from duplicate content.
4. Audit anchor text on your top 10 internal links. Make sure they describe the target page, not just say "click here."
Technical SEO isn't a one-time project. Set a quarterly reminder to re-run these checks, especially after site updates, redesigns, or migrations. The tools listed above are all free and take seconds to run.
For content-level optimizations to pair with your technical foundation, work through our on-page SEO checklist. And if you have an FAQ section on your site, make sure it's generating schema markup for rich results.