SEO Fundamentals: Complete On-Page Guide

01 Foundation

The <head> — Where SEO Begins

Almost nothing in <head> is rendered in the page itself (only the title surfaces, in the browser tab and in search results), yet all of it matters to search engines, browsers, and social platforms. It shapes how your page is discovered, indexed, and displayed.

The head is where you declare the page's identity, control how crawlers behave, specify what appears when your link is shared, and load resources without hurting performance. Getting these tags right is table stakes. Getting them wrong undermines everything else you do.

◆ Every tag in this section is implemented live in the <head> of this page. Open DevTools → Elements and follow along.

The minimal viable head

index.html — minimum required HTML

<head>
  <!-- 1. Character encoding — ALWAYS first -->
  <meta charset="UTF-8">

  <!-- 2. Mobile-first viewport -->
  <meta name="viewport" content="width=device-width, initial-scale=1.0">

  <!-- 3. The most important SEO tag -->
  <title>Primary Keyword — Secondary | Brand</title>

  <!-- 4. Snippet shown in search results -->
  <meta name="description" content="150–160 chars. Describe the page, include a CTA.">

  <!-- 5. Canonical — the authoritative URL -->
  <link rel="canonical" href="https://yourdomain.com/this-page/">
</head>
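The five tags above can be checked automatically before a page ships. The following is a minimal sketch in Python using only the standard library; the names HeadAudit, REQUIRED, and missing_head_tags are hypothetical, and the check only tests for presence, not for quality or correct ordering.

```python
from html.parser import HTMLParser

# The five baseline tags from the minimal head, in the order they are listed.
REQUIRED = ["charset", "viewport", "title", "description", "canonical"]


class HeadAudit(HTMLParser):
    """Records which of the five baseline head tags appear in a document."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and "charset" in attr:
            self.found.add("charset")
        elif tag == "meta" and attr.get("name") in ("viewport", "description"):
            self.found.add(attr["name"])
        elif tag == "link" and attr.get("rel") == "canonical":
            self.found.add("canonical")
        elif tag == "title":
            self.found.add("title")


def missing_head_tags(html):
    """Return the names of required tags that were not found."""
    parser = HeadAudit()
    parser.feed(html)
    return [name for name in REQUIRED if name not in parser.found]


sample = '<head><meta charset="UTF-8"><title>Demo</title></head>'
print(missing_head_tags(sample))  # ['viewport', 'description', 'canonical']
```

A check like this fits well in a build step, where a non-empty result can fail the build.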

02 On-Page

Title Tag & Meta Description

The title tag is one of the most impactful on-page SEO elements. It usually appears as the clickable blue headline of your search result. Google may rewrite it if it finds yours misleading, too short, or a poor match for the page; a well-written title is far more likely to be shown unchanged.

Title tag rules

Rule	Correct	Wrong
Length	50–60 characters	80+ chars (truncated in SERP)
Keyword position	Primary keyword first	Brand name first, keyword last
Uniqueness	Unique per page	Same title on all pages
Content match	Accurately describes the page	Clickbait that misleads users
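The length, keyword-position, and uniqueness rules in the table are mechanical enough to script. This is a rough sketch, assuming the 50–60 character window from the table; check_title and duplicate_titles are hypothetical helper names, and the "content match" rule is left out because it needs human judgement.

```python
from collections import Counter


def check_title(title, primary_keyword, min_len=50, max_len=60):
    """Return a list of rule violations for one <title> string."""
    problems = []
    length = len(title)
    if length < min_len:
        problems.append(f"too short ({length} chars)")
    elif length > max_len:
        problems.append(f"too long ({length} chars)")
    # "Primary keyword first" is interpreted here as a case-insensitive prefix.
    if not title.lower().startswith(primary_keyword.lower()):
        problems.append("primary keyword is not first")
    return problems


def duplicate_titles(titles_by_url):
    """Return titles that are used on more than one URL."""
    counts = Counter(titles_by_url.values())
    return sorted(title for title, n in counts.items() if n > 1)


print(check_title("Home", "SEO"))
# ['too short (4 chars)', 'primary keyword is not first']
```

Character count is only a proxy: search results truncate by pixel width, so treat the window as a guideline rather than a hard limit.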

Meta description rules

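△ Meta descriptions do NOT directly affect rankings. They affect click-through rate (CTR): a better snippet earns more clicks from the same position, and any ranking benefit is indirect at best. Google may also replace your description with text from the page when that fits the query better. Write them for humans, not algorithms.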

index.html HTML

<!-- GOOD — descriptive, keyword-aware, has implicit CTA -->
<meta name="description"
      content="A complete, practical guide to SEO fundamentals —
               meta tags, structured data, robots.txt, sitemaps,
               and more — all implemented live on this very page.">

<!-- BAD — vague, no keywords, no reason to click -->
<meta name="description" content="Welcome to our website.">

03 Technical SEO

Canonical URL

When the same content is accessible at multiple URLs, Google distributes your ranking signals (backlinks, traffic) across all of them instead of consolidating them on one. This dilutes your authority. The canonical tag says: "Ignore the variants. This is the definitive URL."

Common causes of duplicate URLs

duplicate URL patterns URLs

# All of these might serve identical content:

https://example.com/page          # clean URL
https://example.com/page/         # trailing slash variant
https://www.example.com/page      # www vs non-www
http://example.com/page           # http vs https
https://example.com/page?utm_source=email  # UTM parameter
https://example.com/page?ref=homepage      # referral parameter
https://example.com/page?sort=asc          # filter/sort parameter

index.html HTML

<!-- Place in <head>. Always use absolute URL with https:// -->
<!-- ⚑ Replace with YOUR production URL -->
<link rel="canonical"
      href="https://yourdomain.com/this-page/">

ⓘ Self-referential canonicals (a page pointing to itself) are best practice even when you have no duplicate URL issue. They act as a safety net: if the page is later reached through a parameterised or variant URL, the tag still names the preferred one.
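To see how the duplicate variants listed earlier collapse into one canonical URL, here is a small Python sketch. It assumes a site policy of https, no www, and no trailing slash; the TRACKING_PARAMS set and the canonical_url function are hypothetical and should be adapted to whichever parameters really leave your content unchanged.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed policy: these query parameters never change what the page shows.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref", "sort"}


def canonical_url(url, host="example.com", trailing_slash=False):
    """Map any variant of a page URL onto one preferred form."""
    parts = urlsplit(url)
    # Drop tracking parameters, keep everything else in its original order.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    path = parts.path.rstrip("/") or "/"
    if trailing_slash and path != "/":
        path += "/"
    # Force the scheme and host so http and www variants converge.
    return urlunsplit(("https", host, path, urlencode(query), ""))


variants = [
    "https://example.com/page/",
    "https://www.example.com/page",
    "http://example.com/page",
    "https://example.com/page?utm_source=email",
]
print({canonical_url(v) for v in variants})  # {'https://example.com/page'}
```

The output of such a function is what belongs in the href of the canonical tag. The same policy should also drive your server-side redirects so the two never disagree.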

04 Social SEO

Open Graph Protocol

Open Graph (OG) was created by Facebook and is now used by LinkedIn, Slack, iMessage, WhatsApp, Discord, and most platforms that generate link previews. Without OG tags, the platform guesses — and usually gets it wrong.

△ The code examples below use yourdomain.com as a placeholder. Swap every instance with your actual production URL before deploying. The live <head> of this page already uses the correct URL — view source to see the real implementation.

index.html — Open Graph for a standard webpage HTML

<!-- Use og:type="website" for homepages and standard pages -->
<!-- ⚑ Replace yourdomain.com with your actual URL throughout -->
<meta property="og:type"        content="website">
<meta property="og:title"       content="Your Page Title Here">
<meta property="og:description"  content="Describe the page in 1–2 sentences.">
<meta property="og:image"        content="https://yourdomain.com/og-image.jpg">
<meta property="og:url"          content="https://yourdomain.com/this-page/">
<meta property="og:site_name"    content="Your Brand Name">

index.html — Open Graph for a blog post or article HTML

<!-- Use og:type="article" for blog posts and news pages -->
<!-- IMPORTANT: a page can only have ONE og:type value.      -->
<!-- Choose "website" OR "article" — never both.            -->
<meta property="og:type"               content="article">
<meta property="og:title"              content="Your Article Title">
<meta property="og:description"         content="Article summary here.">
<meta property="og:image"               content="https://yourdomain.com/og-image.jpg">
<meta property="og:url"                 content="https://yourdomain.com/this-page/">
<meta property="article:published_time" content="2026-02-10T00:00:00Z">
<meta property="article:author"         content="https://twitter.com/yourhandle">

◆ og:image should be at least 1200×630px (roughly a 1.91:1 ratio). Use an absolute URL, because relative paths do not work. Host it on a stable, fast URL. Image changes take time to propagate due to platform caching.
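When Open Graph tags are generated from page data instead of typed by hand, two things are easy to get wrong: escaping the content values and keeping the image URL absolute. The sketch below shows one way to handle both in Python; render_og_tags is a hypothetical helper, not part of any library.

```python
from html import escape


def render_og_tags(fields):
    """Render a dict of Open Graph properties as <meta> tags, one per line."""
    image = fields.get("og:image", "")
    # Relative image paths do not work in link previews, so reject them early.
    if not image.startswith(("http://", "https://")):
        raise ValueError("og:image must be an absolute URL")
    return "\n".join(
        f'<meta property="{escape(prop)}" content="{escape(value)}">'
        for prop, value in fields.items()
    )


print(render_og_tags({
    "og:type": "website",
    "og:title": 'Tips & "Tricks"',
    "og:image": "https://yourdomain.com/og-image.jpg",
}))
```

The title in this example contains an ampersand and quotes on purpose: without escaping, either would break the attribute.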

05 Social SEO

Twitter Card

Twitter (X) uses its own meta tags. If absent, it falls back to Open Graph, but the fallback rendering is less reliable. Explicitly set Twitter Card tags for consistent display.

index.html HTML

<!-- summary_large_image = full-width banner card (recommended) -->
<!-- summary            = small thumbnail on the left           -->
<!-- app                = for app download cards                -->
<!-- player             = for video/audio cards                 -->

<!-- ⚑ Replace yourdomain.com and @yourhandle with your own -->
<meta name="twitter:card"        content="summary_large_image">
<meta name="twitter:title"       content="Your Page Title Here">
<meta name="twitter:description"  content="Short summary for the Twitter card.">
<meta name="twitter:image"        content="https://yourdomain.com/og-image.jpg">
<meta name="twitter:site"         content="@yourhandle">  <!-- optional -->

06 Technical SEO

Structured Data & JSON-LD

Structured data gives Google machine-readable facts about your content. In return, Google can display rich results — enhanced SERP listings with star ratings, FAQs, product prices, event dates, and more. Rich results can dramatically increase CTR.

Schema Type	Rich Result	Best for
Article	Top stories carousel, date	Blog posts, news
Product	Price, availability, reviews	E-commerce
FAQPage	Expandable Q&A in SERP (since 2023 shown mainly for well-known government and health sites)	FAQ sections
LocalBusiness	Map, hours, phone	Brick & mortar
BreadcrumbList	Path shown under title	Multi-level sites
Event	Date, location, tickets	Events, concerts
JobPosting	Job listing card	Career pages
Person	Knowledge panel	Personal/portfolio
WebSite	Site name in results (the sitelinks search box was retired in 2024)	Homepage

JSON-LD format (recommended)

△ The template below uses yourdomain.com placeholders. The live <head> of this page uses the real production URLs — view source to see the actual implementation. Also note: if you don't have a dedicated logo image yet, omit the publisher.logo property. Using your article image as the logo will cause a warning in Google's Rich Results Test.

index.html — JSON-LD Article schema template JSON-LD

<!-- ⚑ Replace all yourdomain.com values with your own URLs -->
<!-- JSON does not allow comments, so the notes on publisher.logo go here: -->
<!--   logo should be a dedicated image, NOT the article image             -->
<!--   recommended: rectangular, ~60px tall, on a white background         -->
<!--   if you don't have a logo yet, delete the entire "logo" block        -->
<!--   and the comma after the publisher "name" line                       -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Your Article Headline Here",
  "description": "A short description of the article.",
  "image": "https://yourdomain.com/og-image.jpg",
  "url": "https://yourdomain.com/",
  "datePublished": "2026-01-01",
  "dateModified": "2026-01-01",
  "author": {
    "@type": "Person",
    "name": "Your Name",
    "url": "https://yourdomain.com/"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Your Organisation Name",
    "logo": {
      "@type": "ImageObject",
      "url": "https://yourdomain.com/logo.png"
    }
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://yourdomain.com/"
  }
}
</script>

◆ Test your structured data at validator.schema.org or with Google's Rich Results Test: search.google.com/test/rich-results. Fix all errors before deploying.
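Hand-edited JSON-LD breaks easily, for example through a stray comma after removing the logo block. One option is to build the object in code and serialise it, as in this Python sketch. The article_jsonld function and its parameters are hypothetical, and only a subset of the template's properties is included to keep it short.

```python
import json


def article_jsonld(headline, url, author, publisher, published, logo_url=None):
    """Build an Article JSON-LD <script> element as a string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "datePublished": published,
        "author": {"@type": "Person", "name": author},
        "publisher": {"@type": "Organization", "name": publisher},
        "mainEntityOfPage": {"@type": "WebPage", "@id": url},
    }
    # Only emit publisher.logo when a dedicated logo image exists.
    if logo_url:
        data["publisher"]["logo"] = {"@type": "ImageObject", "url": logo_url}
    body = json.dumps(data, indent=2)
    # "</" inside a string could end the script element early; "<\/" is the
    # same character sequence to a JSON parser.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(article_jsonld(
    "Your Article Headline Here",
    "https://yourdomain.com/",
    "Your Name",
    "Your Organisation Name",
    "2026-01-01",
))
```

Because the serialiser produces the commas and quotes, omitting the logo can never leave invalid JSON behind.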

07 Technical SEO

robots.txt — Controlling Crawlers

The robots.txt file lives at the root of your domain (yourdomain.com/robots.txt) and tells search engine crawlers which parts of your site they can and cannot access. It is the first file most crawlers fetch when they visit your site.

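⚠ robots.txt is a REQUEST, not a block. Reputable crawlers (Googlebot) respect it. Malicious bots often ignore it. It also controls crawling, not indexing: a disallowed URL can still appear in results if other sites link to it, and a noindex tag on a disallowed page is never read. Use noindex on a crawlable page to keep it out of search, and server-side access controls for truly sensitive pages.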

robots.txt — anatomy and examples TEXT

# User-agent specifies WHICH crawler this rule applies to
# * means ALL crawlers
User-agent: *
Allow: /                    # allow everything by default
Disallow: /admin/           # block admin panel
Disallow: /api/             # block API endpoints
Disallow: /checkout/        # block checkout pages
Disallow: /search?          # block internal search results

# ⚑ Point to YOUR sitemap URL — not yourdomain.com
Sitemap: https://yourdomain.com/sitemap.xml

---

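# Crawl-delay: how many seconds to wait between requests
# Use if your server can't handle heavy crawler traffic
# Googlebot ignores this directive; some other crawlers (e.g. Bingbot) honour it
# A crawler that matches its own group ignores the * group,
# so repeat any Disallow rules it should still follow
User-agent: Bingbot
Crawl-delay: 2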

---

# Block a specific crawler entirely (e.g. AI training scrapers)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

△ Critical: your robots.txt must use your PRODUCTION domain, not localhost. A misconfigured robots.txt pointing to localhost is one of the most common SEO mistakes on dev-deployed sites.
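Rules like these can be tested before deployment with Python's standard urllib.robotparser module. The sketch below uses a trimmed, assumed rule set rather than the full example. Note one caveat: Python's parser applies the first matching rule in file order, whereas Google picks the most specific (longest) matching rule, so the sample omits the leading "Allow: /" line, which would otherwise match everything first. The site_maps() method needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Trimmed sample rules (an assumption for this sketch, not the full file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/

User-agent: GPTBot
Disallow: /

Sitemap: https://yourdomain.com/sitemap.xml
"""


def load_rules(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
print(rules.can_fetch("Googlebot", "https://yourdomain.com/admin/users"))  # False
print(rules.can_fetch("Googlebot", "https://yourdomain.com/blog/"))        # True
print(rules.can_fetch("GPTBot", "https://yourdomain.com/blog/"))           # False
print(rules.site_maps())  # ['https://yourdomain.com/sitemap.xml']
```

A test like this also catches a Sitemap line that still points at localhost, since the URL can be compared against the production domain.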

robots.txt — this project (production-ready) TEXT

User-agent: *
Allow: /
Sitemap: https://risa-source.github.io/BasicSEOImprovement/sitemap.xml

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Wrong (what NOT to do — this would prevent Google finding your sitemap):
# Sitemap: http://localhost/Workshop11/sitemap.xml

08 Technical SEO

XML Sitemap

A sitemap is a file that lists all the pages on your site you want search engines to index. It doesn't guarantee indexing, but it dramatically speeds up discovery — especially for large sites, new sites, or pages with few inbound links.

sitemap.xml — full annotated example XML

<?xml version="1.0" encoding="UTF-8"?>
<!-- ⚑ Replace yourdomain.com with your actual production URL -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

  <url>
    <!-- loc: the canonical, absolute URL of the page -->
    <loc>https://yourdomain.com/</loc>

    <!-- lastmod: when the page content was last changed (YYYY-MM-DD) -->
    <lastmod>2026-02-10</lastmod>

    <!-- changefreq: hint to crawlers, one of                             -->
    <!-- always|hourly|daily|weekly|monthly|yearly|never                  -->
    <!-- Google ignores this, but it costs nothing to include             -->
    <changefreq>monthly</changefreq>

    <!-- priority: relative importance 0.0–1.0 within YOUR site only -->
    <!-- Does not affect how Google ranks you vs other sites; Google   -->
    <!-- ignores the value, though other crawlers may read it          -->
    <priority>1.0</priority>
  </url>

  <url>
    <loc>https://yourdomain.com/about/</loc>
    <lastmod>2026-01-20</lastmod>
    <changefreq>yearly</changefreq>
    <priority>0.7</priority>
  </url>

</urlset>

ⓘ Submit your sitemap in Google Search Console under Indexing → Sitemaps. This is the most direct way to tell Google about new content, though it does not guarantee fast indexing. The Sitemap line in robots.txt serves the same purpose for other crawlers and is worth keeping too.

What to exclude from your sitemap

✗ Pages with a noindex robots meta tag
✗ Paginated archive pages (e.g. /page/2/, /page/3/)
✗ Thank-you pages, login pages, checkout pages
✗ Duplicate content URLs (use canonical instead)
✗ URLs that return non-200 HTTP status codes
✗ Disallowed URLs in robots.txt
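Generating the sitemap from a page inventory makes the exclusion rules above enforceable rather than aspirational. This is a minimal Python sketch covering two of them (non-200 status and noindex). The build_sitemap function and the shape of the page dicts are assumptions, and the optional changefreq and priority elements are left out.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (bytes) for pages that are indexable.

    Each page is a dict with the keys url, lastmod, status and noindex
    (a hypothetical shape chosen for this sketch).
    """
    urlset = ET.Element("urlset", xmlns=NS)
    for page in pages:
        # Skip anything that should not be listed: errors and noindex pages.
        if page["status"] != 200 or page["noindex"]:
            continue
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["url"]
        ET.SubElement(url, "lastmod").text = page["lastmod"]
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


pages = [
    {"url": "https://yourdomain.com/", "lastmod": "2026-02-10", "status": 200, "noindex": False},
    {"url": "https://yourdomain.com/old/", "lastmod": "2025-01-01", "status": 404, "noindex": False},
    {"url": "https://yourdomain.com/thanks/", "lastmod": "2026-01-20", "status": 200, "noindex": True},
]
print(build_sitemap(pages).decode("utf-8"))
```

Only the first page ends up in the output. The result is bytes, so write it with open("sitemap.xml", "wb").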

09 Content SEO

Heading Hierarchy & Content Structure

Search engines use your heading structure as an outline of the page's topic and subtopics. A clear, logical hierarchy helps Google understand what the page is about and which parts are most important. It also drastically improves readability for humans and screen reader users.

index.html — correct heading hierarchy HTML

<!-- ONE h1 per page — the primary topic -->
<h1>SEO Fundamentals: Complete On-Page Guide</h1>

  <!-- h2 — major sections of the page -->
  <h2>The &lt;head&gt; — Where SEO Begins</h2>

    <!-- h3 — subsections within an h2 -->
    <h3>The minimal viable head</h3>

      <!-- h4 — subsections within an h3, use sparingly -->
      <h4>Character encoding</h4>

<!-- NEVER skip levels: h1 → h3 (skipped h2) is wrong -->
<!-- NEVER use headings for visual size only — use CSS  -->
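The two structural rules above (one h1, no skipped levels) can be verified with a short script. Below is a sketch using Python's standard html.parser; HeadingOutline and heading_problems are hypothetical names, and the check looks only at heading order, not at whether a heading is used purely for visual size.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collects heading levels (1-6) in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html):
    """Report a missing or repeated h1 and any skipped heading levels."""
    parser = HeadingOutline()
    parser.feed(html)
    problems = []
    h1_count = parser.levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")
    previous = 0
    for level in parser.levels:
        # Going deeper by more than one level means a level was skipped.
        if level > previous + 1:
            origin = f"h{previous}" if previous else "start of page"
            problems.append(f"skipped level: {origin} -> h{level}")
        previous = level
    return problems


print(heading_problems("<h1>Guide</h1><h3>Details</h3>"))
# ['skipped level: h1 -> h3']
```

Moving back up the outline, for example from h4 to h2, is allowed and is not flagged.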

Semantic HTML elements

Element	Semantic meaning	SEO signal
`<article>`	Self-contained piece of content	Strong — marks primary content
`<section>`	Thematic grouping with a heading	Good — organises topics
`<nav>`	Navigation links	Site structure signal
`<main>`	Page's primary unique content	Identifies main content
`<header>`	Introductory / branding content	Structural context
`<footer>`	Supplementary / legal content	Lower content weight
`<aside>`	Tangentially related content	Lower content weight
`<div>`	No meaning	Zero SEO signal

10 On-Page

Image Optimisation

Images are often the largest contributor to slow page loads and poor Core Web Vitals scores. They're also an independent channel for traffic via Google Image Search. Getting image SEO right means better rankings AND a faster site.

index.html — fully optimised image HTML

<!-- BAD — the original version -->
<img src="big-image.jpg">
<!-- No alt text: invisible to screen readers, no keyword signal     -->
<!-- No dimensions: causes layout shift (hurts CLS Core Web Vital)   -->
<!-- No loading attr: off-screen images download at once and compete for bandwidth -->

<!-- GOOD: fully optimised -->
<!-- Comments cannot sit inside a tag, so the attribute notes come first:  -->
<!-- alt: describe the image for screen readers AND image search           -->
<!--   Do NOT: alt="image of office" or alt="keyword keyword office"       -->
<!-- width/height: explicit dimensions prevent Cumulative Layout Shift     -->
<!--   Browser reserves space before image loads → no jumping layout       -->
<!-- loading="lazy": defer off-screen images until user scrolls near them  -->
<!--   Use loading="eager" for the hero/above-fold image only              -->
<!-- decoding="async": decode off the main thread, doesn't block rendering -->
<figure>
  <img
    src="optimized-image.jpg"
    alt="Modern office workspace with natural light and standing desks"
    width="800"
    height="600"
    loading="lazy"
    decoding="async"
  >
  <figcaption>Our office workspace, built for deep work.</figcaption>
</figure>

◆ For your hero/above-fold image: use loading="eager" and add <link rel="preload" as="image"> in <head>. On image-heavy pages this is often one of the largest LCP improvements available.
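The attribute rules for images lend themselves to an automated audit. The following Python sketch flags img tags that lack alt, dimensions, or a loading attribute. ImageAudit and audit_images are hypothetical names, and the check is deliberately about presence only: an empty alt is accepted because it is valid for purely decorative images.

```python
from html.parser import HTMLParser


class ImageAudit(HTMLParser):
    """Collects (src, issue) pairs for every <img> in a document."""

    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src", "(no src)")
        if "alt" not in attr:
            self.issues.append((src, "missing alt text"))
        if "width" not in attr or "height" not in attr:
            self.issues.append((src, "missing width/height"))
        if "loading" not in attr:
            self.issues.append((src, "no loading attribute"))


def audit_images(html):
    """Return all image issues found in the given HTML."""
    parser = ImageAudit()
    parser.feed(html)
    return parser.issues


for src, issue in audit_images('<img src="big-image.jpg">'):
    print(src, "->", issue)
```

Run against the "bad" example, this reports all three problems; run against the optimised version, it reports none.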

11 Technical SEO

Performance & Core Web Vitals

Since 2021, Google has used Core Web Vitals (CWV) as a confirmed ranking signal, introduced with the "Page Experience" update. These are real-world UX measurements collected from Chrome users. They reflect how your page actually feels to use, not just how fast it theoretically loads.

LCP

Largest Contentful Paint

How long until the largest visible element (hero image, headline) renders. Measures perceived load speed.

Good: ≤ 2.5s

CLS

Cumulative Layout Shift

How much page elements unexpectedly jump around during load. Caused by images without dimensions, late-loading fonts.

Good: ≤ 0.1

INP

Interaction to Next Paint

How quickly the page responds to user interaction (click, tap, key press). Replaced FID in March 2024.

Good: ≤ 200ms
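The three thresholds can be turned into a simple rating function, for example to flag regressions in monitoring data. The sketch below is in Python. The "poor" boundaries (4.0s, 0.25, 500ms) and the use of the 75th percentile are not stated in this guide; they follow Google's published Core Web Vitals guidance. The names THRESHOLDS, rate, and p75 are hypothetical.

```python
import math

# (good_max, poor_min) per metric. Units: seconds, unitless score, milliseconds.
THRESHOLDS = {"LCP": (2.5, 4.0), "CLS": (0.1, 0.25), "INP": (200, 500)}


def rate(metric, value):
    """Classify one measurement as good, needs improvement, or poor."""
    good_max, poor_min = THRESHOLDS[metric]
    if value <= good_max:
        return "good"
    if value > poor_min:
        return "poor"
    return "needs improvement"


def p75(samples):
    """75th percentile by the nearest-rank method."""
    ordered = sorted(samples)
    return ordered[math.ceil(0.75 * len(ordered)) - 1]


lcp_samples = [1.8, 2.1, 2.4, 3.9]  # made-up field data, in seconds
print(rate("LCP", p75(lcp_samples)))  # good
```

Rating the 75th percentile rather than the average means a page only counts as good when most visits, not just the typical one, meet the threshold.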

Performance techniques used on this page

index.html <head> — performance annotations HTML

<!-- 1. dns-prefetch: resolve domain name early (~20ms saved) -->
<link rel="dns-prefetch" href="//fonts.googleapis.com">

<!-- 2. preconnect: DNS + TCP + TLS early (~150ms saved)      -->
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>

<!-- 3. font-display:swap prevents invisible text during load  -->
<link href="https://fonts.googleapis.com/css2?family=...&display=swap" ...>

<!-- 4. Non-render-blocking CSS + FOUC prevention            -->
<!-- Step A: hide body instantly via inline style            -->
<style>body { opacity: 0; } body.css-ready { opacity: 1; transition: opacity 0.2s ease; }</style>
<noscript><style>body { opacity: 1; }</style></noscript>

<!-- Step B: load CSS async, then reveal body once ready     -->
<link rel="preload" href="style.css" as="style"
      onload="this.onload=null;this.rel='stylesheet';document.body.classList.add('css-ready')">
<noscript><link rel="stylesheet" href="style.css"></noscript>

<!-- 5. Explicit image dimensions prevent CLS                -->
<img width="800" height="600" loading="lazy" decoding="async" ...>

12 Reference

Complete SEO Checklist

Use this before every page goes live.

Head tags

✓ charset="UTF-8" is the first tag in <head>
✓ Viewport meta tag with width=device-width
✓ Unique <title> 50–60 chars, primary keyword first
✓ Unique meta description 150–160 chars with call to action
✓ Canonical link tag with absolute production URL
✓ meta robots content="index, follow" (or noindex where needed)
✓ Full Open Graph tags: type, title, description, image, url, site_name
✓ Twitter Card tags: card, title, description, image — twitter:site is optional
✓ JSON-LD structured data matching the page content type
✓ Favicon (SVG preferred, data URI avoids extra request)

Content & markup

✓ Exactly one <h1> containing the primary keyword
✓ Logical heading hierarchy h1 → h2 → h3 (no skipped levels)
✓ Semantic landmark elements: <header>, <main>, <nav>, <footer>
✓ lang attribute on <html> element
✓ All images have descriptive alt text
✓ All images have explicit width and height attributes
✓ loading="lazy" on below-fold images
✓ loading="eager" on above-fold/hero image

Technical files

✓ robots.txt exists at domain root with production URL
✓ robots.txt Sitemap directive points to production sitemap URL
✓ sitemap.xml exists with all indexable pages
✓ sitemap.xml submitted to Google Search Console
✓ All sitemap URLs return HTTP 200
✓ No noindex pages are included in the sitemap

Performance

✓ LCP ≤ 2.5s (test with PageSpeed Insights)
✓ CLS ≤ 0.1 (all images and embeds have dimensions)
✓ INP ≤ 200ms (minimal JS blocking main thread)
✓ CSS loaded non-render-blocking (preload trick) + FOUC prevented (opacity:0 on body until css-ready)
✓ Google Fonts use display=swap
✓ preconnect hints for critical external origins