Programmatic SEO for API-First CMS: The Complete Guide

6 min read
Cover image for Programmatic SEO for API-First CMS: The Complete Guide

If you've ever tried to scale a content site beyond 50 pages, you've hit the wall. Titles written by hand. Meta descriptions copy-pasted. URLs that nobody agreed on. A sitemap that's three months out of date.

Programmatic SEO solves this but most guides assume you're running WordPress or a static site. If your content lives in an API-first CMS, the approach is fundamentally different. You're not editing templates in a dashboard. You're writing code that reads structured content and turns it into pages that Google can find, understand, and rank.

This guide covers everything: how to model your content for SEO, how to generate metadata automatically, how to handle URLs at scale, and how to make sure Google can actually crawl what you build.


What "Programmatic SEO" Actually Means

Programmatic SEO is the practice of generating a large number of pages from structured data, rather than writing each page by hand. Instead of manually creating a page for every use case, keyword, or user, you write one template and let the data do the work.

Think of it like this: a travel site doesn't hand-write a page for every city. They write one template, pull city data from a database, and generate thousands of pages automatically. Each page is unique because the data is unique.

For an API-first CMS, this applies directly. Your content papers, projects, collections already lives in a database. Programmatic SEO means making sure every piece of that content gets a properly structured, indexable, well-described page without you touching each one individually.


Why API-First CMS Changes the Game

Traditional CMS platforms like WordPress bake SEO into the dashboard. You install a plugin, fill in a box, and you're done.

An API-first CMS doesn't work that way. Your content is returned as raw JSON. The frontend is completely separate usually a framework like Astro, Next.js, or Nuxt. This separation gives you enormous flexibility, but it also means SEO is entirely your responsibility.

Nobody is injecting meta tags for you. No plugin is generating your sitemap. If you don't build it, it doesn't exist.

The upside: you have complete control. You can build a metadata system that's smarter, more consistent, and more scalable than anything a plugin could give you.


Step 1: Model Your Content With SEO in Mind

Before you write a single line of frontend code, your content model needs to support SEO. Every piece of content that will become a public page should have these fields:

Required fields:

  • title the human-readable name of the content
  • slug the URL-safe identifier (lowercase, hyphen-separated, no special characters)
  • excerpt or description a 150–160 character summary used for meta descriptions
  • publishedAt date for freshness signals
  • updatedAt date for recrawl prioritization

Recommended fields:

  • metaTitle optional override for the <title> tag if different from the display title
  • metaDescription optional override for the meta description
  • coverImageUrl + coverImageAlt for OG images and image SEO
  • canonicalUrl for cases where content is syndicated or cross-posted

If your CMS doesn't expose these fields, you'll be deriving them from whatever data you have which works, but is less reliable.


Step 2: Build a Centralized Metadata System

The single biggest mistake on API-first frontends is writing metadata inline on each page. You end up with inconsistent titles, missing descriptions, and no way to enforce standards across the site.

Instead, build one helper call it seo.ts or meta.ts that every page calls to get its metadata. It takes in the raw content object and returns a complete, validated set of SEO fields.

// seo.ts (simplified example)
export function buildMeta(content: ContentItem, pageUrl: string) {
  return {
    title: content.metaTitle || `${content.title} | Whitepaper`,
    description: content.metaDescription || content.excerpt || '',
    canonical: pageUrl,
    ogTitle: content.metaTitle || content.title,
    ogDescription: content.metaDescription || content.excerpt || '',
    ogImage: content.coverImageUrl || '/default-og.png',
  };
}

Every page user profiles, project pages, paper pages calls this function and passes the result to your layout component. Your layout component renders all the <meta> tags in one place.

This means when you need to change how titles are formatted, you change it in one file and it propagates everywhere.


Step 3: Handle URLs Consistently

URL chaos is one of the most common SEO problems on dynamic content sites. The same content ends up accessible at multiple URLs:

  • /john and /John and /@john
  • /my-paper and /My-Paper and /my-paper/

Google treats these as separate pages with duplicate content. This hurts rankings.

The fix is a two-part system:

Normalize at the data layer. When content is created, enforce lowercase slugs and strip special characters. Never store @ prefixes or uppercase letters in slugs.

Redirect at the routing layer. When a request comes in for /John, check if the normalized version is /john, and redirect with a 301. Same for trailing slashes pick one and always redirect to it.

In Astro with a server-side adapter, this looks like:

// [handle]/index.astro
const normalized = handle.toLowerCase().replace(/^@/, '');
if (normalized !== handle) {
  return Astro.redirect(`/${normalized}`, 301);
}

Do this for every dynamic route. It's boring work but it prevents a lot of crawl problems.


Step 4: Generate Your Sitemap Programmatically

A sitemap tells Google what pages exist on your site. For an API-first CMS, you generate it by fetching your content list from the API and formatting it as XML.

The key rules for a good sitemap:

  • Only include pages that are publicly accessible and indexable
  • Include <lastmod> using the content's updatedAt date
  • Split into multiple sitemaps if you have more than 50,000 URLs (use a sitemap index)
  • Reference all sitemaps from your robots.txt

A segmented approach works well: one sitemap for static marketing pages, one for user profile pages, one for project pages. This makes it easier to monitor which segment has indexation problems in Google Search Console.


Step 5: Make Sure Bots Can See Your Content

This is where API-first sites most commonly fail. If your content is loaded by JavaScript after the page renders fetched on click, shown after a user interaction Google may never see it.

For public content that needs to rank, it must be in the initial HTML. This means server-side rendering or static generation, not client-side fetching.

In Astro, this is the default behavior for .astro files. But if you're using React components with client:load, data fetched inside those components is invisible to bots until Google renders your JavaScript which it does, but with a delay and lower priority.

The practical rule: any text, links, or content that matters for SEO should be rendered server-side. Interactive UI (tab switching, accordions, modals) can be client-side. But the underlying content those interactions reveal should already be in the HTML.


Putting It Together

Programmatic SEO on an API-first CMS is not magic. It's a set of boring, reliable systems:

  1. A content model with SEO fields built in
  2. A centralized metadata helper that every page uses
  3. URL normalization and redirect rules
  4. A programmatically generated sitemap
  5. Server-side rendering for all public content

None of these are complex individually. The power is in having all of them working together, consistently, across every page your CMS produces.

When your content model is clean, your metadata is automatic, your URLs are canonical, and your sitemap is always current Google can do its job. And when Google can do its job, your content gets found.


Further Reading

Whitepapper

Whitepapper is an API first CMS engine for devs who publish content.

Antk i n