Stellate: how I cached a GraphQL API without touching the backend

A BFF (Backend for Frontend) on Lambda is stateless by design. Every incoming GraphQL query has to go through Sanity, Shopify, and DynamoDB: three network hops, even if the data hasn’t changed in hours. On an e-commerce site with real traffic, this means high latency and compute costs that scale linearly with visits.

The standard fix is a Redis layer in front of the origin. But managing Redis means managing another piece of infrastructure: sizing, failover, manual invalidation. I went with Stellate, a GraphQL-specific CDN that acts as a reverse proxy: queries hit Stellate first, which responds from cache if the data is fresh, or forwards to the origin and caches the response.

Where it sits in the architecture

Client (Next.js)
      ↓
  Stellate CDN  ←── cache hit → immediate response
      ↓ cache miss
  BFF (Lambda)
      ↓
  Sanity + Shopify + DynamoDB

The client changes nothing: it still points to the same GraphQL endpoint. Stellate sits in between transparently. The configuration lives in a TypeScript file deployed to the Stellate platform, separate from the BFF code.

Cache configuration

export default defineConfig({
  serviceName: 'my-storefront',
  schema: process.env.STELLATE_SCHEMA,
  originUrl: process.env.STELLATE_ORIGIN_URL,

  rules: [
    // Default: 30 minutes + 2 days stale-while-revalidate
    {
      types: ['Query'],
      maxAge: 1800,
      swr: 172800,
    },
    // User-specific data: never cache
    {
      types: ['SpfCart', 'SpfCustomer', 'SpfAdminInventory'],
      maxAge: 0,
    },
    // Schema: long cache with SWR
    {
      types: ['__Schema'],
      maxAge: 3780,
      swr: 86400,
    },
  ],

  keyFields: {
    Routing: ['slug'],
    SpfProduct: ['id'],
  },

  scopes: [
    {
      scope: 'AUTHENTICATED',
      representation: 'header:authorization',
    },
  ],
});

Three decisions here are worth explaining.

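maxAge + SWR instead of maxAge alone. maxAge: 1800 means Stellate responds from cache for 30 minutes without touching the origin. swr: 172800 adds 2 days of stale-while-revalidate: once the 30 minutes are up, Stellate keeps serving the stale entry while it refreshes the cache in the background. As long as an entry is inside that window, the client never waits for the origin. The cost is that a response can be older than maxAge: the request that triggers the refresh still gets the previous copy, so a rarely requested query can return data up to two days past its 30-minute freshness limit.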
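
To make the two windows concrete, here is a minimal sketch of how an entry's age maps to cache behavior under a maxAge + SWR policy. classifyEntry is a hypothetical helper written for this post, not part of Stellate, and the exact boundary handling (strict vs. inclusive comparison) is an assumption.

```typescript
// Hypothetical helper: classifies a cached entry under a maxAge + SWR policy.
type CacheState = 'fresh' | 'stale-while-revalidate' | 'expired';

function classifyEntry(ageSeconds: number, maxAge: number, swr: number): CacheState {
  // Inside maxAge: served from cache, origin untouched.
  if (ageSeconds < maxAge) return 'fresh';
  // Inside the SWR window: served stale while a background refresh runs.
  if (ageSeconds < maxAge + swr) return 'stale-while-revalidate';
  // Past both windows: the request has to wait for the origin.
  return 'expired';
}

const MAX_AGE = 1800; // 30 minutes, as in the default rule
const SWR = 172800; // 2 days, as in the default rule

console.log(classifyEntry(600, MAX_AGE, SWR)); // 10 minutes old
console.log(classifyEntry(3600, MAX_AGE, SWR)); // 1 hour old
console.log(classifyEntry(180000, MAX_AGE, SWR)); // a bit over 2 days old
```

With the values from the default rule, an entry only falls out of the cache entirely after 174,600 seconds (30 minutes plus 2 days) without a refresh.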

Non-cacheable types. SpfCart, SpfCustomer, and SpfAdminInventory have maxAge: 0, so Stellate bypasses the cache for them and always goes to the origin. Cart and customer data are user-specific: caching them risks showing one user another user’s cart.

Key fields for granular invalidation. Routing: ['slug'] tells Stellate how to uniquely identify a record of that type. This is needed for selective purging — more on that below.
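
As an illustration of what "uniquely identify a record" means here, the sketch below derives an identity string from a record using the same key fields as the config. entityKey and the key format are hypothetical, written only to show the idea; how Stellate stores these identities internally is not something this post covers.

```typescript
// Same mapping as in the config: type name -> fields that identify a record.
const keyFields: Record<string, string[]> = {
  Routing: ['slug'],
  SpfProduct: ['id'],
};

// Hypothetical helper: builds an identity string for a record, or null when
// the type has no key fields and therefore cannot be purged selectively.
function entityKey(typename: string, record: Record<string, unknown>): string | null {
  const fields = keyFields[typename];
  if (!fields) return null;
  const parts = fields.map((field) => `${field}=${String(record[field])}`);
  return `${typename}:${parts.join(',')}`;
}

console.log(entityKey('Routing', { slug: '/products/blue-shirt', locale: 'en' }));
console.log(entityKey('SpfCart', { id: 'cart-1' }));
```

A purge for a given type and key value only needs to drop the cached responses that contain a record with that identity, which is what makes selective invalidation possible.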

Separating cache for authenticated users

The AUTHENTICATED scope with representation: 'header:authorization' tells Stellate to partition the cache by the value of the authorization header: requests without the header (anonymous users) share one cache, and each distinct header value gets its own entries.

Without this, an authenticated user would see an anonymous user’s cache and vice versa. In practice, most product and category queries can be shared across all users — but anything that touches personalized data (wishlists, B2B pricing, market-specific availability) goes into the authenticated cache.
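
The partitioning can be pictured as an extra component in the cache key. The following is a rough sketch under the assumption that the scope is keyed on the authorization header value; scopeKey and cacheKey are hypothetical names and this is not Stellate's internal implementation.

```typescript
import { createHash } from 'node:crypto';

type Headers = Record<string, string | undefined>;

// Hypothetical: anonymous requests share one bucket, each token gets its own.
function scopeKey(headers: Headers): string {
  const auth = headers['authorization'];
  if (!auth) return 'public';
  // Hash the token so the raw credential never ends up in a cache key.
  return 'auth:' + createHash('sha256').update(auth).digest('hex').slice(0, 16);
}

// The scope bucket is combined with the query to form the final key.
function cacheKey(query: string, headers: Headers): string {
  return `${scopeKey(headers)}|${query}`;
}

console.log(cacheKey('{ wishlist { id } }', {}));
console.log(cacheKey('{ wishlist { id } }', { authorization: 'Bearer token-a' }));
```

Two users sending the same query with different tokens end up with different keys, so neither can be served the other's response.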

The invalidation cycle

The problem with caching is always stale data. When an editor publishes a change in Sanity, the Stellate cache still holds the old data — until maxAge expires. For an e-commerce store, a 30-minute delay on a price or availability change is unacceptable.

I built an event-driven invalidation cycle:

Editor publishes in Sanity
        ↓
Sanity sends webhook POST (HMAC-signed)
        ↓
Lambda validates signature and puts document into SQS FIFO
        ↓
SQS waits 45 seconds (aggregates bursts of publications)
        ↓
Consumer Lambda processes up to 10 messages
        ↓
Queries routing table → gets slug and typename for each document
        ↓
Promise.all([
  stellatePurgeType(typename, documentId),
  stellatePurgeType('Routing', slug),
  storefrontISRRevalidate(slugs),
])
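
The signature check in the second step can be sketched as a generic HMAC-SHA256 comparison. This is a minimal sketch, not Sanity's exact scheme: the hex encoding, the way the signature reaches the function, and the name isValidSignature are all assumptions, and the real webhook header format may differ.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Generic HMAC-SHA256 verification. Assumes the sender signs the raw request
// body with a shared secret and sends the digest hex-encoded.
function isValidSignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  // Constant-time comparison avoids leaking how many bytes matched.
  return timingSafeEqual(received, expected);
}
```

The check has to run against the raw body string, before any JSON parsing, because re-serializing the payload can change its bytes and invalidate the signature.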

The 45-second delay on the queue is intentional: if an editor saves a product 5 times in quick succession, a single consumer Lambda processes all messages in one batch instead of triggering 5 separate purges.
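
Receiving the messages in one batch only helps if the consumer collapses duplicates before purging. A minimal sketch of that step follows, assuming a hypothetical message shape (PublishEvent) since the real webhook payload is not shown here.

```typescript
// Hypothetical message shape for one publish event coming off the queue.
interface PublishEvent {
  documentId: string;
  publishedAt: number; // epoch milliseconds
}

// Keeps only the most recent event per document, so five rapid saves of the
// same product lead to a single purge.
function dedupeByDocument(batch: PublishEvent[]): PublishEvent[] {
  const latest = new Map<string, PublishEvent>();
  for (const event of batch) {
    const seen = latest.get(event.documentId);
    if (!seen || event.publishedAt >= seen.publishedAt) {
      latest.set(event.documentId, event);
    }
  }
  return Array.from(latest.values());
}
```

The consumer would run this over the batch first and only then look up slugs and call the purge functions, once per remaining document.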

Selective purge by type

Stellate exposes a GraphQL Purge API. Instead of _purgeAll(), which invalidates the entire cache, I use _purgeType() with key fields:

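// Purges cached entries of one type, narrowed down by key fields.
async function stellatePurgeType(
  type: string,
  keyFields: { name: string; value: string }[],
  soft = false
): Promise<void> {
  const response = await fetch(process.env.STELLATE_PURGE_ENDPOINT!, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'stellate-token': process.env.STELLATE_PURGE_TOKEN!,
    },
    body: JSON.stringify({
      query: `
        mutation PurgeType($type: String!, $keyFields: [KeyFieldInput!], $soft: Boolean) {
          _purgeType(type: $type, keyFields: $keyFields, soft: $soft)
        }
      `,
      variables: { type, keyFields, soft },
    }),
  });

  // fetch only rejects on network failures, so HTTP and GraphQL errors
  // have to be checked explicitly or a failed purge goes unnoticed.
  if (!response.ok) {
    throw new Error(`Stellate purge failed: HTTP ${response.status}`);
  }
  const result = (await response.json()) as { errors?: { message: string }[] };
  if (result.errors?.length) {
    throw new Error(`Stellate purge failed: ${result.errors.map((e) => e.message).join('; ')}`);
  }
}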

When a product changes, I only purge that product:

await Promise.all([
  // Purge the product by ID
  stellatePurgeType('SpfProduct', [{ name: 'id', value: shopifyProductId }]),
  // Purge its route by slug
  stellatePurgeType('Routing', [{ name: 'slug', value: productSlug }]),
]);

The soft flag in the mutation corresponds to SWR behavior: with soft: true, the record is marked stale but remains servable while Stellate revalidates in the background. With soft: false (default), the record is removed and the next request must go to the origin.

I used soft purge (soft: true, passed as the third argument) for most cases: I’d rather serve a 30-second stale response than make the next client wait while Lambda fetches from Sanity and Shopify.

What I learned

The latency gain is immediate and measurable. Queries that previously took 300–800ms (Lambda round-trip + Sanity + Shopify) drop to 20–50ms from the edge cache. The trade-off is accepting that data can be stale within the SWR window.

Getting non-cacheable types right is critical. Missing maxAge: 0 on SpfCart means two users could see the same cart. I caught this in staging by inspecting responses: the cartId in the response was identical across different requests.
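
That manual inspection can be turned into a small automated check. The sketch below assumes a hypothetical CartObservation record collected by a staging test that requests a cart as several different users; it flags any cartId that shows up for more than one of them.

```typescript
// Hypothetical record: which cartId a given test user received.
interface CartObservation {
  userId: string;
  cartId: string;
}

// Returns the cart ids that were served to more than one user, which is the
// symptom of a user-specific type being cached.
function findSharedCartIds(observations: CartObservation[]): string[] {
  const usersByCart = new Map<string, Set<string>>();
  for (const { userId, cartId } of observations) {
    const users = usersByCart.get(cartId) ?? new Set<string>();
    users.add(userId);
    usersByCart.set(cartId, users);
  }
  const shared: string[] = [];
  usersByCart.forEach((users, cartId) => {
    if (users.size > 1) shared.push(cartId);
  });
  return shared;
}
```

An empty result is the expected outcome; anything else means a cart response crossed a user boundary.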

Selective purge requires a routing table. To know which slug to invalidate when a Sanity document changes, I need a table mapping documentId → slug. Without it, the only option is _purgeAll() — effective but blunt. It invalidates everything, including data that hasn’t changed.
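
The lookup step can be sketched as follows. An in-memory Map stands in for the real routing table, and the RouteEntry and PurgeTarget shapes, as well as the choice of 'id' as the key field for the document's own type, are assumptions made for illustration.

```typescript
// Hypothetical shapes: one routing-table row and one purge instruction.
interface RouteEntry {
  slug: string;
  typename: string;
}
interface PurgeTarget {
  type: string;
  keyFields: { name: string; value: string }[];
}

// Maps changed document ids to the purge calls they require.
function purgeTargetsFor(
  documentIds: string[],
  routing: Map<string, RouteEntry>
): PurgeTarget[] {
  const targets: PurgeTarget[] = [];
  for (const id of documentIds) {
    const route = routing.get(id);
    // Unknown document: there is no slug to purge selectively.
    if (!route) continue;
    targets.push({ type: route.typename, keyFields: [{ name: 'id', value: id }] });
    targets.push({ type: 'Routing', keyFields: [{ name: 'slug', value: route.slug }] });
  }
  return targets;
}
```

Each returned target carries a type and a keyFields list, matching the first two parameters of stellatePurgeType.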

SQS delay solves CMS edit bursts. Without the 45-second delay, an editing session with 10 rapid saves generated 10 sequential Purge API calls. With the delay, the consumer Lambda sees a batch and makes a single multi-document purge call.