Edge/ISR vs Serverless Functions: Compose, Don’t Choose

Edge runtimes shine for latency‑sensitive gates, request shaping, and light per‑user personalization. Serverless functions carry the heavy, CPU/memory‑bound work and fan‑out orchestration. Don’t pick a side: compose the two where each is objectively best.

Decision matrix: CPU, state, latency, compliance, vendor constraints

  • CPU & memory profile: If peak CPU > 50–100ms per request or memory > 128–256MB, push to functions. Edge is optimized for ultra‑low latency, not sustained compute. Long CPU on the edge will starve the scheduler and inflate p95 TTFB; it’s a trap we’ve all fallen into at least once.
  • State & data access: Edge prefers read‑mostly, pre‑materialized state (KV, CDN, read replicas). Functions handle transactional state, multi‑step IO, idempotency keys, and complex retries. Avoid cross‑region strong consistency expectations at the edge; prefer eventual semantics.
  • Latency & geography: Edge excels for geo‑aware routing, AB bucketing, cookie/headers enrichment, and early hints. If you need sub‑50ms decisions globally, keep logic thin and colocated with POPs. Heavy logic goes behind a regional function with aggressive caching.
  • Compliance & data residency: PII gating and geo‑fences fit nicely at the edge. Heavy processing of PII often belongs in controlled regions with auditable functions and VPC egress.
  • Vendor/runtime constraints: Web‑standard APIs differ (crypto, streams), native modules aren’t available at the edge, and timeouts are stricter. Don’t oversimplify: test exact limits per provider and per region; quotas do vary over time. The placement sketch below turns these bullets into an explicit check.
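
To make the matrix actionable, here is a minimal TypeScript sketch of a placement helper. The thresholds mirror the bullets above and are illustrative team policy, not provider limits; WorkloadProfile and placeWorkload are hypothetical names, not a real API.

// placement.ts: illustrative decision helper; thresholds are policy, not provider limits
type Placement = 'edge' | 'function';

interface WorkloadProfile {
  peakCpuMs: number;            // expected peak CPU per request
  peakMemoryMb: number;         // expected peak memory
  needsTransactionalState: boolean;
  needsNativeModules: boolean;
  latencyBudgetMs: number;      // end-user decision budget
}

export function placeWorkload(w: WorkloadProfile): Placement {
  // Heavy CPU/memory, transactional state, or native modules go behind a function.
  if (w.peakCpuMs > 50 || w.peakMemoryMb > 128) return 'function';
  if (w.needsTransactionalState || w.needsNativeModules) return 'function';
  // Thin, read-mostly, sub-50ms decisions stay at the edge.
  return w.latencyBudgetMs <= 50 ? 'edge' : 'function';
}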

Opinion: The matrix prevents bikeshedding and aligns teams on objective thresholds. Simple, effective 🚀

Composed patterns

  • Edge gate + function compute: Edge middleware does identity/context extraction, fast canary/AB, feature flag resolution, and cache key shaping. It forwards a compact context to a function that performs scoring, pricing, recommendations, or ML post‑processing.

  • Edge hints for caching: Use Cache-Control, Vary, Early Hints, and custom headers to prime multi‑layer caches. Push low‑entropy personalization (segment, tier, locale) to the edge; keep high‑entropy artifacts (full user model) in function‑side caches.

  • Interruptible orchestration: The edge returns an immediate skeleton or ISR shell while the function continues compute asynchronously via queues or background revalidation. Users get perceived performance, and compute happens where it’s cheaper and safer; a minimal sketch follows this list.
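
A minimal sketch of the interruptible pattern, assuming a Workers‑style waitUntil execution context; kvGet and enqueueRevalidation are hypothetical helpers standing in for your KV store and revalidation queue.

// edge-shell.ts: illustrative sketch, serve a cached shell immediately, refresh off the request path
declare function kvGet(key: string): Promise<{ body: string; freshUntil: number } | null>;
declare function enqueueRevalidation(key: string): Promise<void>;

export default async function handleRequest(
  request: Request,
  ctx: { waitUntil(p: Promise<unknown>): void },
): Promise<Response> {
  const key = new URL(request.url).pathname;
  const cached = await kvGet(key);

  if (cached) {
    if (Date.now() > cached.freshUntil) {
      // Stale: serve it anyway, revalidate in the background.
      ctx.waitUntil(enqueueRevalidation(key));
    }
    return new Response(cached.body, {
      headers: { 'content-type': 'text/html', 'x-cache': 'HIT' },
    });
  }

  // Cold miss: return a fast skeleton; the function fills the cache asynchronously.
  ctx.waitUntil(enqueueRevalidation(key));
  return new Response('<html><body>Loading…</body></html>', {
    status: 200,
    headers: { 'content-type': 'text/html', 'x-cache': 'MISS' },
  });
}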

Note: Getting the edge‑vs‑function boundary right is the hardest part initially; once stable, it rarely moves. Hard the first week, smooth after 💪

Multi‑level caches and invalidation strategies

  • Layers: Browser cache → CDN/edge cache → KV/LRU near edge → regional function cache → database/materialized views. Each layer must have explicit ownership and TTL policy.
  • TTL design: Combine hard TTL with soft TTL and request coalescing to avoid stampedes (see the sketch after this list). Use background refresh tokens and circuit breakers when upstream is degraded. A slightly mis‑tuned soft TTL can explode the stampede rate under bursty traffic.
  • Keys and cardinality: Design cache keys from normalized context: market, segment, feature flags, device class. Avoid high‑cardinality keys at the edge; push those to regional function caches.
  • Invalidation: Prefer data‑driven invalidation (webhooks/change‑data‑capture) over manual ops. For ISR, coalesce revalidations and bound concurrency. Keep invalidation idempotent and replay‑safe.
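
As referenced in the TTL bullet, here is a minimal in‑memory sketch of a read‑through cache with hard TTL, soft TTL, and request coalescing; it is also one way to fill the TODO in the function handler further down. SoftTtlCache and its knobs are illustrative, not a library API.

// soft-ttl-cache.ts: illustrative read-through cache with hard TTL, soft TTL, coalesced refresh
interface Entry<T> { value: T; softExpiry: number; hardExpiry: number; }

export class SoftTtlCache<T> {
  private entries = new Map<string, Entry<T>>();
  private inflight = new Map<string, Promise<T>>();

  constructor(private softTtlMs: number, private hardTtlMs: number) {}

  async get(key: string, load: () => Promise<T>): Promise<T> {
    const now = Date.now();
    const entry = this.entries.get(key);

    if (entry && now < entry.hardExpiry) {
      if (now >= entry.softExpiry) {
        // Soft-expired: serve stale, refresh in the background (coalesced).
        this.refresh(key, load).catch(() => {
          // Refresh failed: keep serving the stale value until hard expiry.
        });
      }
      return entry.value;
    }

    // Hard miss: coalesce concurrent callers onto a single upstream request.
    return this.refresh(key, load);
  }

  private refresh(key: string, load: () => Promise<T>): Promise<T> {
    const existing = this.inflight.get(key);
    if (existing) return existing;

    const p = load()
      .then((value) => {
        const now = Date.now();
        this.entries.set(key, {
          value,
          softExpiry: now + this.softTtlMs,
          hardExpiry: now + this.hardTtlMs,
        });
        return value;
      })
      .finally(() => this.inflight.delete(key));

    this.inflight.set(key, p);
    return p;
  }
}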

Opinion: Cache policy is where most systems secretly bleed. Be ruthless about stampede controls and observability on hit/miss. Quite tedious to tune, but worth it 😅

Unified observability across edge and functions

  • Correlated tracing: Propagate a stable trace-id from the edge to functions, queues, and DB. Use W3C Trace Context, not a home‑grown format (sketched after this list). Sample adaptively by route and error rate.
  • Metrics parity: Emit p50/p95/p99 latency, CPU time, cold‑start count, cache hit ratio, revalidation queue depth. Track per‑POP and per‑region breakdowns to catch load‑balancer skew and hot shards.
  • Structured logs: Log compact JSON with redaction at the edge. Enforce field contracts so dashboards don’t drift. Make PII handling auditable.
  • SLOs: Separate user‑visible SLOs (Edge TTFB) from backplane SLOs (function CPU time, queue age). Tie alerts to error budgets, not single spikes.
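
A minimal sketch of trace propagation at the edge, assuming the W3C traceparent header format; ensureTraceparent is a hypothetical helper, not a standard API.

// trace.ts: illustrative W3C Trace Context handling, reuse the incoming traceparent or mint one
function randomHex(bytes: number): string {
  const buf = crypto.getRandomValues(new Uint8Array(bytes));
  return Array.from(buf, (b) => b.toString(16).padStart(2, '0')).join('');
}

export function ensureTraceparent(incoming: Headers): string {
  // Format: version-traceid-parentid-flags (https://www.w3.org/TR/trace-context/)
  return incoming.get('traceparent') ?? `00-${randomHex(16)}-${randomHex(8)}-01`;
}

// Usage in middleware: copy request headers, set traceparent, forward downstream.
// const headers = new Headers(request.headers);
// headers.set('traceparent', ensureTraceparent(request.headers));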

Opinion: Getting a clean trace from POP to function to DB is crazy satisfying when you’re chasing a 2am p99 regression. Fun when it clicks 🤓

Cost/quotas and guardrails

  • Concurrency caps: Edge has enormous concurrency but small CPU slices; functions have stricter concurrency and cold starts. Shape traffic at the edge: reject, shed load, or down‑tier features before hitting function concurrency walls.
  • Timeouts & retries: Centralize retry budgets and backoff policy (a sketch follows this list). Use idempotency keys for anything non‑read. Prevent retry storms across layers.
  • Egress & data transfer: Coalesce requests at the edge, compress aggressively, and avoid chatty edge↔function hops. Monitor egress per region; it bites later.
  • Guardrails: Feature flags to disable ML scoring, reduce personalization entropy, or switch to static fallbacks. Quotas as config, not code.
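
As mentioned in the retries bullet, here is a minimal sketch of a centralized retry policy: bounded attempts, jittered exponential backoff, and an idempotency key on non‑read calls so the backend can deduplicate. fetchWithRetry is a hypothetical helper, and the idempotency-key header name is a common convention, not a universal standard.

// retry.ts: illustrative retry budget with bounded attempts, jittered backoff, idempotency key
interface RetryPolicy { maxAttempts: number; baseDelayMs: number; maxDelayMs: number; }

export async function fetchWithRetry(
  url: string,
  init: RequestInit = {},
  policy: RetryPolicy = { maxAttempts: 3, baseDelayMs: 100, maxDelayMs: 2000 },
): Promise<Response> {
  const headers = new Headers(init.headers);
  if (init.method && init.method !== 'GET' && !headers.has('idempotency-key')) {
    // Same key across retries so the backend can deduplicate side effects.
    headers.set('idempotency-key', crypto.randomUUID());
  }

  let lastError: unknown;
  for (let attempt = 0; attempt < policy.maxAttempts; attempt++) {
    try {
      const res = await fetch(url, { ...init, headers });
      // Retry only transient failures (5xx, 429); return everything else as-is.
      if (res.status < 500 && res.status !== 429) return res;
      lastError = new Error(`upstream ${res.status}`);
    } catch (err) {
      lastError = err; // network error: eligible for retry
    }
    const backoff = Math.min(policy.maxDelayMs, policy.baseDelayMs * 2 ** attempt);
    await new Promise((r) => setTimeout(r, backoff * (0.5 + Math.random() * 0.5)));
  }
  throw lastError;
}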

Opinion: Budgets and quotas force clarity. They also surface lazy assumptions early. Slightly boring to wire, but saves real money 💤

Minimal edge middleware + function proxy

// edge-middleware.ts (Next.js Middleware)
import type { NextRequest } from 'next/server';
import { NextResponse } from 'next/server';

export const config = { matcher: ['/((?!_next|api|static).*)'] };

export function middleware(request: NextRequest) {
  const segment = request.cookies.get('seg')?.value ?? 'public';
  const locale = request.headers.get('accept-language')?.split(',')[0] ?? 'en';

  const url = new URL('/api/compute', request.url);
  url.searchParams.set('seg', segment);
  url.searchParams.set('loc', locale);

  // Headers passed via request.headers in the init are forwarded to the rewritten
  // target; a plain headers option would only set response headers.
  const requestHeaders = new Headers(request.headers);
  requestHeaders.set('x-cache-vary', `seg=${segment};loc=${locale}`);
  requestHeaders.set('x-trace-id', request.headers.get('x-trace-id') ?? crypto.randomUUID());

  return NextResponse.rewrite(url, { request: { headers: requestHeaders } });
}

// pages/api/compute.ts (Serverless Function)
import type { NextApiRequest, NextApiResponse } from 'next';

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  const { seg, loc } = req.query;
  // TODO: read-through cache with soft TTL and coalescing
  const result = await heavyScoringCompute({ segment: String(seg), locale: String(loc) });

  res.setHeader('Cache-Control', 's-maxage=60, stale-while-revalidate=300');
  res.setHeader('Vary', 'x-cache-vary');
  res.status(200).json({ result });
}

async function heavyScoringCompute(_input: { segment: string; locale: string; }): Promise<number> {
  // pretend CPU-heavy work
  const start = performance.now();
  while (performance.now() - start < 75) { /* spin */ }
  return Math.random();
}

KPIs we follow

Here are the KPIs we tracked (tbh we initially skipped the revalidation queue KPI, but it’s a good one imho):

  • Edge TTFB (global p50/p95), by POP
  • Function CPU time and cold starts
  • Cache hit ratio and stampede rate
  • Queue age for revalidation jobs

Composition wins when latency‑sensitive work stays at the edge and compute‑heavy logic runs where it belongs. Keep the seam narrow, the traces clean, and the guardrails on.
Once the muscle memory forms, delivery velocity jumps.
