
Serverless Observability Explained

Traditional observability tools assume you have long-running servers. They install agents, collect metrics over time, and track process state. Serverless functions break all of these assumptions.

What’s different about serverless

No persistent process — each invocation is isolated. There’s no “server CPU” to track. The function starts, runs, and stops.

Cold starts are spikes, not failures — a 300ms cold start isn’t an error, it’s expected behavior. Your alerting needs to know the difference.
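One common way to separate cold starts from real latency problems is a module-scope flag. Module-level code typically runs once per runtime instance, so only the first invocation on that instance sees the flag set. The sketch below assumes that behavior; `markInvocation`, `breachesLatencyBudget`, and the `coldStart` field are hypothetical names, not part of any platform API.

```javascript
// Module scope runs once per runtime instance, so this flag is true only
// for the first invocation that instance serves.
let isColdStart = true;

// Hypothetical helper: returns a log field describing this invocation.
function markInvocation() {
  const coldStart = isColdStart;
  isColdStart = false;
  return { coldStart };
}

// Hypothetical alert rule: only warm invocations count against the
// latency budget, so an expected cold start never pages anyone.
function breachesLatencyBudget(entry, budgetMs) {
  return !entry.coldStart && entry.duration > budgetMs;
}
```

Merging `markInvocation()` into each log entry lets an alert rule filter on `coldStart` instead of guessing from latency alone.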

Correlation is harder — a user request might fan out across 10 functions. Each has its own logs, with no shared process ID to correlate them.
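Since there is no shared process ID, the correlation key has to travel with the request. Here is a minimal sketch of passing one trace ID to every downstream call during a fan-out, using the same `x-trace-id` header as the Workers example later in this post. `traceHeaders` and `fanOut` are hypothetical helpers, and `fetchImpl` is injectable only so the sketch can run without a network.

```javascript
// Hypothetical helper: builds headers for a downstream call that carry
// the trace ID, plus any extra headers the caller wants to send.
function traceHeaders(traceId, extraHeaders = {}) {
  const headers = new Headers(extraHeaders);
  headers.set('x-trace-id', traceId);
  return headers;
}

// Hypothetical fan-out: calls every URL in parallel with the same trace ID,
// so each downstream function can log under that ID.
function fanOut(urls, traceId, fetchImpl = fetch) {
  return Promise.all(
    urls.map((url) => fetchImpl(url, { headers: traceHeaders(traceId) }))
  );
}
```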

Billing is per-invocation — you need to know which functions are expensive, not just slow.
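To see why "expensive" and "slow" can diverge, here is a rough sketch that ranks functions by estimated cost. The pricing model (a flat fee per invocation plus a fee per millisecond) and the `fn` log field are assumptions for illustration; real platforms price differently, so substitute your provider's published rates.

```javascript
// Hypothetical pricing model: pricing.perInvocation is charged once per call,
// pricing.perMs once per millisecond of duration. Entries are log events
// shaped like { fn, duration }, where fn is an assumed function-name field.
function costByFunction(entries, pricing) {
  const totals = new Map();
  for (const { fn, duration } of entries) {
    const cost = pricing.perInvocation + duration * pricing.perMs;
    totals.set(fn, (totals.get(fn) ?? 0) + cost);
  }
  // Most expensive function first.
  return [...totals.entries()].sort((a, b) => b[1] - a[1]);
}
```

With this model, a fast function that is called often can outrank a slow one that runs rarely, which a latency dashboard alone would not show.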

The three pillars (adapted for serverless)

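Logs — the most important. Every invocation should emit a structured log; the Workers example later in this post records a level, a traceId, a duration, and either a response status or an error message.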

Traces — connecting related invocations. A traceId passed through all function calls lets you reconstruct a user’s full journey.
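Reconstructing a journey from logs amounts to grouping by traceId and ordering by time. A minimal sketch, assuming each log entry also carries a `timestamp` in milliseconds and a function name `fn` (both hypothetical fields, as is the `groupByTrace` helper):

```javascript
// Groups log entries by traceId and orders each group by timestamp,
// giving one ordered list of invocations per user request.
function groupByTrace(entries) {
  const traces = new Map();
  for (const entry of entries) {
    if (!traces.has(entry.traceId)) traces.set(entry.traceId, []);
    traces.get(entry.traceId).push(entry);
  }
  for (const invocations of traces.values()) {
    invocations.sort((a, b) => a.timestamp - b.timestamp);
  }
  return traces;
}
```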

Metrics — derived from logs, not collected separately. Error rate, p99 latency, and invocation count can all be computed from log events.
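As a rough illustration of deriving metrics from logs, the sketch below computes all three numbers from an array of log events shaped like the ones in the Workers example (`level` and `duration`). The `summarize` name and the nearest-rank percentile method are choices made for this sketch; a production system may use a different percentile estimator.

```javascript
// Computes invocation count, error rate, and p99 latency from log entries
// shaped like { level, duration }.
function summarize(entries) {
  const count = entries.length;
  if (count === 0) return { count: 0, errorRate: 0, p99: null };

  const errors = entries.filter((e) => e.level === 'error').length;
  const durations = entries.map((e) => e.duration).sort((a, b) => a - b);

  // Nearest-rank percentile: the smallest duration such that at least
  // 99% of samples are at or below it.
  const rank = Math.ceil((99 * count) / 100);
  return { count, errorRate: errors / count, p99: durations[rank - 1] };
}
```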

Practical setup for Cloudflare Workers

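// handleRequest and sendLog are assumed to be defined elsewhere in the Worker.
export default {
  async fetch(request, env, ctx) {
    // Reuse the caller's trace ID if present so fan-out calls share one ID.
    const traceId = request.headers.get('x-trace-id') ?? crypto.randomUUID();
    const start = Date.now();

    try {
      const response = await handleRequest(request, env, traceId);

      // waitUntil lets the log be sent after the response is returned.
      ctx.waitUntil(sendLog(env, {
        level: 'info',
        traceId,
        duration: Date.now() - start,
        status: response.status,
      }));

      return response;
    } catch (err) {
      ctx.waitUntil(sendLog(env, {
        level: 'error',
        traceId,
        duration: Date.now() - start,
        // Thrown values are not always Error instances.
        error: err instanceof Error ? err.message : String(err),
      }));
      throw err;
    }
  }
};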
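The handler above calls `sendLog` without showing it. One possible shape, as a sketch only: it POSTs each entry as JSON to a collector. `env.LOG_ENDPOINT` and `env.LOG_TOKEN` are hypothetical bindings you would configure yourself, and the optional `fetchImpl` parameter exists only so the function can be exercised without a network.

```javascript
// Hypothetical sendLog: POSTs one JSON log entry to a collector.
// env.LOG_ENDPOINT and env.LOG_TOKEN are assumed bindings, not a real API.
async function sendLog(env, entry, fetchImpl = fetch) {
  const body = JSON.stringify({ timestamp: Date.now(), ...entry });
  try {
    await fetchImpl(env.LOG_ENDPOINT, {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        authorization: `Bearer ${env.LOG_TOKEN}`,
      },
      body,
    });
  } catch (err) {
    // Logging should never break the request path; fall back to the console.
    console.error('log delivery failed', err);
  }
}
```

Swallowing delivery errors is a deliberate choice here: a failed log write should not turn a successful request into a failed one.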

What to measure

Focus on these four signals (the “RED” method adapted for serverless):

Tooling that understands serverless

Most APM tools were designed before serverless existed. They work, but you’ll fight their assumptions constantly (agents, process metrics, persistent connections).

Tools built for serverless — like ScryWatch — store logs from the invocation boundary, correlate by traceId, and compute metrics from log events rather than process state. This matches how serverless actually works.
