Traditional observability tools assume you have long-running servers. They install agents, collect metrics over time, and track process state. Serverless functions break all of these assumptions.

What’s different about serverless

No persistent process: the platform creates and discards instances as it sees fit, so there is no “server CPU” to track over time. From your side, the function starts, runs, and stops.

Cold starts are spikes, not failures: a 300 ms cold start is expected behavior, not an error. Your alerting needs to tell the two apart.
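
One way to let alerting tell cold starts from real slowness is to tag each log event with a cold-start flag. The sketch below assumes the platform evaluates module scope once per instance and then reuses that instance for later invocations. The names `consumeColdStart` and `isLatencyAlert`, and the `coldStart` field, are illustrative and not part of any platform API.

```javascript
// Module scope runs once per instance, so this flag is true only for the
// first invocation that instance handles.
let coldStart = true;

// Returns true exactly once per instance, then false on every later call.
function consumeColdStart() {
  const wasCold = coldStart;
  coldStart = false;
  return wasCold;
}

// Hypothetical alert rule: a slow invocation only counts when it was warm.
function isLatencyAlert(event, thresholdMs) {
  return !event.coldStart && event.duration > thresholdMs;
}
```

Calling `consumeColdStart()` at the top of the handler and writing the result into the log event is enough for a latency alert to skip cold invocations.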

Correlation is harder — a user request might fan out across 10 functions. Each has its own logs, with no shared process ID to correlate them.

Billing is usage-based: you pay per invocation and for the compute each one uses, so you need to know which functions are expensive, not just which are slow.

The three pillars (adapted for serverless)

Logs — the most important. Every invocation should emit a structured log with:

Request ID (for correlation)
Function name / service
Duration
Status (success/error)
Custom business fields
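
The fields listed above could be assembled like this. This is a minimal sketch: the function name `buildLogEvent`, the exact key names, and the choice to nest business data under `fields` (so it cannot overwrite core keys) are all assumptions, not a required schema.

```javascript
// Builds one structured log event for a finished invocation.
// `now` is injectable so the function is easy to test.
function buildLogEvent({ requestId, service, startedAt, error = null, fields = {} }, now = Date.now()) {
  return {
    timestamp: new Date(now).toISOString(),
    requestId,                      // for correlation
    service,                        // function name / service
    duration: now - startedAt,      // milliseconds
    status: error ? 'error' : 'success',
    // Include the error message only when the invocation failed.
    ...(error ? { error: String(error.message ?? error) } : {}),
    fields,                         // custom business fields
  };
}
```

Emitting the result as a single line, for example with `console.log(JSON.stringify(event))`, keeps each invocation's record easy to parse downstream.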

Traces — connecting related invocations. A traceId passed through all function calls lets you reconstruct a user’s full journey.
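
Passing the ID along can be as small as two helpers: one that reuses an incoming ID or starts a new trace, and one that attaches the ID to outgoing calls. The sketch assumes the `x-trace-id` header used in the Worker example in this article. The helper names `resolveTraceId` and `withTrace` are hypothetical.

```javascript
const TRACE_HEADER = 'x-trace-id';

// Reuse the caller's trace ID if present; otherwise start a new trace.
function resolveTraceId(headers, generate = () => crypto.randomUUID()) {
  return headers.get(TRACE_HEADER) ?? generate();
}

// Copy the outgoing headers and attach the trace ID for a downstream call.
function withTrace(headers, traceId) {
  const out = new Headers(headers);
  out.set(TRACE_HEADER, traceId);
  return out;
}
```

A downstream call then becomes `fetch(url, { headers: withTrace(request.headers, traceId) })`, so every function in the fan-out logs the same ID.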

Metrics — derived from logs, not collected separately. Error rate, p99 latency, and invocation count can all be computed from log events.
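
As a rough illustration of deriving metrics from log events, the sketch below computes invocation count, error rate, and latency percentiles from an in-memory array. It assumes each event has a numeric `duration` and a `level` of `'info'` or `'error'`, as in the Worker example in this article. The nearest-rank percentile is one common definition among several, and the function names are illustrative.

```javascript
// Nearest-rank percentile over an ascending array: the smallest value with
// at least p% of the samples at or below it.
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return null;
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(rank, 1) - 1];
}

// Summarizes a batch of log events into the headline metrics.
function summarize(events) {
  const durations = events.map((e) => e.duration).sort((a, b) => a - b);
  const errors = events.filter((e) => e.level === 'error').length;
  return {
    invocations: events.length,
    errorRate: events.length === 0 ? 0 : errors / events.length,
    p50: percentile(durations, 50),
    p95: percentile(durations, 95),
    p99: percentile(durations, 99),
  };
}
```

In practice the same aggregation would usually run as a query in the log store rather than in application code.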

Practical setup for Cloudflare Workers

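// Minimal Worker that emits one structured log per invocation.
// handleRequest and sendLog are application-defined helpers (not shown here).
export default {
  async fetch(request, env, ctx) {
    // Reuse the caller's trace ID so related invocations can be correlated.
    const traceId = request.headers.get('x-trace-id') ?? crypto.randomUUID();
    const start = Date.now();

    try {
      const response = await handleRequest(request, env, traceId);

      // waitUntil lets the log be delivered after the response is returned.
      ctx.waitUntil(sendLog(env, {
        // A 5xx response is a failure even though nothing was thrown.
        level: response.status >= 500 ? 'error' : 'info',
        traceId,
        duration: Date.now() - start,
        status: response.status,
      }));

      return response;
    } catch (err) {
      ctx.waitUntil(sendLog(env, {
        level: 'error',
        traceId,
        duration: Date.now() - start,
        // Thrown values are not guaranteed to be Error instances.
        error: err instanceof Error ? err.message : String(err),
      }));
      throw err;
    }
  }
};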
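
The example above leaves `sendLog` undefined. One possible shape is sketched here, assuming the log backend accepts a JSON `POST`. The bindings `env.LOG_ENDPOINT` and `env.LOG_TOKEN` and the bearer-token scheme are hypothetical, so substitute whatever your log backend expects. The optional `fetchImpl` parameter exists only to make the function testable.

```javascript
// Hypothetical sendLog: POSTs one JSON event to the endpoint in env.LOG_ENDPOINT.
async function sendLog(env, event, fetchImpl = fetch) {
  try {
    await fetchImpl(env.LOG_ENDPOINT, {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        authorization: `Bearer ${env.LOG_TOKEN}`, // assumed auth scheme
      },
      body: JSON.stringify({ timestamp: new Date().toISOString(), ...event }),
    });
  } catch (err) {
    // A failed log delivery should never break the request path.
    console.error('log delivery failed', err);
  }
}
```

Because delivery errors are caught, the promise handed to `ctx.waitUntil` always resolves, and a logging outage cannot turn into a failed request.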

What to measure

Focus on these four signals: the three from the “RED” method (Rate, Errors, Duration), plus Cost for serverless:

Rate — invocations per minute
Errors — error rate and error messages
Duration — p50, p95, p99 latency
Cost — which functions consume the most CPU time
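
The Cost signal can be derived from the same log events as the other three. The sketch below ranks functions by total compute, assuming each event carries a `service` name and a `cpuMs` field. Both field names are hypothetical. If your platform does not report CPU time per invocation, wall-clock `duration` can stand in as a rough proxy.

```javascript
// Sums a cost proxy per function and ranks functions from most to least expensive.
function rankByCost(events, costField = 'cpuMs') {
  const totals = new Map();
  for (const e of events) {
    // Events missing the cost field contribute zero instead of NaN.
    totals.set(e.service, (totals.get(e.service) ?? 0) + (e[costField] ?? 0));
  }
  return [...totals.entries()]
    .map(([service, total]) => ({ service, total }))
    .sort((a, b) => b.total - a.total);
}
```

A function that is fast per call but invoked constantly can rank above a slow one that rarely runs, which latency percentiles alone would not reveal.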

Tooling that understands serverless

Most APM tools were designed before serverless existed. They work, but you’ll fight their assumptions constantly (agents, process metrics, persistent connections).

Tools built for serverless — like ScryWatch — store logs from the invocation boundary, correlate by traceId, and compute metrics from log events rather than process state. This matches how serverless actually works.
