
Building a Lightweight Log Monitoring Stack

Most log monitoring stacks are overbuilt. They start with Kafka, Elasticsearch, and a dedicated ops team before the app has 100 users. Here’s how to design something lightweight that actually scales.

The three layers

Every log system needs:

  1. Ingestion — receive and store events durably
  2. Indexing — make logs queryable
  3. Alerting — notify you when something goes wrong

The key insight: you don’t need all three to be complex at once.

Phase 1: Ship something (0–10k events/day)

A single Cloudflare Worker can handle ingestion and write directly to D1 (SQLite):

CREATE TABLE logs (
  id TEXT PRIMARY KEY,
  level TEXT,
  message TEXT,
  service TEXT,
  timestamp INTEGER,
  fields TEXT -- JSON blob for custom fields
);
CREATE INDEX logs_timestamp ON logs(timestamp);
CREATE INDEX logs_level ON logs(level);

This handles ~10M rows before you need to think about partitioning. For a startup, this is years of runway.

Phase 2: Add search (10k–1M events/day)

Add a full-text search index:

CREATE VIRTUAL TABLE logs_fts USING fts5(message, fields, content=logs);

D1’s FTS5 support lets you query across messages and JSON fields without a dedicated search service. One caveat: with content=logs (an external-content table), FTS5 does not index rows automatically — you need triggers on logs, or explicit inserts into logs_fts, to keep the index in sync.
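Querying the index from a Worker takes one join back to the content table. A sketch, assuming the schema above; the ftsQuery helper is hypothetical, and exists because FTS5 treats characters like `-` and `"` in raw user input as query syntax:

```javascript
// Quote each whitespace-separated term so user input is matched
// literally instead of being parsed as FTS5 query syntax.
function ftsQuery(input) {
  return input
    .trim()
    .split(/\s+/)
    .filter(Boolean)
    .map((term) => `"${term.replace(/"/g, '""')}"`)
    .join(' ');
}

// In a Worker handler (external-content FTS5 tables share rowids
// with the content table, so join on rowid):
//
// const { results } = await env.DB.prepare(
//   `SELECT logs.* FROM logs_fts
//    JOIN logs ON logs.rowid = logs_fts.rowid
//    WHERE logs_fts MATCH ? ORDER BY rank LIMIT 50`
// ).bind(ftsQuery(userInput)).all();
```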

Phase 3: Archive cold data (1M+ events/day)

Move old data to R2 (Cloudflare’s S3-compatible storage) at $0.015/GB/month. Keep 7 days hot in D1, archive the rest. Use a nightly cron Worker to move data:

// Archive logs older than 7 days to R2, then delete them from D1.
// Compute the cutoff once so the SELECT and DELETE agree exactly.
const cutoff = Date.now() - 7 * 86400 * 1000;
const date = new Date().toISOString().slice(0, 10);

const { results } = await db.prepare(
  'SELECT * FROM logs WHERE timestamp < ?'
).bind(cutoff).all();

const ndjson = results.map((row) => JSON.stringify(row)).join('\n');
await r2.put(`archive/${date}.ndjson`, ndjson);

await db.prepare('DELETE FROM logs WHERE timestamp < ?')
  .bind(cutoff).run();
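Wiring this up as a nightly job means giving the Worker a scheduled() handler and a cron trigger. A sketch of the wrangler.toml fragment (the 3 a.m. UTC schedule is an arbitrary choice):

```toml
# Run the archive job once a day, at 03:00 UTC
[triggers]
crons = ["0 3 * * *"]
```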

What to avoid early on

  1. Kafka, or any message queue — a Worker writing straight to D1 is durable enough at this scale
  2. Elasticsearch — FTS5 covers search until well past 1M events/day
  3. A dedicated ops team — every layer here is serverless and managed

The ScryWatch approach

ScryWatch uses exactly this architecture: D1 for hot storage, R2 for archives, Durable Objects for real-time usage counting, and Workers for ingestion. The entire stack runs on Cloudflare with zero infrastructure to manage.

This keeps costs low ($9/month for most teams) and performance high (ingestion at the edge, <10ms globally).
