Most log monitoring stacks are overbuilt. They start with Kafka, Elasticsearch, and a dedicated ops team before the app has 100 users. Here’s how to design something lightweight that actually scales.
The three layers
Every log system needs:
- Ingestion — receive and store events durably
- Indexing — make logs queryable
- Alerting — notify you when something goes wrong
The key insight: you don’t need all three to be complex at once.
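Concretely, the unit flowing through all three layers is a small JSON event. A sketch of one, with illustrative field names that mirror the schema in Phase 1 (anything beyond the core columns goes into a free-form `fields` blob):

```javascript
// A minimal log event as it might arrive at the ingestion endpoint.
// Field names are illustrative, matching the Phase 1 schema below.
const event = {
  id: 'evt_01', // normally a UUID assigned at ingest if absent
  level: 'error',
  message: 'payment webhook timed out',
  service: 'billing',
  timestamp: Date.now(),
  fields: { orderId: 'ord_123', durationMs: 5021 },
};

// Stored as a single row: scalar columns plus a JSON-encoded blob,
// so custom fields never force a schema migration.
const row = { ...event, fields: JSON.stringify(event.fields) };
```

Keeping custom fields as an opaque blob is what lets the schema stay stable across all three phases.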
Phase 1: Ship something (0–10k events/day)
A single Cloudflare Worker can handle ingestion and write directly to D1 (SQLite):
CREATE TABLE logs (
  id TEXT PRIMARY KEY,
  level TEXT,
  message TEXT,
  service TEXT,
  timestamp INTEGER,
  fields TEXT -- JSON blob for custom fields
);
CREATE INDEX logs_timestamp ON logs(timestamp);
CREATE INDEX logs_level ON logs(level);
This handles ~10M rows before you need to think about partitioning. For a startup, this is years of runway.
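The ingestion Worker itself stays small: validate the payload, then insert one row. A sketch, assuming a D1 binding named `env.DB`; `validateLogEvent` and the fallback defaults are our own choices, not a Cloudflare API:

```javascript
// Normalize an incoming payload into a row for the logs table.
// Returns null if the event is unusable (no message).
const LEVELS = new Set(['debug', 'info', 'warn', 'error']);

function validateLogEvent(body) {
  if (!body || typeof body.message !== 'string') return null;
  return {
    id: body.id ?? Date.now() + '-' + Math.random().toString(36).slice(2),
    level: LEVELS.has(body.level) ? body.level : 'info',
    message: body.message,
    service: typeof body.service === 'string' ? body.service : 'unknown',
    timestamp: Number.isFinite(body.timestamp) ? body.timestamp : Date.now(),
    fields: JSON.stringify(body.fields ?? {}),
  };
}

const worker = {
  async fetch(request, env) {
    const event = validateLogEvent(await request.json().catch(() => null));
    if (!event) return new Response('bad event', { status: 400 });
    await env.DB.prepare(
      'INSERT INTO logs (id, level, message, service, timestamp, fields) VALUES (?, ?, ?, ?, ?, ?)'
    ).bind(event.id, event.level, event.message, event.service,
           event.timestamp, event.fields).run();
    return new Response(null, { status: 204 });
  },
};
```

Rejecting malformed events at the edge keeps garbage out of the table, so the later phases never have to clean up after ingestion.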
Phase 2: Add search (10k–1M events/day)
Add a full-text search index:
CREATE VIRTUAL TABLE logs_fts USING fts5(message, fields, content=logs);
D1’s FTS5 support lets you query across messages and JSON fields without a dedicated search service. One caveat: with `content=logs` the index stores no copy of the text, so you need INSERT/UPDATE/DELETE triggers on `logs` to keep `logs_fts` in sync.
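Querying is then a join from the FTS table back to `logs`. A sketch of building a safe MATCH expression; the quoting is FTS5's own double-quote escaping, while `ftsLiteral` and the query shape are ours:

```javascript
// Wrap user input in FTS5 string quotes (internal quotes doubled)
// so operators like AND/OR/NEAR in the input are treated as text.
function ftsLiteral(term) {
  return '"' + term.replace(/"/g, '""') + '"';
}

// Join the FTS index back to the base table via rowid.
const sql =
  'SELECT logs.* FROM logs_fts JOIN logs ON logs.rowid = logs_fts.rowid ' +
  'WHERE logs_fts MATCH ? ORDER BY logs.timestamp DESC LIMIT 50';
const match = ftsLiteral('payment timeout');
// Then: db.prepare(sql).bind(match).all()
```

Always binding the MATCH term as a parameter keeps search input out of the SQL string entirely.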
Phase 3: Archive cold data (1M+ events/day)
Move old data to R2 (Cloudflare’s S3-compatible storage) at $0.015/GB/month. Keep 7 days hot in D1, archive the rest. Use a nightly cron Worker to move data:
// Archive logs older than 7 days to R2
const cutoff = Date.now() - 7 * 86400 * 1000;
const date = new Date().toISOString().slice(0, 10);
const old = await db.prepare(
  'SELECT * FROM logs WHERE timestamp < ?'
).bind(cutoff).all();
await r2.put(`archive/${date}.ndjson`, toNDJSON(old.results));
await db.prepare('DELETE FROM logs WHERE timestamp < ?')
  .bind(cutoff).run();
What to avoid early on
- Kafka — overkill until you’re at 100M events/day
- Elasticsearch — expensive and complex to operate
- Multiple databases — one SQLite database handles more than you think
- Custom metrics pipeline — logs-as-metrics is usually sufficient early
The ScryWatch approach
ScryWatch uses exactly this architecture: D1 for hot storage, R2 for archives, Durable Objects for real-time usage counting, and Workers for ingestion. The entire stack runs on Cloudflare with zero infrastructure to manage.
This keeps costs low ($9/month for most teams) and performance high (ingestion at the edge, <10ms globally).