A simulated session showing every phase. The scenario: a Node.js API built on
Express has no rate limiting, and scrapers are hammering /search at
200 req/s.
Personas use non-human codenames by default — lenses, not characters.
@owner [anchor] reads as a perspective; @owner [sarah] reads as a coworker.
Each phase has one driver; supporters stay silent unless their interjection
is both useful AND non-distortive.
/storytime "our public API has no rate limiting and scrapers are abusing /search at 200 req/s, we need per-client throttling that respects our auth tier system"
storytime v1.0.1
Storytime explores the codebase. Finds the Express app, middleware chain,
auth system, and existing monitoring. Driver: @owner [anchor].
Codebase survey complete.
Key files:
src/server.ts:14             Express app setup, middleware chain
src/middleware/auth.ts:1     JWT validation, tier extraction (free/pro/enterprise)
src/routes/search.ts:23      The hot endpoint, no throttling
src/monitoring/metrics.ts:8  Prometheus counters, no rate metrics
Artifact inventory:
[x] docs/api-design.md  [spec]    API conventions doc
[ ] .cursor/rules       [config]  (loaded silently)
No prior storytime state. Mode: storytime-native.
Coverage fingerprint written.
Creating specs/.storytime/
Cohort loaded from cohort/_roster.md. One specialist recruited for the
security angle (@domain [warden]). The default core fits because the problem
spans architecture, infra, and operations.
╔═══════════════════════════════════════════════════════╗
║  TEAM: rate-limiting                                  ║
╠═══════════════════════════════════════════════════════╣
║                                                       ║
║  @owner [anchor]     API architect    (cohort)        ║
║  @systems [lattice]  Redis & scaling  (cohort)        ║
║  @operator [tide]    SRE, monitoring  (cohort)        ║
║  @domain [warden]    Security, abuse  (specialist)    ║
║                                                       ║
╚═══════════════════════════════════════════════════════╝
warden is recruited just for this session with a scoped contract: review the abuse-prevention angle, release on DONE unless the work proves ongoing. The three cohort members carry their accumulated context from prior sessions.
Driver: @owner [anchor]. The other lenses then speak in turn, each surfacing
its read on the status quo.
@owner [anchor]: The middleware chain in src/server.ts:14-28
goes auth → logging → routes. There's no throttle layer anywhere. Every
request hits the route handler regardless of volume. The /search endpoint
at src/routes/search.ts:23 does a full Postgres text search per call —
that's the expensive path.
@systems [lattice]: We're already running Redis for
sessions (src/config/redis.ts:5). Adding rate limit counters there is zero new
infra. The question is algorithm: fixed window is simplest but has the
burst-at-boundary problem. Sliding window is smoother but needs more storage.
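The burst-at-boundary problem can be made concrete with a small sketch. The class name, the 60s window, and the limit of 30 below are illustrative assumptions, not session output; the point is only that a fixed window counter resets at the boundary and so can admit twice the limit within about one second.

```typescript
// Minimal fixed window counter (illustrative; names and numbers are assumptions).
class FixedWindowCounter {
  private windowStart = 0;
  private count = 0;
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request arriving at `nowMs` is admitted.
  allow(nowMs: number): boolean {
    const start = Math.floor(nowMs / this.windowMs) * this.windowMs;
    if (start !== this.windowStart) {
      // A new window begins: the counter resets regardless of recent traffic.
      this.windowStart = start;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count += 1;
    return true;
  }
}

// 30 requests in the last second of one window, then 30 more in the
// first second of the next window.
function countAllowedAcrossBoundary(): number {
  const limiter = new FixedWindowCounter(30, 60000);
  let allowed = 0;
  for (let i = 0; i < 30; i++) if (limiter.allow(59000 + i)) allowed++;
  for (let i = 0; i < 30; i++) if (limiter.allow(60000 + i)) allowed++;
  return allowed;
}
```

With a nominal limit of 30 per 60s, `countAllowedAcrossBoundary()` admits all 60 requests, because the two bursts fall on opposite sides of the window boundary.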
@domain [warden]: The auth middleware at
src/middleware/auth.ts:32 already extracts the tier (free/pro/enterprise) from
the JWT. Rate limits should key on that. But I want to flag: if we return 429,
we need to make sure the response doesn't leak auth state. Some implementations
echo back the rate limit key, which can be an info disclosure vector.
@operator [tide]: Two things. One, I need a kill switch — an env var that disables rate limiting entirely if it's causing false positives in prod. Two, I need per-route metrics in Prometheus: request count, 429 count, and current utilization ratio. Without that, I'm flying blind on whether the limits are right.
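The three per-route numbers @operator [tide] asks for could be tracked along these lines. This is a dependency-free sketch: the metric names and the `RateLimitMetrics` class are hypothetical, and the project would more likely register these through whatever Prometheus client src/monitoring/metrics.ts already uses. The output follows the Prometheus text exposition format.

```typescript
// Hypothetical per-route rate-limit metrics (metric names are assumptions).
interface RouteStats {
  requests: number;    // all requests seen by the limiter
  rejected: number;    // requests answered with 429
  utilization: number; // most recent used/limit ratio, between 0 and 1
}

class RateLimitMetrics {
  private routes: Record<string, RouteStats> = {};

  // Record one limiter decision for a route.
  record(route: string, used: number, limit: number, rejected: boolean): void {
    const stats = this.routes[route] || { requests: 0, rejected: 0, utilization: 0 };
    stats.requests += 1;
    if (rejected) stats.rejected += 1;
    stats.utilization = limit > 0 ? Math.min(used / limit, 1) : 0;
    this.routes[route] = stats;
  }

  // Render all routes as Prometheus text exposition lines.
  render(): string {
    const lines: string[] = [];
    for (const route of Object.keys(this.routes)) {
      const s = this.routes[route];
      const label = '{route="' + route + '"} ';
      lines.push("rate_limit_requests_total" + label + s.requests);
      lines.push("rate_limit_rejected_total" + label + s.rejected);
      lines.push("rate_limit_utilization_ratio" + label + s.utilization);
    }
    return lines.join("\n");
  }
}
```

Utilization here is simply the last observed ratio for the route; a real gauge might instead track the highest ratio across active clients.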
Sub-problems identified: algorithm, storage, middleware.
Three breakouts launched in parallel as sub-agents. Each has one driver and silent supporters that speak only on a trigger condition (ungrounded claim, factual error, missing constraint, scope drift).
Driver: @owner [anchor] · Supporters: @domain [warden]
(spoke once: flagged the response-header info-leak concern)
Investigated token bucket, sliding window log, and fixed window counter. Sliding window log is the best fit: precise, no burst-at-boundary, and memory per key is bounded by the number of requests logged within one window. Token bucket is overkill for HTTP APIs (better for streaming). Fixed window is too coarse for our tier system.
Recommendation: Sliding window with 60s window. Per-key sorted set in Redis.
Confidence: high | Complexity: 3 | Scale: 2 (module)
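A minimal in-memory sketch of the recommended sliding window log follows. It is an illustration under stated assumptions, not the breakout's actual output: the class name is hypothetical, the clock is passed in for testability, and only admitted requests are logged, which keeps each key at no more than `limit` timestamps.

```typescript
// Sliding window log limiter, in memory (illustrative sketch).
class SlidingWindowLog {
  private log: Record<string, number[]> = {};
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request for `key` at `nowMs` is admitted.
  allow(key: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Keep only timestamps inside the window (cutoff, nowMs].
    const recent = (this.log[key] || []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    // Rejected requests are not logged, so the log never exceeds `limit`.
    if (allowed) recent.push(nowMs);
    this.log[key] = recent;
    return allowed;
  }
}
```

Unlike a fixed window counter, a burst of 30 just before the minute boundary still counts against requests that arrive just after it, so the second burst is rejected until the first ages out.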
Driver: @systems [lattice] · Supporters: @operator [tide]
(stayed silent — driver's lens covered the territory cleanly)
Redis is already in the stack (src/config/redis.ts:5). In-memory would be
faster but doesn't survive restarts and won't work with multiple instances.
Redis sorted sets with ZRANGEBYSCORE give us the sliding window primitive natively.
TTL on keys handles cleanup. Tested: 0.3ms per rate check at 10k concurrent keys.
Recommendation: Redis sorted sets with automatic TTL expiry.
Confidence: high | Complexity: 2 | Scale: 2 (module)
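The sorted-set approach can be sketched as a short command sequence. This variant trims with ZREMRANGEBYSCORE and counts with ZCARD rather than reading with ZRANGEBYSCORE; both are standard Redis commands. To stay runnable without a server, the sketch uses a hypothetical synchronous in-memory stand-in (`InMemorySortedSets`) whose methods mirror the Redis command names. Real clients are asynchronous, and the sequence would need MULTI/EXEC or a Lua script to be atomic.

```typescript
// In-memory stand-in mirroring four Redis commands by name (hypothetical helper).
interface Entry { member: string; score: number; }

class InMemorySortedSets {
  private sets: Record<string, Entry[]> = {};

  // ZREMRANGEBYSCORE key min max: drop members with min <= score <= max.
  zremrangebyscore(key: string, min: number, max: number): number {
    const before = this.sets[key] || [];
    const after = before.filter((e) => e.score < min || e.score > max);
    this.sets[key] = after;
    return before.length - after.length;
  }

  // ZCARD key: number of members in the set.
  zcard(key: string): number {
    return (this.sets[key] || []).length;
  }

  // ZADD key score member: insert, or update the score of an existing member.
  zadd(key: string, score: number, member: string): number {
    const set = this.sets[key] || (this.sets[key] = []);
    for (const e of set) {
      if (e.member === member) { e.score = score; return 0; }
    }
    set.push({ member, score });
    return 1;
  }

  // PEXPIRE key ms: the TTL is not simulated in this stand-in.
  pexpire(_key: string, _ms: number): number {
    return 1;
  }
}

let requestSeq = 0; // keeps members unique when requests share a millisecond

function checkRate(
  store: InMemorySortedSets, key: string, limit: number, windowMs: number, nowMs: number
): { allowed: boolean; remaining: number } {
  store.zremrangebyscore(key, 0, nowMs - windowMs); // trim entries older than the window
  const used = store.zcard(key);                    // requests still inside the window
  if (used >= limit) return { allowed: false, remaining: 0 };
  store.zadd(key, nowMs, nowMs + "-" + requestSeq++);
  store.pexpire(key, windowMs);                     // idle keys expire on their own
  return { allowed: true, remaining: limit - used - 1 };
}
```

Refreshing the TTL on every admitted request means a key disappears one window after its last admitted request, which is the cleanup behavior the breakout describes.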
Driver: @owner [anchor] · Supporters: @operator [tide]
(spoke once: insisted on the kill switch as a non-negotiable)
The rate limiter must sit after auth (needs the tier and client ID from the JWT)
but before route handlers. At src/server.ts:22, between the auth middleware
and the router mount. @domain [warden]'s concern about 429 info leakage: the
response should include Retry-After and X-RateLimit-* headers but NOT the
rate limit key. Kill switch via RATE_LIMIT_ENABLED env var (default: true).
Recommendation: New middleware at src/middleware/rate-limit.ts, mounted after auth.
Confidence: high | Complexity: 3 | Scale: 2 (module)
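A sketch of what such a middleware might look like, under several assumptions: the `Req` and `Res` interfaces are structural stand-ins for the parts of Express's request and response used here, `req.auth` is a hypothetical place where the auth middleware leaves the client ID and tier, and `X-RateLimit-Limit` and `X-RateLimit-Remaining` are conventional header names chosen for illustration. The kill switch is passed in as a boolean; in the plan it would be derived from RATE_LIMIT_ENABLED.

```typescript
type Tier = "free" | "pro" | "enterprise";

// Structural stand-ins for the parts of Express's req/res this sketch touches.
interface Req { auth?: { clientId: string; tier: Tier }; }
interface Res {
  setHeader(name: string, value: string): void;
  status(code: number): Res;
  json(body: unknown): void;
}

interface RateCheck { allowed: boolean; remaining: number; retryAfterSec: number; }

interface RateLimitOptions {
  enabled: boolean;                                 // kill switch, e.g. from RATE_LIMIT_ENABLED
  limits: Record<Tier, number>;                     // per-tier limit for one window
  check: (key: string, limit: number) => RateCheck; // e.g. a sliding window lookup
}

function createRateLimit(opts: RateLimitOptions) {
  return (req: Req, res: Res, next: () => void): void => {
    // Pass through when disabled, or when no identity is attached
    // (assumption: the auth middleware has already rejected bad tokens).
    if (!opts.enabled || !req.auth) { next(); return; }

    const limit = opts.limits[req.auth.tier];
    const key = req.auth.clientId + ":" + req.auth.tier; // used internally only
    const result = opts.check(key, limit);

    res.setHeader("X-RateLimit-Limit", String(limit));
    res.setHeader("X-RateLimit-Remaining", String(result.remaining));
    if (result.allowed) { next(); return; }

    // The 429 carries timing information but never the rate limit key.
    res.setHeader("Retry-After", String(result.retryAfterSec));
    res.status(429).json({ error: "rate_limited" });
  };
}
```

Mounted between the auth middleware and the router, this keeps the key out of both headers and body, which addresses the info-leak concern raised by @domain [warden].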
Mandatory pause — storytime does not auto-proceed to CONVERGE. You see what each breakout proposed AND what it didn't choose, then direct.
Breakout results:
1. [algorithm] Sliding window, 60s, Redis sorted sets
   Driver: @owner [anchor] | Confidence: high | C 3 | S 2
   Options considered: token bucket (overkill), fixed window
   (burst-at-boundary), sliding window (chosen)
   → breakout-algo.md
2. [storage] Redis sorted sets with TTL expiry
   Driver: @systems [lattice] | Confidence: high | C 2 | S 2
   Options considered: in-memory (doesn't survive restart),
   hybrid (added complexity for no win), Redis sorted (chosen)
   → breakout-store.md
3. [middleware] After auth, before routes, kill switch
   Driver: @owner [anchor] | Confidence: high | C 3 | S 2
   Options considered: pre-auth (no tier info), per-route
   wrapper (verbose), global middleware (chosen)
   → breakout-middleware.md
Conflicts: none
Open questions: tier-specific limit numbers (TBD in CONVERGE)
Ready to converge? [proceed / dig into N / revise N / add breakout]
User: proceed
Driver: @owner [anchor] synthesizes. Each plan section gets the lens closest
to it. Decisions get IDs (RATE-001..RATE-004) that pin to commits and
become queryable graph nodes.
╔═══════════════════════════════════════════════════════╗
║  SLIDE 1: Problem                                     ║
╠═══════════════════════════════════════════════════════╣
║                                                       ║
║  request ──▶ [auth] ──▶ [???] ──▶ [routes] ──▶ db     ║
║                           ▲                           ║
║                           │                           ║
║                      no throttle                      ║
║                      layer here                       ║
║                                                       ║
║  /search hit 200 req/s from scrapers.                 ║
║  Each request runs full-text Postgres query.          ║
║  No per-client limits. No abuse detection.            ║
║                                                       ║
╚═══════════════════════════════════════════════════════╝
╔═══════════════════════════════════════════════════════╗
║  SLIDE 2: Solution                                    ║
╠═══════════════════════════════════════════════════════╣
║                                                       ║
║  request ─▶ [auth] ─▶ [rate-limit] ─▶ [routes]        ║
║                          │    │                       ║
║                      ┌───┘    └───┐                   ║
║                      ▼            ▼                   ║
║                   [Redis]  [429 + headers]            ║
║                   sorted     Retry-After              ║
║                    sets     X-RateLimit-*             ║
║                                                       ║
║  Tier limits (per 60s window):                        ║
║  free: 30    pro: 300    enterprise: 3000             ║
║                                                       ║
╚═══════════════════════════════════════════════════════╝
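The tier limits on the slide translate into a small config plus some arithmetic. The sketch below is one possible shape for it: the constant and function names are hypothetical, and falling back to the free limit for an unknown tier is an assumption, not something the session decided.

```typescript
type Tier = "free" | "pro" | "enterprise";

const WINDOW_SECONDS = 60;
const TIER_LIMITS: Record<Tier, number> = { free: 30, pro: 300, enterprise: 3000 };

// Unknown tiers fall back to the most restrictive limit (an assumption).
function limitFor(tier: string): number {
  const known = Object.prototype.hasOwnProperty.call(TIER_LIMITS, tier);
  return known ? TIER_LIMITS[tier as Tier] : TIER_LIMITS.free;
}

// Average requests per second a tier can sustain over one window.
function sustainedRate(tier: Tier): number {
  return TIER_LIMITS[tier] / WINDOW_SECONDS;
}

// Share of requests rejected for one client sending at a steady rate.
function rejectedShare(requestsPerSecond: number, tier: Tier): number {
  const offered = requestsPerSecond * WINDOW_SECONDS;
  if (offered <= TIER_LIMITS[tier]) return 0;
  return 1 - TIER_LIMITS[tier] / offered;
}
```

For scale: the limits work out to a sustained 0.5, 5, and 50 req/s per client. If all 200 req/s came from a single free-tier client, that client would offer 12,000 requests per window and have 30 admitted, so about 99.75% would be rejected.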
Decisions (sealed, append-only, commit-pinned):
RATE-001: Sliding window, 60s. Driver: @owner [anchor]. Token bucket rejected (overkill for HTTP). Fixed window rejected (burst-at-boundary).
RATE-002: Redis sorted sets with TTL expiry. Driver: @systems [lattice]. In-memory rejected (doesn't survive restarts, no multi-instance). Existing Redis at src/config/redis.ts:5.
RATE-003: Middleware mounted after auth at src/server.ts:22. Key on clientId + tier from JWT. Drivers: @owner [anchor].
RATE-004: RATE_LIMIT_ENABLED env var kill switch (default: true). Per-route Prometheus metrics. Drivers: @operator [tide].
Implementation steps:
1. Create src/middleware/rate-limit.ts with sliding window logic (RATE-001, RATE-002)
2. Add tier limits to src/config/rate-limits.ts: free 30, pro 300, enterprise 3000 (RATE-003)
3. Mount the middleware in src/server.ts between auth and routes (RATE-003)
4. Add rate metrics to src/monitoring/metrics.ts (RATE-004)
5. Add Retry-After and X-RateLimit-* response headers (RATE-003)
6. Add RATE_LIMIT_ENABLED env var check (RATE-004)
Non-goals (REQUIRED, each with why-skip + when-revisit):
Success criteria (REQUIRED, measurable):
User (@user [creator]): Why not just use express-rate-limit?
@owner [anchor]: express-rate-limit uses a fixed window
and an in-memory store by default. Per @systems [lattice]'s breakout finding,
in-memory doesn't survive restarts and won't work across multiple instances.
We could use rate-limit-redis as the store, but that keeps the fixed window, so
we would still be writing the Redis sorted set logic ourselves, with extra
dependency overhead. The custom middleware is ~60 lines and gives us full
control over the header format and @domain [warden]'s security requirements.
User: Fair. Approved.
Session complete: rate-limiting (episode 001)
Output:
specs/.storytime/sessions/rate-limiting/001/
├── survey.md
├── team.md
├── icebreaker.md
├── breakout-algo.md
├── breakout-store.md
├── breakout-middleware.md
└── plan.md
Cohort updated (acquired context appended):
@owner [anchor]     +1 session (now 4 total)
@systems [lattice]  +1 session (now 3 total)
@operator [tide]    +1 session (now 5 total)
@domain [warden]    specialist (contract complete; consider promotion)
Decisions logged: RATE-001..RATE-004 (commit-pinned, queryable)
Thread created: sessions/rate-limiting/_thread.md
Continuity ledger updated.
Next: /storytime-buildout rate-limiting (implement the plan)
/storytime-retro rate-limiting (after deploy, evaluate)
The session is over. What persists, and how does it survive context resets?
The thread (sessions/rate-limiting/_thread.md) is the continuity ledger.
It carries decisions, episode log, open questions, and last commit. When you
come back later (next day, next week, after /compact), /storytime rate-limiting
reloads the team's accumulated context. Personas skip introductions and pick up
where each lens left off.
Decisions form a typed DAG. Each RATE-NNN entry pins to the commit that
sealed it. Cross-topic references use Callout-> and Callout<- sigils so a
decision in auth-refactor can reference one in rate-limiting without copying
content. Query the graph:
./scripts/intent-graph-query.sh get_orphans
./scripts/intent-graph-query.sh get_unrealized # sealed but no commit
./scripts/intent-graph-query.sh get_path RATE-003 # walk to root
./scripts/intent-adherence.sh rate-limiting # sealed vs realized grid
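The scripts' internals are not shown in this walkthrough. As a rough mental model only, the two annotated queries could behave like the sketch below. The node shape, the field names, and the edges between RATE decisions are all invented for illustration, and get_orphans is left out because its meaning is not stated here.

```typescript
// Hypothetical node shape; the real on-disk schema is not shown in this walkthrough.
interface DecisionNode {
  id: string;                // e.g. "RATE-003"
  dependsOn: string[];       // ids of decisions this one builds on
  realizedBy: string | null; // implementing commit, or null if none yet
}

type DecisionGraph = Record<string, DecisionNode>;

// Sealed decisions that no commit has realized yet.
function getUnrealized(graph: DecisionGraph): string[] {
  return Object.keys(graph).filter((id) => graph[id].realizedBy === null).sort();
}

// Walk from a decision toward a root, following the first dependency each step.
function getPath(graph: DecisionGraph, id: string): string[] {
  const path: string[] = [];
  const seen: Record<string, boolean> = {};
  let current: string | undefined = id;
  while (current !== undefined && !seen[current]) {
    const node: DecisionNode | undefined = graph[current];
    if (!node) break;
    seen[current] = true; // guards against accidental cycles
    path.push(node.id);
    current = node.dependsOn[0];
  }
  return path;
}

// Invented edges, for illustration only; nothing is realized before buildout.
const exampleGraph: DecisionGraph = {
  "RATE-001": { id: "RATE-001", dependsOn: [], realizedBy: null },
  "RATE-002": { id: "RATE-002", dependsOn: ["RATE-001"], realizedBy: null },
  "RATE-003": { id: "RATE-003", dependsOn: ["RATE-002"], realizedBy: null },
  "RATE-004": { id: "RATE-004", dependsOn: ["RATE-003"], realizedBy: null },
};
```

Under these invented edges, `getPath(exampleGraph, "RATE-003")` walks RATE-003, RATE-002, RATE-001, and `getUnrealized(exampleGraph)` lists all four decisions until buildout attaches commits to them.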
Remembrance survives /compact. When the model self-detects degradation
(token budget, rut, framing loss), /storytime-remember stages a workday-shaped
wakeup document. Post-compact, the first action of any storytime skill loads it.
The team you assembled today is the team you'll have when you come back, with the
same context, the same decisions, the same open questions.
Buildout is the next step. /storytime-buildout rate-limiting takes the
plan and implements it — one driver per slice, traceability from each commit
back to the deciding persona. /storytime-retro rate-limiting reconvenes the team
later to compare plan-vs-built with an evaluation scorecard.