Full Cold-Start Walkthrough

A simulated session showing every phase. The scenario: a Node.js API with Express that has no rate limiting. Scrapers are hammering /search at 200 req/s.

Personas use non-human codenames by default — lenses, not characters. @owner [anchor] reads as a perspective; @owner [sarah] reads as a coworker. Each phase has one driver; supporters stay silent unless their interjection is both useful AND non-distortive.

/storytime "our public API has no rate limiting and scrapers are abusing /search at 200 req/s, we need per-client throttling that respects our auth tier system"

storytime v1.0.1

Phase 0 — Survey

Storytime explores the codebase and finds the Express app, middleware chain, auth system, and existing monitoring. Driver: @owner [anchor].

Codebase survey complete.

Key files:
  src/server.ts:14           Express app setup, middleware chain
  src/middleware/auth.ts:1    JWT validation, tier extraction (free/pro/enterprise)
  src/routes/search.ts:23    The hot endpoint, no throttling
  src/monitoring/metrics.ts:8 Prometheus counters, no rate metrics

Artifact inventory:
  [x] docs/api-design.md           [spec] API conventions doc
  [ ] .cursor/rules                [config] (loaded silently)

No prior storytime state. Mode: storytime-native.
Coverage fingerprint written.
Creating specs/.storytime/

Phase 1 — Assemble

Cohort loaded from cohort/_roster.md. One specialist recruited for the security angle (@domain [warden]). The default core cohort is used because the problem spans architecture, infra, and operations.

╔═══════════════════════════════════════════════════════╗
║  TEAM: rate-limiting                                 ║
╠═══════════════════════════════════════════════════════╣
║                                                     ║
║  @owner    [anchor]   API architect (cohort)        ║
║  @systems  [lattice]  Redis & scaling   (cohort)    ║
║  @operator [tide]     SRE, monitoring   (cohort)    ║
║  @domain   [warden]   Security, abuse   (specialist)║
║                                                     ║
╚═══════════════════════════════════════════════════════╝

warden is recruited just for this session with a scoped contract: review the abuse-prevention angle, release on DONE unless the work proves ongoing. The three cohort members carry their accumulated context from prior sessions.

Phase 2 — Icebreaker

Driver: @owner [anchor]. The other lenses then speak in turn, each surfacing its read on the status quo.

@owner [anchor]: The middleware chain in src/server.ts:14-28 goes auth → logging → routes. There's no throttle layer anywhere. Every request hits the route handler regardless of volume. The /search endpoint at src/routes/search.ts:23 does a full Postgres text search per call — that's the expensive path.

@systems [lattice]: We're already running Redis for sessions (src/config/redis.ts:5). Adding rate limit counters there is zero new infra. The question is algorithm: fixed window is simplest but has the burst-at-boundary problem. Sliding window is smoother but needs more storage.

@domain [warden]: The auth middleware at src/middleware/auth.ts:32 already extracts the tier (free/pro/enterprise) from the JWT. Rate limits should key on that. But I want to flag: if we return 429, we need to make sure the response doesn't leak auth state. Some implementations echo back the rate limit key, which can be an info disclosure vector.

@operator [tide]: Two things. One, I need a kill switch — an env var that disables rate limiting entirely if it's causing false positives in prod. Two, I need per-route metrics in Prometheus: request count, 429 count, and current utilization ratio. Without that, I'm flying blind on whether the limits are right.

Sub-problems identified:

  1. Algorithm choice: token bucket vs sliding window vs fixed window
  2. Storage backend: Redis counters vs in-memory vs hybrid
  3. Middleware integration: where in the chain, how it interacts with auth

Phase 3 — Breakouts (parallel)

Three breakouts launched in parallel as sub-agents. Each has one driver and silent supporters that speak only on a trigger condition (ungrounded claim, factual error, missing constraint, scope drift).

Breakout: algorithm

Driver: @owner [anchor] · Supporters: @domain [warden] (spoke once: flagged the response-header info-leak concern)

Investigated token bucket, sliding window log, and fixed window counter. Sliding window log is the best fit: it is precise, has no burst-at-boundary problem, and its memory cost per key is bounded by the number of requests kept inside the window. Token bucket is overkill for HTTP APIs (better for streaming). Fixed window is too coarse for our tier system.

Recommendation: Sliding window with 60s window. Per-key sorted set in Redis.
Confidence: high | Complexity: 3 | Scale: 2 (module)
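
To make the burst-at-boundary argument concrete, here is a minimal in-memory sketch comparing a fixed window counter with a sliding window log. It is illustrative only: the function names and the burst timestamps are hypothetical, and the limit of 30 per 60s is borrowed from the free tier chosen later in the session.

```typescript
// Fixed window counter: one counter per aligned window of `windowMs`.
function fixedWindowAllowed(timestamps: number[], limit: number, windowMs: number): number {
  const counts = new Map<number, number>();
  let allowed = 0;
  for (const t of timestamps) {
    const bucket = Math.floor(t / windowMs);
    const used = counts.get(bucket) ?? 0;
    if (used < limit) {
      counts.set(bucket, used + 1);
      allowed++;
    }
  }
  return allowed;
}

// Sliding window log: keep the timestamps of admitted requests and count
// only those that fall inside the trailing `windowMs`.
function slidingWindowAllowed(timestamps: number[], limit: number, windowMs: number): number {
  const log: number[] = [];
  let allowed = 0;
  for (const t of timestamps) {
    // Drop entries that have aged out of the window.
    while (log.length > 0 && log[0] <= t - windowMs) log.shift();
    if (log.length < limit) {
      log.push(t);
      allowed++;
    }
  }
  return allowed;
}

// Hypothetical burst: 30 requests just before the minute boundary and
// 30 just after it, all from the same client.
const burst: number[] = [];
for (let i = 0; i < 30; i++) burst.push(59000 + i);
for (let i = 0; i < 30; i++) burst.push(60000 + i);

const fixedAllowed = fixedWindowAllowed(burst, 30, 60000);
const slidingAllowed = slidingWindowAllowed(burst, 30, 60000);
console.log({ fixedAllowed, slidingAllowed });
```

With these inputs the fixed window admits both halves of the burst because they land in different buckets, while the sliding log still sees the first 30 requests inside the trailing 60 seconds and rejects the rest.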

Breakout: storage

Driver: @systems [lattice] · Supporters: @operator [tide] (stayed silent — driver's lens covered the territory cleanly)

Redis is already in the stack (src/config/redis.ts:5). In-memory would be faster but doesn't survive restarts and won't work with multiple instances. Redis sorted sets give us the sliding window primitive natively: ZADD records a request, ZREMRANGEBYSCORE trims entries older than the window, and ZCARD counts what remains. TTL on keys handles cleanup. Tested: 0.3ms per rate check at 10k concurrent keys.

Recommendation: Redis sorted sets with automatic TTL expiry.
Confidence: high | Complexity: 2 | Scale: 2 (module)
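
The sorted-set approach can be sketched as the command sequence below. This is a sketch under assumptions, not the session's actual code: `SortedSetClient` is a hypothetical minimal interface whose method names mirror the Redis commands ZREMRANGEBYSCORE, ZCARD, ZADD, and PEXPIRE, and `InMemorySortedSets` is a stand-in so the example runs without a Redis server.

```typescript
// Hypothetical client surface; method names mirror the Redis commands.
interface SortedSetClient {
  zremrangebyscore(key: string, min: number, max: number): Promise<number>;
  zcard(key: string): Promise<number>;
  zadd(key: string, score: number, member: string): Promise<number>;
  pexpire(key: string, ms: number): Promise<number>;
}

interface RateDecision {
  allowed: boolean;
  remaining: number;
}

async function checkRate(
  client: SortedSetClient,
  key: string,
  limit: number,
  windowMs: number,
  now: number,
  requestId: string
): Promise<RateDecision> {
  // 1. Trim entries that have aged out of the window.
  await client.zremrangebyscore(key, 0, now - windowMs);
  // 2. Count what is still inside the window.
  const used = await client.zcard(key);
  if (used >= limit) return { allowed: false, remaining: 0 };
  // 3. Record this request, scored by its timestamp.
  await client.zadd(key, now, requestId);
  // 4. Refresh the TTL so idle keys clean themselves up.
  await client.pexpire(key, windowMs);
  return { allowed: true, remaining: limit - used - 1 };
}

// In-memory stand-in for Redis (assumption: used for illustration only).
class InMemorySortedSets implements SortedSetClient {
  private sets = new Map<string, Map<string, number>>();

  async zremrangebyscore(key: string, min: number, max: number): Promise<number> {
    const set = this.sets.get(key);
    if (!set) return 0;
    let removed = 0;
    set.forEach((score, member) => {
      if (score >= min && score <= max) {
        set.delete(member);
        removed++;
      }
    });
    return removed;
  }

  async zcard(key: string): Promise<number> {
    const set = this.sets.get(key);
    return set ? set.size : 0;
  }

  async zadd(key: string, score: number, member: string): Promise<number> {
    let set = this.sets.get(key);
    if (!set) {
      set = new Map<string, number>();
      this.sets.set(key, set);
    }
    const added = set.has(member) ? 0 : 1;
    set.set(member, score);
    return added;
  }

  async pexpire(_key: string, _ms: number): Promise<number> {
    return 1; // TTL is not simulated in the stand-in.
  }
}
```

Against a real Redis the four commands would be separate round trips and could interleave across concurrent requests, so a production version would likely wrap them in MULTI/EXEC or a Lua script.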

Breakout: middleware

Driver: @owner [anchor] · Supporters: @operator [tide] (spoke once: insisted on the kill switch as a non-negotiable)

The rate limiter must sit after auth (it needs the tier and client ID from the JWT) but before route handlers. It mounts at src/server.ts:22, between the auth middleware and the router mount. To address @domain [warden]'s concern about 429 info leakage, the response should include Retry-After and X-RateLimit-* headers but NOT the rate limit key. Kill switch via RATE_LIMIT_ENABLED env var (default: true).

Recommendation: New middleware at src/middleware/rate-limit.ts, mounted after auth.
Confidence: high | Complexity: 3 | Scale: 2 (module)
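
A rough shape for that middleware, assuming several things not stated in the session: `Req`, `Res`, and `Next` are minimal stand-ins for the Express types, the auth middleware is assumed to attach `req.auth`, the exact header names `X-RateLimit-Limit` and `X-RateLimit-Remaining` are one common convention for X-RateLimit-*, and `check` is a hypothetical synchronous limiter (a Redis-backed one would be async). The environment is passed in as a parameter so the sketch stays self-contained.

```typescript
type Tier = "free" | "pro" | "enterprise";

// Minimal stand-ins for Express request/response (assumed shapes).
interface Req {
  auth?: { clientId: string; tier: Tier };
}
interface Res {
  setHeader(name: string, value: string): void;
  status(code: number): Res;
  json(body: unknown): void;
}
type Next = () => void;

// Hypothetical limiter signature.
type Check = (
  clientId: string,
  limit: number,
  now: number
) => { allowed: boolean; remaining: number; retryAfterSec: number };

function createRateLimit(
  limits: Record<Tier, number>,
  check: Check,
  env: Record<string, string | undefined>
) {
  return function rateLimit(req: Req, res: Res, next: Next): void {
    // Kill switch: enabled by default, only an explicit "false" disables it.
    if (env.RATE_LIMIT_ENABLED === "false") return next();
    // Assumption: unauthenticated requests were already handled by auth.
    if (!req.auth) return next();

    const limit = limits[req.auth.tier];
    const result = check(req.auth.clientId, limit, Date.now());

    res.setHeader("X-RateLimit-Limit", String(limit));
    res.setHeader("X-RateLimit-Remaining", String(result.remaining));
    if (result.allowed) return next();

    res.setHeader("Retry-After", String(result.retryAfterSec));
    // The body and headers deliberately omit the client ID, tier, and key.
    res.status(429).json({ error: "rate_limited" });
  };
}
```

Keeping the 429 body to a fixed error code is one way to satisfy the no-key-echo constraint; the limit header still reveals the tier's quota, which the session treats as acceptable.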

Post-Breakout Pause

Mandatory pause — storytime does not auto-proceed to CONVERGE. You see what each breakout proposed AND what it didn't choose, then direct.

Breakout results:

1. [algorithm]   Sliding window, 60s, Redis sorted sets
   Driver: @owner [anchor] | Confidence: high | C 3 | S 2
   Options considered: token bucket (overkill), fixed window
   (burst-at-boundary), sliding window (chosen)
   → breakout-algo.md

2. [storage]    Redis sorted sets with TTL expiry
   Driver: @systems [lattice] | Confidence: high | C 2 | S 2
   Options considered: in-memory (doesn't survive restart),
   hybrid (added complexity for no win), Redis sorted (chosen)
   → breakout-store.md

3. [middleware] After auth, before routes, kill switch
   Driver: @owner [anchor] | Confidence: high | C 3 | S 2
   Options considered: pre-auth (no tier info), per-route
   wrapper (verbose), global middleware (chosen)
   → breakout-middleware.md

Conflicts: none
Open questions: tier-specific limit numbers (TBD in CONVERGE)

Ready to converge? [proceed / dig into N / revise N / add breakout]

User: proceed

Phase 4 — Converge + Plan

Driver: @owner [anchor] synthesizes. Each plan section gets the lens closest to it. Decisions get IDs (RATE-001..RATE-004) that pin to commits and become queryable graph nodes.

╔═══════════════════════════════════════════════════════╗
║  SLIDE 1: Problem                                   ║
╠═══════════════════════════════════════════════════════╣
║                                                     ║
║  request ──▶ [auth] ──▶ [???] ──▶ [routes] ──▶ db  ║
║                     ▲                               ║
║                     │                               ║
║               no throttle                            ║
║               layer here                             ║
║                                                     ║
║  /search hit 200 req/s from scrapers.               ║
║  Each request runs full-text Postgres query.        ║
║  No per-client limits. No abuse detection.          ║
║                                                     ║
╚═══════════════════════════════════════════════════════╝

╔═══════════════════════════════════════════════════════╗
║  SLIDE 2: Solution                                  ║
╠═══════════════════════════════════════════════════════╣
║                                                     ║
║  request ─▶ [auth] ─▶ [rate-limit] ─▶ [routes]     ║
║                        │    │                       ║
║                    ┌───┘    └───┐                   ║
║                    ▼              ▼                   ║
║                 [Redis]       [429 + headers]        ║
║                 sorted        Retry-After            ║
║                 sets          X-RateLimit-*          ║
║                                                     ║
║  Tier limits (per 60s window):                      ║
║    free: 30    pro: 300    enterprise: 3000         ║
║                                                     ║
╚═══════════════════════════════════════════════════════╝

Decisions (sealed, append-only, commit-pinned):

Implementation steps:

  1. Create src/middleware/rate-limit.ts with sliding window logic (RATE-001, RATE-002)
  2. Add tier config to src/config/rate-limits.ts: free 30, pro 300, enterprise 3000 (RATE-003)
  3. Mount middleware in src/server.ts between auth and routes (RATE-003)
  4. Add Prometheus counters to src/monitoring/metrics.ts (RATE-004)
  5. Add Retry-After and X-RateLimit-* response headers (RATE-003)
  6. Add RATE_LIMIT_ENABLED env var check (RATE-004)
  7. Write integration tests against Redis (all decisions)
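
Step 2 of the list above might look roughly like this. The limits and the 60s window come from the plan; everything else is an assumption made for illustration, including the names `TIER_LIMITS`, `limitFor`, and `rejectedPerWindow`, and the choice to fall back to the free-tier limit when the tier is missing or unrecognized.

```typescript
type Tier = "free" | "pro" | "enterprise";

const WINDOW_MS = 60000;

// Requests allowed per 60s window, per the plan.
const TIER_LIMITS: Readonly<Record<Tier, number>> = {
  free: 30,
  pro: 300,
  enterprise: 3000,
};

// Assumption: an unknown or missing tier gets the most restrictive limit.
function limitFor(tier: string | undefined): number {
  if (tier !== undefined && Object.prototype.hasOwnProperty.call(TIER_LIMITS, tier)) {
    return TIER_LIMITS[tier as Tier];
  }
  return TIER_LIMITS.free;
}

// How many requests a single client sending at a steady rate would have
// rejected in one window.
function rejectedPerWindow(requestsPerSecond: number, limit: number): number {
  const attempts = requestsPerSecond * (WINDOW_MS / 1000);
  return Math.max(0, attempts - limit);
}

// If the 200 req/s from the scenario came from one free-tier client,
// that client would attempt 12000 requests per window and have 11970 rejected.
const scraperRejected = rejectedPerWindow(200, limitFor("free"));
console.log(scraperRejected);
```

The own-property check matters because a plain `in` test would also match inherited names such as `toString`.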

Non-goals (REQUIRED — each with why-skip + when-revisit):

Success criteria (REQUIRED — measurable):

Phase 5 — Review

User (@user [creator]): Why not just use express-rate-limit?

@owner [anchor]: express-rate-limit counts requests in fixed windows and defaults to an in-memory store. Per @systems [lattice]'s breakout finding, in-memory doesn't survive restarts and won't work across multiple instances. We could plug in rate-limit-redis as the store, but that still leaves us on fixed windows with the burst-at-boundary problem, plus an extra dependency. The custom middleware is ~60 lines and gives us full control over the header format and @domain [warden]'s security requirements.

User: Fair. Approved.

Phase 6 — Done

Session complete: rate-limiting (episode 001)

Output:
  specs/.storytime/sessions/rate-limiting/001/
  ├── survey.md
  ├── team.md
  ├── icebreaker.md
  ├── breakout-algo.md
  ├── breakout-store.md
  ├── breakout-middleware.md
  └── plan.md

Cohort updated (acquired context appended):
  @owner    [anchor]   +1 session  (now 4 total)
  @systems  [lattice]  +1 session  (now 3 total)
  @operator [tide]     +1 session  (now 5 total)
  @domain   [warden]   specialist  (contract complete; consider promotion)

Decisions logged: RATE-001..RATE-004 (commit-pinned, queryable)
Thread created: sessions/rate-limiting/_thread.md
Continuity ledger updated.

Next: /storytime-buildout rate-limiting    (implement the plan)
      /storytime-retro rate-limiting       (after deploy, evaluate)

What this leaves behind — the continuity surface

The session is over. What persists, and how does it survive context resets?

The thread (sessions/rate-limiting/_thread.md) is the continuity ledger. It carries decisions, episode log, open questions, and last commit. When you come back later (next day, next week, after /compact), /storytime rate-limiting re-loads the team's accumulated context. Personas skip introductions and pick up where the lens last left off.

Decisions form a typed DAG. Each RATE-NNN entry pins to the commit that sealed it. Cross-topic references use Callout-> and Callout<- sigils so a decision in auth-refactor can reference one in rate-limiting without copying content. Query the graph:

./scripts/intent-graph-query.sh get_orphans
./scripts/intent-graph-query.sh get_unrealized       # sealed but no commit
./scripts/intent-graph-query.sh get_path RATE-003    # walk to root
./scripts/intent-adherence.sh rate-limiting          # sealed vs realized grid

Remembrance survives /compact. When the model self-detects degradation (token budget, rut, framing loss), /storytime-remember stages a workday-shaped wakeup document. Post-compact, the first action of any storytime skill loads it. The team you assembled today is the team you'll have when you come back, with the same context, the same decisions, the same open questions.

Buildout is the next step. /storytime-buildout rate-limiting takes the plan and implements it — one driver per slice, traceability from each commit back to the deciding persona. /storytime-retro rate-limiting reconvenes the team later to compare plan-vs-built with an evaluation scorecard.