
Simulate API failures locally with Wraith

A twin that always returns 200s at 0ms is a liar. Real APIs rate-limit, time out, return 500s under load, and take 300ms when they feel like it. wraith serve can simulate all of this: deterministically, per route, and with seeded RNGs, so the same seed produces the same fault sequence every time.

There are three simulation layers plus a trace endpoint for observability:

incoming request
-> rate-limit (real-behaviour baseline from recordings or CLI overrides)
-> fault (injected faults from a profile or chaos seed)
-> latency (simulated response timing)
-> dispatch (normal twin response)
-> trace (optional ring buffer for inspection)

All four are opt-in. With no flags set, wraith serve has zero simulation overhead - no allocation, no locking, no wrapper calls.
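A minimal sketch of that opt-in layering, assuming each layer is stored as an `Option` that stays `None` when its flag is absent, so an unconfigured layer is skipped without ever being called. `Pipeline`, `Outcome` and the closure-based layers are hypothetical names for illustration, not wraith's runtime types; the trace layer is left out and normal dispatch is modelled as a plain 200.

```rust
/// Result of running one simulation layer (hypothetical type).
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The layer answered the request itself, e.g. with 429 or 500.
    Respond(u16),
    /// The layer let the request through to the next stage.
    Continue,
}

type Layer = Box<dyn Fn(&str) -> Outcome>;

/// Each layer is `None` unless its flag was passed to `wraith serve`.
struct Pipeline {
    rate_limit: Option<Layer>,
    fault: Option<Layer>,
    latency: Option<Layer>,
}

impl Pipeline {
    /// Runs the configured layers in order: rate-limit, fault, latency.
    /// The first layer that responds wins; otherwise the request reaches
    /// normal dispatch, modelled here as a 200.
    fn handle(&self, route: &str) -> u16 {
        let layers = [
            self.rate_limit.as_ref(),
            self.fault.as_ref(),
            self.latency.as_ref(),
        ];
        // `flatten` skips the `None` entries, so absent layers are never called.
        for layer in layers.into_iter().flatten() {
            if let Outcome::Respond(status) = layer(route) {
                return status;
            }
        }
        200
    }
}
```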

The fault layer supports six fault types, selected per rule:

| Type | Behaviour |
|---|---|
| error | Return a specific status + body |
| delay | Sleep [min_ms, max_ms], then serve the normal response |
| timeout | Sleep hold_ms, then return 504 (the connection is not severed) |
| drop | Return 499 with Connection: close and an empty body |
| throttle | Return 429 with X-RateLimit-* and Retry-After headers |
| partial | Run normal dispatch, then truncate the response body to N bytes |
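One way to picture the table in code is as an enum plus a function that maps each fault to the status the client finally sees. This is a sketch: the field names come from the faults.toml keys where the page shows them, but the `bytes` field for `partial` is a hypothetical name because the page does not give that key, and `FaultType`, `final_status` and `truncate_body` are illustrative rather than wraith's API.

```rust
/// The six fault types (hypothetical Rust mirror of the fault table).
#[derive(Debug, Clone, PartialEq)]
enum FaultType {
    Error { status: u16, body: String },
    Delay { min_ms: u64, max_ms: u64 },
    Timeout { hold_ms: u64 },
    Drop,
    Throttle { limit: u32, window_secs: u64 },
    Partial { bytes: usize }, // key name assumed
}

/// Status the client sees when `fault` fires on a request whose normal
/// twin response would have been `normal_status`.
fn final_status(fault: &FaultType, normal_status: u16) -> u16 {
    match fault {
        FaultType::Error { status, .. } => *status,
        // delay and partial still serve the normal response
        FaultType::Delay { .. } | FaultType::Partial { .. } => normal_status,
        FaultType::Timeout { .. } => 504,
        FaultType::Drop => 499,
        FaultType::Throttle { .. } => 429,
    }
}

/// `partial`: keep at most `bytes` bytes of the normal body.
fn truncate_body(body: &[u8], bytes: usize) -> &[u8] {
    &body[..bytes.min(body.len())]
}
```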
Load a fault profile with --fault-profile:
wraith serve myapi --fault-profile faults.toml
faults.toml
name = "staging-chaos"
# 10% of all requests fail with 500
[[rules]]
route = "*"
percentage = 10
[rules.fault_type]
type = "error"
status = 500
body = '{"error":"simulated failure"}'
# 5% of writes slow to 500-1500ms
[[rules]]
route = "POST /v1/*"
percentage = 5
[rules.fault_type]
type = "delay"
min_ms = 500
max_ms = 1500
# Always throttle /v1/reports with 10 req/60s, but only for requests with a specific header
[[rules]]
route = "GET /v1/reports"
percentage = 100
header_match = ["X-Tenant", "demo"]
[rules.fault_type]
type = "throttle"
limit = 10
window_secs = 60
# First 3 calls to /v1/bootstrap fail with 503, then stop
[[rules]]
route = "POST /v1/bootstrap"
percentage = 100
count = 3
[rules.fault_type]
type = "error"
status = 503
body = '{"error":"service warming up"}'

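Rules are evaluated in order, and the first match wins. route accepts * (match every request) or a METHOD /path pattern, where a trailing * acts as a prefix glob (POST /v1/* matches every POST under /v1/). header_match takes a [name, value] pair: the header name is matched case-insensitively and the value must match exactly. count caps the total number of times a rule can trigger; omit it for unlimited.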
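A sketch of these matching rules, under two assumptions the page does not spell out: a rule counts as a match only when its route, header and percentage roll all pass and its count is not yet exhausted (otherwise evaluation falls through to the next rule), and a route is either `*` or `METHOD /path` with an optional trailing `*`. `Rule`, `route_matches` and `select_rule` are hypothetical; the roll is passed in as a closure so the example stays deterministic.

```rust
/// One fault rule, reduced to the fields that affect matching (hypothetical).
struct Rule {
    route: String,                          // "*", "GET /v1/reports" or "POST /v1/*"
    percentage: u32,                        // chance to fire, 0..=100
    header_match: Option<(String, String)>, // [name, value]
    count: Option<u32>,                     // remaining triggers; None = unlimited
}

/// `*` matches everything; otherwise the method must match exactly and the
/// path must match exactly, or by prefix when the pattern ends in `*`.
fn route_matches(pattern: &str, method: &str, path: &str) -> bool {
    if pattern == "*" {
        return true;
    }
    let (p_method, p_path) = match pattern.split_once(' ') {
        Some(parts) => parts,
        None => return false,
    };
    if p_method != method {
        return false;
    }
    match p_path.strip_suffix('*') {
        Some(prefix) => path.starts_with(prefix),
        None => p_path == path,
    }
}

/// Returns the index of the first rule that fires, consuming one of its
/// `count` triggers. `roll` yields a number in 0..100.
fn select_rule(
    rules: &mut [Rule],
    method: &str,
    path: &str,
    headers: &[(&str, &str)],
    mut roll: impl FnMut() -> u32,
) -> Option<usize> {
    for (i, rule) in rules.iter_mut().enumerate() {
        if !route_matches(&rule.route, method, path) {
            continue;
        }
        if let Some((name, value)) = &rule.header_match {
            // header name: case-insensitive; header value: exact
            let present = headers
                .iter()
                .any(|(k, v)| k.eq_ignore_ascii_case(name) && *v == value.as_str());
            if !present {
                continue;
            }
        }
        if rule.count == Some(0) || roll() >= rule.percentage {
            continue;
        }
        if let Some(remaining) = rule.count.as_mut() {
            *remaining -= 1;
        }
        return Some(i);
    }
    None
}
```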

Instead of writing a profile by hand, you can have one generated from a seed:
wraith serve myapi --chaos-seed 42

This calls generate_chaos_profile(seed, routes) against the loaded WIR and produces a deterministic mix: ~30% of routes get an error fault that fires on 5-15% of their requests, ~25% get a delay fault that fires on 10-30%, ~15% get throttling, and ~5% get timeouts. The same seed and the same twin always produce the same profile.
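generate_chaos_profile is wraith's own function; the code below is only a hypothetical stand-in showing how one seed can map to a reproducible mix with roughly those proportions. It uses a hand-rolled SplitMix64 generator in place of ChaCha8Rng so the sketch needs no crates, and the bucket boundaries (one draw in 0..100 per route) are assumptions.

```rust
/// SplitMix64: a tiny, dependency-free seeded PRNG standing in for ChaCha8Rng.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Inclusive range; the tiny modulo bias does not matter here.
    fn range(&mut self, lo: u64, hi: u64) -> u64 {
        lo + self.next_u64() % (hi - lo + 1)
    }
}

#[derive(Debug, PartialEq)]
enum ChaosFault {
    Error { percentage: u64 },
    Delay { percentage: u64 },
    Throttle,
    Timeout,
}

/// Hypothetical stand-in for generate_chaos_profile: one draw per route
/// picks a bucket sized like the documented mix.
fn chaos_profile_sketch(seed: u64, routes: &[&str]) -> Vec<(String, Option<ChaosFault>)> {
    let mut rng = SplitMix64(seed);
    routes
        .iter()
        .map(|route| {
            let fault = match rng.range(0, 99) {
                0..=29 => Some(ChaosFault::Error { percentage: rng.range(5, 15) }),
                30..=54 => Some(ChaosFault::Delay { percentage: rng.range(10, 30) }),
                55..=69 => Some(ChaosFault::Throttle),
                70..=74 => Some(ChaosFault::Timeout),
                _ => None, // no fault for this route in this sketch
            };
            (route.to_string(), fault)
        })
        .collect()
}
```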

The fault layer draws its percentage rolls, and the delay durations it picks within [min_ms, max_ms], from a seeded ChaCha8Rng. To reproduce a failure sequence:

wraith serve myapi --fault-profile faults.toml --fault-seed 17

Replaying the same requests, in the same order, against the same twin produces identical rolls.
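Order matters because the rolls come from one seeded stream, so each request consumes the next value. The sketch below illustrates this with a hand-rolled SplitMix64 generator standing in for ChaCha8Rng (so it needs no crates); `fault_sequence` is a hypothetical helper, not a wraith API.

```rust
/// SplitMix64: dependency-free seeded PRNG standing in for ChaCha8Rng.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical: one percentage roll per request, in arrival order.
/// Element i is true when the fault fired for the i-th request.
fn fault_sequence(seed: u64, requests: usize, percentage: u64) -> Vec<bool> {
    let mut rng = SplitMix64(seed);
    (0..requests)
        .map(|_| rng.next_u64() % 100 < percentage)
        .collect()
}
```

If the twin draws from a single shared stream as this sketch does, concurrent test requests can arrive in a different order from run to run and receive different rolls, so send requests serially when reproducing a sequence.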

Latency simulation has six modes, selected with --latency-mode:

| Mode | Flags |
|---|---|
| none | (default; no latency added) |
| fixed | --latency-ms <ms> |
| uniform | --latency-min-ms <ms> --latency-max-ms <ms> |
| percentile | --latency-p50 <ms> --latency-p95 <ms> --latency-p99 <ms> |
| recorded | --latency-config <path> (reads arrays from TOML) |
| normal | --latency-config <path> (mean + stddev from TOML) |
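A sketch of how each mode might turn into a per-request delay in milliseconds. The sampling details are assumptions, not wraith's documented behaviour: percentile mode interpolates linearly through (0, 0 ms), the p50/p95/p99 points and p99 at the top; recorded mode picks a random entry from the array; normal mode uses a Box-Muller draw clamped at zero. `LatencyMode`, `sample_ms` and the SplitMix64 stand-in for a seeded RNG are hypothetical.

```rust
use std::f64::consts::PI;

/// SplitMix64: dependency-free seeded PRNG (stand-in for the real seeded RNG).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
    /// Uniform f64 in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical mirror of the six --latency-mode values.
enum LatencyMode {
    Disabled, // --latency-mode none
    Fixed(u64),
    Uniform { min: u64, max: u64 },
    Percentile { p50: u64, p95: u64, p99: u64 },
    Recorded(Vec<u64>),
    Normal { mean: f64, stddev: f64 },
}

/// Piecewise-linear lookup over (quantile, ms) knots sorted by quantile.
fn interpolate(u: f64, knots: &[(f64, f64)]) -> f64 {
    for pair in knots.windows(2) {
        let ((q0, ms0), (q1, ms1)) = (pair[0], pair[1]);
        if u <= q1 {
            return ms0 + (ms1 - ms0) * (u - q0) / (q1 - q0);
        }
    }
    knots.last().map_or(0.0, |k| k.1)
}

/// Draws one delay in milliseconds for a single request.
fn sample_ms(mode: &LatencyMode, rng: &mut SplitMix64) -> u64 {
    match mode {
        LatencyMode::Disabled => 0,
        LatencyMode::Fixed(ms) => *ms,
        LatencyMode::Uniform { min, max } => min + rng.next_u64() % (max - min + 1),
        LatencyMode::Percentile { p50, p95, p99 } => {
            // assumption: 0 ms at q = 0, linear between the given percentiles
            let knots = [
                (0.0, 0.0),
                (0.50, *p50 as f64),
                (0.95, *p95 as f64),
                (0.99, *p99 as f64),
                (1.0, *p99 as f64),
            ];
            interpolate(rng.next_f64(), &knots).round() as u64
        }
        LatencyMode::Recorded(samples) if !samples.is_empty() => {
            samples[(rng.next_u64() % samples.len() as u64) as usize]
        }
        LatencyMode::Recorded(_) => 0,
        LatencyMode::Normal { mean, stddev } => {
            // Box-Muller; 1 - u keeps the log argument in (0, 1]
            let u1 = 1.0 - rng.next_f64();
            let u2 = rng.next_f64();
            let z = (-2.0 * u1.ln()).sqrt() * (2.0 * PI * u2).cos();
            (mean + stddev * z).max(0.0).round() as u64
        }
    }
}
```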

The recorded and normal modes need a config file because their inputs don’t fit cleanly on the command line:

latency-config.toml
[default]
mode = "normal"
mean_ms = 120
stddev_ms = 40
# Override per route
[[routes]]
route = "GET /v1/customers/:id"
mode = "fixed"
ms = 30
[[routes]]
route = "GET /v1/reports/*"
mode = "recorded"
latencies = [180, 220, 340, 410, 280, 190, 510]
Flag-only modes need no config file. Add --latency-seed to make the sampled delays repeatable:
wraith serve myapi --latency-mode uniform --latency-min-ms 50 --latency-max-ms 200 --latency-seed 7

The same seed and the same request sequence produce the same sequence of sampled latencies; actual response times still vary slightly with OS scheduling noise.

If a fault rule triggers Delay{ms} for a request, that delay replaces the latency simulator’s contribution for that request. No compounding, no double-sleep.
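In code, the rule is a choice between two sources rather than a sum. A one-function sketch with a hypothetical name:

```rust
/// Hypothetical helper: the artificial delay applied to one request.
/// A triggered delay fault replaces the latency sample; the two are never summed.
fn effective_delay_ms(fault_delay_ms: Option<u64>, latency_sample_ms: u64) -> u64 {
    fault_delay_ms.unwrap_or(latency_sample_ms)
}
```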

Two modes of rate limiting that share the same engine and response shape:

Your recorded API may advertise rate limits in its responses. Once synth-side detection lands (planned for a future release), those limits will be populated into the twin’s WIR automatically. For now, you can pre-populate rate_limit on routes manually or use CLI overrides:

To enforce the limits stored in the WIR:
wraith serve myapi --rate-limit
To add or override limits for specific routes:
wraith serve myapi \
--rate-limit \
--rate-limit-override "GET /v1/users=100/60" \
--rate-limit-override "POST /v1/charges=10/60"

The METHOD /path=N/W syntax limits that route to N requests per W seconds. An override replaces the WIR entry for the same route, or adds one if the WIR has none.
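A sketch of parsing that syntax, assuming N and W are plain unsigned integers and the first space separates the method from the path. `RateLimitOverride` and `parse_override` are hypothetical names, not wraith's parser.

```rust
/// A parsed --rate-limit-override value (hypothetical type).
#[derive(Debug, PartialEq)]
struct RateLimitOverride {
    method: String,
    path: String,
    limit: u32,       // N requests...
    window_secs: u64, // ...per W seconds
}

/// Parses "METHOD /path=N/W"; returns None if any part is malformed.
fn parse_override(spec: &str) -> Option<RateLimitOverride> {
    let (route, rate) = spec.rsplit_once('=')?;
    let (method, path) = route.trim().split_once(' ')?;
    let (limit, window) = rate.split_once('/')?;
    Some(RateLimitOverride {
        method: method.to_string(),
        path: path.trim().to_string(),
        limit: limit.trim().parse().ok()?,
        window_secs: window.trim().parse().ok()?,
    })
}
```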

When the limit is exceeded:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1729627200
Retry-After: 42
Content-Type: application/json

{"error":"rate limited"}

A throttle fault (type = "throttle" in a fault profile) produces exactly the same response shape: both paths share a single builder in runtime/http_util.rs.
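A sketch of what such a shared builder could compute, assuming the reset time is the window start plus the window length and Retry-After is the number of seconds left until that reset (never less than 1). The function name and signature are hypothetical, not the contents of runtime/http_util.rs. With a limit of 100, a window that opened at 1729627140 and a current time of 1729627158, it reproduces the example response: X-RateLimit-Reset 1729627200 and Retry-After 42.

```rust
/// Hypothetical shared 429 builder: headers for a rejected request.
/// `window_start` and `now` are Unix timestamps in seconds.
fn rate_limited_headers(
    limit: u32,
    window_start: u64,
    window_secs: u64,
    now: u64,
) -> Vec<(&'static str, String)> {
    let reset = window_start + window_secs;
    // seconds until the window reopens; this sketch never advertises 0
    let retry_after = reset.saturating_sub(now).max(1);
    vec![
        ("X-RateLimit-Limit", limit.to_string()),
        ("X-RateLimit-Remaining", "0".to_string()),
        ("X-RateLimit-Reset", reset.to_string()),
        ("Retry-After", retry_after.to_string()),
        ("Content-Type", "application/json".to_string()),
    ]
}

/// Body sent with every 429.
const RATE_LIMITED_BODY: &str = r#"{"error":"rate limited"}"#;
```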

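Two window algorithms are available:

  • FixedWindow (default): the counter resets at window boundaries. Simple, but bursty: up to twice the limit can pass in quick succession around a boundary.
  • SlidingWindow: tracks request timestamps and returns 429 as soon as the count within the trailing window exceeds the limit. Smoother, slightly more expensive.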

Select via the WIR entry or the override config.
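A minimal sketch of the two algorithms, using whole seconds for time. `FixedWindow`, `SlidingWindow` and `allow` are hypothetical names, and in this sketch only accepted requests are recorded in the sliding window.

```rust
use std::collections::VecDeque;

/// Fixed window: one counter that resets at each aligned boundary.
struct FixedWindow {
    limit: u32,
    window_secs: u64,
    window_start: u64,
    count: u32,
}

impl FixedWindow {
    fn new(limit: u32, window_secs: u64) -> Self {
        FixedWindow { limit, window_secs, window_start: 0, count: 0 }
    }

    /// Returns false where the twin would answer 429.
    fn allow(&mut self, now: u64) -> bool {
        let start = now - now % self.window_secs; // align to the boundary
        if start != self.window_start {
            self.window_start = start;
            self.count = 0;
        }
        if self.count < self.limit {
            self.count += 1;
            true
        } else {
            false
        }
    }
}

/// Sliding window: remembers accepted timestamps from the trailing window.
struct SlidingWindow {
    limit: usize,
    window_secs: u64,
    hits: VecDeque<u64>,
}

impl SlidingWindow {
    fn new(limit: usize, window_secs: u64) -> Self {
        SlidingWindow { limit, window_secs, hits: VecDeque::new() }
    }

    fn allow(&mut self, now: u64) -> bool {
        // drop timestamps that have aged out of the window
        while let Some(&t) = self.hits.front() {
            if t + self.window_secs <= now {
                self.hits.pop_front();
            } else {
                break;
            }
        }
        if self.hits.len() < self.limit {
            self.hits.push_back(now);
            true
        } else {
            false
        }
    }
}
```

With a limit of 2 per 60 seconds, FixedWindow answers requests at t = 58, 59, 59, 60, 61 with accept, accept, reject, accept, accept: the counter resets at t = 60, so four requests pass within three seconds. SlidingWindow given t = 58, 59, 60, 61 rejects the last two and only accepts again at t = 118, when the first timestamp leaves the window.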

When you want to see exactly what the twin served during a test run:

wraith serve myapi --trace --trace-capacity 5000

Three endpoints are then mounted on the twin:

GET /__wraith/trace/log # reverse-chronological ring buffer as JSON
GET /__wraith/trace/<id> # single trace by id
POST /__wraith/trace/reset # clear the buffer

Each trace records the method, path and query string, status, duration (µs), timestamp, and session id (taken from the x-wraith-session header, falling back to "default"). --trace-capacity bounds the ring; once it is full, the oldest entries are evicted first (FIFO).

Trace endpoints are gated by the same auth policy as the rest of /__wraith/* when you bind to a non-loopback interface (see Configuration). They are off by default - no overhead when --trace is absent.
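A sketch of the ring buffer's behaviour, assuming VecDeque storage, sequential numeric ids and millisecond timestamps. `TraceRing`, `TraceEntry` and their methods are hypothetical and only mirror the three endpoints: the newest-first log, lookup by id, and reset.

```rust
use std::collections::VecDeque;
use std::time::{SystemTime, UNIX_EPOCH};

/// One traced request (hypothetical shape).
#[derive(Debug, Clone)]
struct TraceEntry {
    id: u64,
    method: String,
    path_and_query: String,
    status: u16,
    duration_us: u64,
    timestamp_ms: u64,
    session: String,
}

/// Bounded FIFO ring: the oldest entry is evicted once capacity is reached.
struct TraceRing {
    capacity: usize,
    next_id: u64,
    entries: VecDeque<TraceEntry>,
}

impl TraceRing {
    fn new(capacity: usize) -> Self {
        TraceRing { capacity: capacity.max(1), next_id: 1, entries: VecDeque::new() }
    }

    /// Records a request; `session_header` is the x-wraith-session value, if any.
    fn record(
        &mut self,
        method: &str,
        path_and_query: &str,
        status: u16,
        duration_us: u64,
        session_header: Option<&str>,
    ) -> u64 {
        if self.entries.len() == self.capacity {
            self.entries.pop_front(); // FIFO eviction
        }
        let id = self.next_id;
        self.next_id += 1;
        let timestamp_ms = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_millis() as u64)
            .unwrap_or(0);
        self.entries.push_back(TraceEntry {
            id,
            method: method.to_string(),
            path_and_query: path_and_query.to_string(),
            status,
            duration_us,
            timestamp_ms,
            session: session_header.unwrap_or("default").to_string(),
        });
        id
    }

    /// Like GET /__wraith/trace/log: newest first.
    fn log(&self) -> Vec<&TraceEntry> {
        self.entries.iter().rev().collect()
    }

    /// Like GET /__wraith/trace/<id>.
    fn get(&self, id: u64) -> Option<&TraceEntry> {
        self.entries.iter().find(|e| e.id == id)
    }

    /// Like POST /__wraith/trace/reset.
    fn reset(&mut self) {
        self.entries.clear();
    }
}
```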

A realistic “staging API” twin that fails like a real one:

wraith serve myapi \
--port 8081 \
--rate-limit \
--rate-limit-override "POST /v1/charges=5/60" \
--fault-profile ./staging-faults.toml \
--fault-seed 1 \
--latency-mode percentile \
--latency-p50 80 --latency-p95 400 --latency-p99 1200 \
--latency-seed 1 \
--trace --trace-capacity 10000

Now your integration tests hit an API that:

  • Rate-limits bursts on POST /v1/charges
  • Injects the specific failure pattern from staging-faults.toml
  • Responds with realistic p50/p95/p99 latency
  • Records every request for post-test inspection
  • Makes the same fault and latency draws for the same seeds and request order

That’s what you want your test suite to have been hitting all along.