Simulate API failures locally with Wraith
A twin that always returns 200s at 0ms is a liar. Real APIs rate-limit, time out, return 500s under load, and take 300ms when they feel like it. wraith serve can simulate all of this - deterministically, per route, with a shared RNG seed so the same seed produces the same fault sequence every time.
There are three simulation layers plus a trace endpoint for observability:
incoming request -> rate-limit (real-behaviour baseline from recordings or CLI overrides) -> fault (injected faults from a profile or chaos seed) -> latency (simulated response timing) -> dispatch (normal twin response) -> trace (optional ring buffer for inspection)All four are opt-in. With no flags set, wraith serve has zero simulation overhead - no allocation, no locking, no wrapper calls.
Fault injection
Section titled “Fault injection”Six fault types, selected per rule:
| Type | Behaviour |
|---|---|
error | Return a specific status + body |
delay | Sleep [min_ms, max_ms] then serve the normal response |
timeout | Sleep hold_ms, then return 504 (connection is not severed) |
drop | Return 499 with Connection: close and an empty body |
throttle | Return 429 with X-RateLimit-* + Retry-After headers |
partial | Run normal dispatch, then truncate the response body to N bytes |
From a profile file
Section titled “From a profile file”wraith serve myapi --fault-profile faults.tomlname = "staging-chaos"
# 10% of all requests fail with 500[[rules]]route = "*"percentage = 10[rules.fault_type]type = "error"status = 500body = '{"error":"simulated failure"}'
# 5% of writes slow to 500-1500ms[[rules]]route = "POST /v1/*"percentage = 5[rules.fault_type]type = "delay"min_ms = 500max_ms = 1500
# Always throttle /v1/reports with 10 req/60s, but only for requests with a specific header[[rules]]route = "GET /v1/reports"percentage = 100header_match = ["X-Tenant", "demo"][rules.fault_type]type = "throttle"limit = 10window_secs = 60
# First 3 calls to /v1/bootstrap fail with 503, then stop[[rules]]route = "POST /v1/bootstrap"percentage = 100count = 3[rules.fault_type]type = "error"status = 503body = '{"error":"service warming up"}'Rules are evaluated in order; the first match wins. route supports * (match all) and prefix glob (/v1/*). header_match requires a case-insensitive key + exact value. count caps the total number of times a rule triggers; omit for unlimited.
Or let wraith generate chaos
Section titled “Or let wraith generate chaos”wraith serve myapi --chaos-seed 42This calls generate_chaos_profile(seed, routes) against the loaded WIR and produces a deterministic mix: ~30% of routes get a random error at 5-15%, ~25% get a delay at 10-30%, ~15% get throttling, ~5% get timeouts. Same seed + same twin = same profile, every time.
Determinism
Section titled “Determinism”The fault layer uses a seeded ChaCha8Rng for the percentage rolls and delay bounds. To reproduce a failure sequence:
wraith serve myapi --fault-profile faults.toml --fault-seed 17Every request (in order) against the same twin will roll identically.
Latency simulation
Section titled “Latency simulation”Six modes, picked with --latency-mode:
| Mode | Flags |
|---|---|
none | (default - no latency added) |
fixed | --latency-ms <ms> |
uniform | --latency-min-ms <ms> --latency-max-ms <ms> |
percentile | --latency-p50 <ms> --latency-p95 <ms> --latency-p99 <ms> |
recorded | --latency-config <path> (reads arrays from TOML) |
normal | --latency-config <path> (mean + stddev from TOML) |
The recorded and normal modes need a config file because their inputs don’t fit cleanly on the command line:
[default]mode = "normal"mean_ms = 120stddev_ms = 40
# Override per route[[routes]]route = "GET /v1/customers/:id"mode = "fixed"ms = 30
[[routes]]route = "GET /v1/reports/*"mode = "recorded"latencies = [180, 220, 340, 410, 280, 190, 510]Determinism
Section titled “Determinism”wraith serve myapi --latency-mode uniform --latency-min-ms 50 --latency-max-ms 200 --latency-seed 7Same seed + same request sequence produces the same latency vector (within OS scheduling noise).
Interaction with fault injection
Section titled “Interaction with fault injection”If a fault rule triggers Delay{ms} for a request, that delay replaces the latency simulator’s contribution for that request. No compounding, no double-sleep.
Rate limiting
Section titled “Rate limiting”Two modes of rate limiting that share the same engine and response shape:
WIR-stored limits
Section titled “WIR-stored limits”Your recorded API may have rate limits in its responses. Once synth-side detection lands (planned for a future release), those limits will be populated into the twin’s WIR automatically. For now, you can pre-populate rate_limit on routes manually or use CLI overrides:
wraith serve myapi --rate-limitCLI overrides
Section titled “CLI overrides”wraith serve myapi \ --rate-limit \ --rate-limit-override "GET /v1/users=100/60" \ --rate-limit-override "POST /v1/charges=10/60"The "METHOD /path=N/Wsec" syntax sets a limit of N requests per W seconds for that route. Overrides add to or replace WIR entries.
Response shape
Section titled “Response shape”When the limit is exceeded:
HTTP/1.1 429 Too Many RequestsX-RateLimit-Limit: 100X-RateLimit-Remaining: 0X-RateLimit-Reset: 1729627200Retry-After: 42Content-Type: application/json
{"error":"rate limited"}The exact same response shape is produced by fault_type = "throttle" - they share a single builder (runtime/http_util.rs).
Algorithms
Section titled “Algorithms”- FixedWindow (default): counter resets at window boundaries. Simple, bursty.
- SlidingWindow: tracks request timestamps and returns 429 as soon as the recent-window count exceeds the limit. Smoother, slightly more expensive.
Select via the WIR entry or the override config.
Trace endpoints
Section titled “Trace endpoints”When you want to see exactly what the twin served during a test run:
wraith serve myapi --trace --trace-capacity 5000Three endpoints are then mounted on the twin:
GET /__wraith/trace/log # reverse-chronological ring buffer as JSONGET /__wraith/trace/<id> # single trace by idPOST /__wraith/trace/reset # clear the bufferEach trace records method, path + query, status, duration (us), timestamp, and session id (from x-wraith-session header, falling back to "default"). Capacity bounds the ring; older entries are evicted FIFO.
Trace endpoints are gated by the same auth policy as the rest of /__wraith/* when you bind to a non-loopback interface (see Configuration). They are off by default - no overhead when --trace is absent.
Putting it together
Section titled “Putting it together”A realistic “staging API” twin that fails like a real one:
wraith serve myapi \ --port 8081 \ --rate-limit \ --rate-limit-override "POST /v1/charges=5/60" \ --fault-profile ./staging-faults.toml \ --fault-seed 1 \ --latency-mode percentile \ --latency-p50 80 --latency-p95 400 --latency-p99 1200 \ --latency-seed 1 \ --trace --trace-capacity 10000Now your integration tests hit an API that:
- Rate-limits bursts on
POST /v1/charges - Injects the specific failure pattern from
staging-faults.toml - Responds with realistic p50/p95/p99 latency
- Records every request for post-test inspection
- Is bit-reproducible given the seeds
That’s what you want your test suite to have been hitting all along.