
Wraith release notes and API twin conformance progress

Overlays. A consumer team can layer their own routes, variants, fixtures, and fault profiles onto a provider-owned base twin without forking the base, and ship that layer as its own .wraith artifact. Pre-existing root twins are completely unaffected — overlays are inert without a [base] section in wraith.toml.

  • You need a behavior the provider hasn’t recorded (a webhook replay path, an error scenario, a specific edge case in CI).
  • Your test environment needs different fixture data than the base ships.
  • You want to add fault injection or latency profiles without touching the shared twin.

If you’d otherwise vendor and edit a copy of someone else’s twin, you want an overlay.

wraith init checkout-billing --base billing-api@sha256:abc --owner checkout
wraith record checkout-billing --tag happy-path
wraith synth checkout-billing # --delta is the default
wraith compose --base billing-api.wraith \
  --overlay checkout-billing.wraith \
  --output composite
wraith serve composite

See Overlays for the full workflow, configuration reference, and v0 scope notes.

  • wraith compose — merge a base plus one or more overlays into a materialized composite twin (a workspace or .wraith archive). Deterministic: same inputs in the same order produce byte-identical outputs.
  • wraith rebase-check — when the base advances, classify whether your overlay still applies cleanly against the new digest without having to re-record. Emits compatible, additive-safe, or conflict with evidence.
  • wraith promote — gated publication of an overlay artifact. Requires policy pass plus evidence sufficiency. Evidence-light overlays can still be checked, but they can’t be promoted.
  • wraith init --base <ref> --owner <team> — initialize a twin as an overlay against a digest-pinned base.
  • wraith synth --delta | --full | --base-path <path>: --delta (the default for overlay twins) synthesizes only the routes that diverge from the base; --full synthesizes the entire twin. Root twins always synth full.
  • wraith serve --overlay <ovl.wraith> [--keep-composite] [--fixture <name>] — convenience for “compose then serve” without writing a composite to disk first. --keep-composite retains the materialized workspace for debugging.
  • wraith check --fixture <name> — pick which overlay’s fixture set seeds the default namespace during conformance.
  • wraith pack --include-diagnostics — ship compose-phase diagnostics inside the packed archive’s reports/ tree.

Overlay policy uses the existing exit-code discipline:

  • 0 — composes cleanly.
  • 1 — user error (bad config, missing artifact).
  • 3 — policy violation (weaker scrub posture, base-route deletion, Lua handler shadowing, etc.).
  • 4 — runtime error during composition.
  • compose output is fully self-contained. Composite workspaces now carry the merged state/fixtures/ and recordings rather than referring back to the input artifacts.
  • compose rejects archive entries with traversal-shaped paths (.., absolute paths, symlinks pointing outside the input root). Defense-in-depth on the unpack stage.
  • Same inputs to compose produce byte-identical artifacts. Set SOURCE_DATE_EPOCH to pin timestamps further. Useful for CI that diffs .wraith archives.
  • wraith synth --delta writes build/delta-report.json with a per-route breakdown of covered_by_base / delta / unreplayable and structured advice (overlay-is-redundant, many-unreplayable, base-route-missing) — gives a clear signal about whether an overlay is doing anything new or whether you should re-record.
  • wraith lint catches overlay misconfigurations. Missing [overlay].owner, invalid base digest, mismatched capability flags, and overlay twins that try to enable passthrough are flagged with the same surface wraith doctor already used.
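
For CI scripts, the exit-code discipline listed above can be mapped to readable outcomes. This is a minimal sketch: the codes and their meanings come from the list, while `describe_exit`, `should_retry`, and the retry policy are hypothetical names and choices made for illustration. How you invoke `wraith compose` (for example through `subprocess.run`) is left to your pipeline.

```python
# Exit codes as documented for overlay policy; the helper names are illustrative.
EXIT_MEANINGS = {
    0: "composes cleanly",
    1: "user error (bad config, missing artifact)",
    3: "policy violation",
    4: "runtime error during composition",
}


def describe_exit(code: int) -> str:
    """Return a human-readable label for a compose exit code."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def should_retry(code: int) -> bool:
    """Hypothetical CI policy: only runtime errors are worth retrying."""
    return code == 4
```

A policy violation (3) is deliberately not retried in this sketch, since re-running the same inputs would be expected to fail the same way.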

Patch. Search and query POST routes no longer mint phantom entities in state.

POSTs like POST /v1/assets/actions/search were being classified as resource Create operations, so every search call left a junk entity behind in the per-session state store. After v0.8.3 wired seeded fixtures through serve, this caused three visible problems: search-shaped fixture-name collisions, faster-than-expected exhaustion of serve.limits.max_entities_per_type, and state snapshots polluted with synthetic search responses.

Action POSTs are now detected by two signals — the last URL segment (search, query, count, aggregate, summarize, lookup, and the Stripe-style /actions/<verb> shape) and the response body shape (single-array bodies or paginated {results: [], next_cursor: …} shapes). Routes matching either signal dispatch without state mutation.
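
The two-signal heuristic described above might look roughly like the following. This is a sketch, not Wraith's implementation: the function names are hypothetical, and "single-array body" is interpreted here as a top-level JSON array, which is an assumption.

```python
# Last-segment verbs named in the release note.
ACTION_SEGMENTS = {"search", "query", "count", "aggregate", "summarize", "lookup"}


def looks_like_action_path(path: str) -> bool:
    """Signal 1: the URL's last segment, or the /actions/<verb> shape."""
    segments = [s for s in path.split("?")[0].split("/") if s]
    if not segments:
        return False
    if segments[-1] in ACTION_SEGMENTS:
        return True
    # Stripe-style /actions/<verb>: "actions" is the second-to-last segment.
    return len(segments) >= 2 and segments[-2] == "actions"


def looks_like_action_body(body) -> bool:
    """Signal 2: the response body shape."""
    if isinstance(body, list):  # assumed meaning of "single-array body"
        return True
    if isinstance(body, dict):  # paginated {results: [...], next_cursor: ...}
        return isinstance(body.get("results"), list) and "next_cursor" in body
    return False


def is_action_post(method: str, path: str, response_body) -> bool:
    """Either signal is enough to dispatch without state mutation."""
    if method.upper() != "POST":
        return False
    return looks_like_action_path(path) or looks_like_action_body(response_body)
```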

If you have a twin where this heuristic now applies (e.g. POST /v1/customers/search), re-run wraith synth to pick up the fix.

Patch. state/fixtures/ is now actually loaded at serve time.

The state/fixtures/<entity>.json shape has been documented since v0.1, but wraith serve never read those files — every state-backed Read or List started with an empty store regardless of what was on disk. This is now wired through end-to-end.

  • Per-session seeding. state/fixtures/<entity_type>.json is loaded once per X-Wraith-Session namespace on first use. A delete then persists for the rest of the session; no re-seeding mid-session.
  • Default namespace too. Requests without an X-Wraith-Session header still get seeded.
  • state/schema.json declarations merge into the route-derived schema. Route-derived wins on conflict, so an empty entity_types: {} (the wraith init default) is fully inert.
  • Fail-safe. Missing or malformed files warn-log and proceed rather than crash serve. A twin with no state/ directory behaves exactly as it did pre-v0.8.3.
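
The seeding semantics in the list above (seed once per namespace, never re-seed after a delete, seed the default namespace too) can be modelled with a small in-memory store. This is an illustrative sketch only: `SeededStore` is a hypothetical class, and it assumes every fixture entity carries an `id` field.

```python
import copy


class SeededStore:
    """Per-namespace entity store that is seeded exactly once on first use."""

    def __init__(self, fixtures: dict):
        # fixtures: entity_type -> list of entity dicts, as loaded from disk.
        self._fixtures = fixtures
        self._namespaces = {}

    def _namespace(self, session):
        key = session or "default"  # requests without a session header
        if key not in self._namespaces:
            # First use: copy the fixtures in. Later deletes are never undone
            # because this branch does not run again for the same namespace.
            self._namespaces[key] = {
                etype: {e["id"]: copy.deepcopy(e) for e in entities}
                for etype, entities in self._fixtures.items()
            }
        return self._namespaces[key]

    def read(self, session, etype, entity_id):
        return self._namespace(session).get(etype, {}).get(entity_id)

    def delete(self, session, etype, entity_id) -> bool:
        removed = self._namespace(session).get(etype, {}).pop(entity_id, None)
        return removed is not None
```

Deep-copying the fixtures per namespace keeps one session's mutations from leaking into another.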

Use case. Multi-twin demos and shared-entity test scenarios — e.g. customer cus_123 referenced consistently across a CRM twin, a billing twin, and an orders twin — can now be set up by authoring one fixture per twin rather than driving a POST sequence at the start of every session.

Heads up: outbound scrub still runs on seeded fixtures. A fixture entity with a name field will be tokenized on the wire by the default PII rules ("alpha" → "name_<base62>"). This is the v0.6.0 PII behavior, not a regression, but it surprises fixture authors. Workarounds: add the field to your [pii] allowlist in scrub.toml, or set [pii] detect = false for twins where seeded values aren’t real PII.

Patch. Closes the remaining $arr_N placeholder leak on List and Read routes.

v0.7.1 fixed Create dispatch, but nested array placeholders — e.g. {"data":{"items":["$arr_0"]}} or a sibling meta / facets array — still leaked through List and Read because those handlers only rewrote the top-level collection array. They’re now expanded everywhere variants surface a body, so no literal $arr_N markers reach the wire.
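
To see why a top-level-only rewrite leaks nested markers, consider a recursive walk that expands placeholders at any depth. This sketch is purely illustrative: the `arrays` lookup table and the `expand` function are assumptions made for the example, and nothing here describes how Wraith actually stores or renders array placeholders.

```python
import re

_ARR = re.compile(r"^\$arr_\d+$")


def expand(node, arrays):
    """Replace every "$arr_N" marker, at any nesting depth, with its elements."""
    if isinstance(node, dict):
        return {key: expand(value, arrays) for key, value in node.items()}
    if isinstance(node, list):
        out = []
        for item in node:
            if isinstance(item, str) and _ARR.match(item):
                # Splice the elements in place of the marker; an unknown
                # marker expands to nothing rather than reaching the wire.
                out.extend(expand(x, arrays) for x in arrays.get(item, []))
            else:
                out.append(expand(item, arrays))
        return out
    return node
```

A handler that only rewrote `body["data"]` when it was itself a list would miss both `data.items` and a sibling `meta` array, which matches the leak described above.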

If your synthesized responses include nested arrays and you saw ["$arr_0"] in serve output before, upgrade and they’re gone. No re-synth required.

Patch. Completes the v0.8.0 CORS preflight fix under the default config.

v0.8.0 correctly synthesized access-control-allow-{methods,headers} and vary on the OPTIONS variant, but the default strip_headers = true config (created by wraith init) then stripped those exact headers on the way out because they weren’t in the response-header allowlist. The v0.8.0 fix was therefore inert for most twins.

access-control-allow-methods, access-control-allow-headers, access-control-max-age, and vary are now in the default allowlist. Cross-origin clients hitting a synth twin behave correctly under strip_headers = true. Conformance scoring is unaffected.
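
The interaction between header stripping and the allowlist can be sketched as a simple filter. The four CORS header names are the ones listed above; the `strip_headers` function and the extra `content-type` entry in the usage are assumptions for illustration, not the actual default allowlist.

```python
# Headers added to the default allowlist by this patch.
CORS_ALLOWLIST = {
    "access-control-allow-methods",
    "access-control-allow-headers",
    "access-control-max-age",
    "vary",
}


def strip_headers(headers: dict, allowlist: set) -> dict:
    """Keep only headers whose lowercased name is in the allowlist."""
    return {name: value for name, value in headers.items() if name.lower() in allowlist}
```

Before the patch, the equivalent of `allowlist` did not contain the CORS names, so a correctly synthesized OPTIONS response lost them at this step.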

Feature release. Closes three rough edges that came up in real-corpus use: dropped CORS preflight headers, repetitive array elements, and routes whose response depends on a request field. Every new behavior is opt-in or a strict bugfix; pre-existing twins keep their current bytes unless you opt in.

wraith serve --fidelity synth returned a bare 204 for cross-origin OPTIONS preflights, dropping access-control-allow-{origin,methods,headers} and vary. Every browser request was therefore blocked at the preflight stage. The synthesized OPTIONS variant now carries the recorded CORS headers, body-less status groups (204 / 304) included. Strict-mode replay was already correct; synth now matches.

array_length = "p90" (v0.7.2) recovered a ~500-long array but anti-unification still capped the distinct elements at 8 and tiled them to length — list UIs showed 8 rows repeated ~62×.

[generate.anti_unification]
max_array_representatives = "all" # or a bound like 200

Default stays at 8 so existing twins are byte-unchanged. Catalog or search APIs whose recordings carry many distinct rows are the main beneficiaries.
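
The tiling effect is easy to reproduce. The sketch below assumes that representatives are simply the first N distinct elements and that they are cycled to reach the target length; the `tile` function is hypothetical and only mirrors the behavior described above.

```python
from itertools import cycle, islice


def tile(representatives, length, max_representatives=8):
    """Cycle through at most max_representatives elements to fill `length` slots."""
    if max_representatives != "all":
        representatives = representatives[:max_representatives]
    if not representatives:
        return []
    return list(islice(cycle(representatives), length))
```

With 500 distinct recorded rows and the default cap of 8, the output still has 500 entries but only 8 distinct ones, each appearing 62 or 63 times. With `"all"`, every recorded row appears once.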

Some routes return different bodies depending on a request field — a parent id, a useCase scope, a search filter. Without help, synth collapses every input to one global representative, and every variation in the request returns the same canned response. The new request-keying machinery synthesizes one response per request-field bucket and routes the right one back.

[generate.request_keying]
mode = "manual" # or "auto" for conservative auto-detection
[[generate.request_keying.route]]
route = "POST /v1/assets/actions/search"
fields = ["$.input.filter.parentId"]

Default is mode = "off", fully inert. Use manual to declare keys per-route, or auto to let synth try to detect a key for unruled routes when one strongly predicts the response.
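
Conceptually, request keying buckets recorded responses by the value of the declared request fields and looks the bucket up at serve time. The following is a minimal sketch under simplifying assumptions: only plain `$.a.b.c` paths are handled, the first recording per bucket wins, and all function names are hypothetical.

```python
def extract(body, path):
    """Resolve a simple '$.a.b.c' path; return None when any step is missing."""
    node = body
    for key in path.removeprefix("$.").split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def build_buckets(recordings, fields):
    """Group (request_body, response_body) pairs by their key-field values."""
    buckets = {}
    for request_body, response_body in recordings:
        key = tuple(extract(request_body, f) for f in fields)
        buckets.setdefault(key, response_body)  # first recording per bucket wins
    return buckets


def respond(buckets, request_body, fields, fallback=None):
    """Return the response for the request's bucket, or the fallback."""
    key = tuple(extract(request_body, f) for f in fields)
    return buckets.get(key, fallback)
```

With `mode = "off"`, the equivalent behavior is a single bucket: every request gets the same representative regardless of `parentId`.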

For catalog / search-shaped APIs that combine bimodal arrays with request-keyed responses:

[generate.anti_unification]
array_length = "p90"
drop_empty_array_responses = true
max_array_representatives = "all"
[generate.request_keying]
mode = "manual"

Feature release. Adds two knobs so synth handles bimodal / search corpora correctly. Both default to pre-v0.7.2 behavior exactly — existing twins are byte-unchanged unless you opt in.

A debounced search endpoint records a flood of empty no-match responses interleaved with a few real catalog loads. Synth’s default median-length array policy then collapsed such routes to ~1-element arrays even though the data was right there in the recordings.

Two new knobs, both under [generate.anti_unification]:

  • array_length: "median" (default), "p75", "p90", or "max". Pick the length statistic that matches your corpus shape.
  • drop_empty_array_responses: false (default). When true, all-empty responses are excluded from anti-unification per status group, but only when at least one non-empty response exists for that group, so error variants and scalar responses are never dropped.

wraith synth now prints the active policy in its fidelity warning and, on collapse-prone defaults, suggests the exact stanza to add.
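
The effect of the two knobs on a bimodal corpus can be sketched as follows. The nearest-rank percentile definition and the `target_length` function are assumptions for illustration; the exact statistic Wraith computes may differ.

```python
import statistics


def target_length(lengths, array_length="median", drop_empty=False):
    """Pick the array length to synthesize from the observed lengths."""
    # Drop empties only when at least one non-empty observation exists.
    if drop_empty and any(n > 0 for n in lengths):
        lengths = [n for n in lengths if n > 0]
    if not lengths:
        return 0
    ordered = sorted(lengths)
    if array_length == "median":
        return int(statistics.median(ordered))
    if array_length == "max":
        return ordered[-1]
    pct = {"p75": 75, "p90": 90}[array_length]
    rank = (pct * len(ordered) + 99) // 100  # nearest-rank, integer ceiling
    return ordered[rank - 1]
```

For eight empty no-match responses plus two catalog loads of 480 and 500 rows, the median is 0, while `"p90"` with `drop_empty=True` recovers 500.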

Recommended config for bimodal / search APIs
[generate.anti_unification]
array_length = "p90" # or "max"
drop_empty_array_responses = true

Patch. Fixes a placeholder leak in synth-mode Create responses.

wraith serve --fidelity synth was returning literal ["$arr_0"] strings in POST responses for routes classified as Create whose variants used variable-length array placeholders. The same string was also being persisted into state, so subsequent Read / List requests for that entity kept emitting it indefinitely.

Fixed at write time — expanded entities go into state, and expanded bodies go to clients. Re-pack any twin whose recordings include Create routes with variable-length arrays to flush the bad state from earlier serve runs.

The related nested-placeholder leak on List / Read routes is fixed in v0.8.2.

wraith generate hardening release. Four review passes on generate alone surfaced 11 fixable bugs — budgets that didn’t enforce, audits that didn’t write, scores that disagreed with wraith check, rejection reasons that hid the real cause. All fixed. The agentic and single-shot loops are now trustworthy enough to drive in CI.

  • --time-budget cancels in-flight LLM calls. Was advisory — a stalled call ran until external SIGKILL. Now each provider call is wrapped against the run-level deadline; on expiry the in-flight HTTP future is dropped and the process exits within time_budget + 5s grace. Covers ollama, openai, openrouter, and command providers, in agentic and single-shot modes.
  • --token-budget enforced per-call. The LLM’s completion is capped at min(8192, tokens_remaining) so a single response can’t push wildly over budget. Prompt tokens are now accounted for as well: estimate_prompt_tokens() (chars/4) subtracts from the budget before max_tokens is computed, and the call is skipped entirely when the prompt alone would exceed the budget. For Stripe-sized prompts (~28k tokens), overshoot dropped from ~22% to ~0%.
  • Generate’s score matches wraith check. Previously generate called the conformance engine with lua_dir=None, so on twins with Lua handlers (orderledger has 7) the engine returned 501s the diff engine saw as 233 phantom divergences. The Lua directory is now threaded through every call site; generate’s reported score equals wraith check --in-memory.
  • generate-audit-*.json written on every run. Previously the audit directory was empty after every run (wrong write path). A new RAII writer atomically rewrites the file at start, after each round, on success, on error, and on panic-unwind. Schema: timestamps, twin/provider/model, budgets, initial + final conformance, per-route patches with reasons, per-round agentic transcripts, token spend, exhaustion reason.
  • SIGKILL-safe audits. A new started exhaustion-reason marker is written at construction so SIGKILL’d runs leave a meaningful marker on disk — readers can distinguish “still running” from “completed cleanly” instead of seeing null.
  • Unified exhaustion_reason across envelope and audit. Was two separate enums with different precedence — the same run could report iterations in the envelope and budget_exhausted in the audit. Now a single enum with documented precedence (error > panic > killed > time_exhausted > budget_exhausted > iterations_exhausted > completed); the two surfaces always agree.
  • Token-vs-time precedence is honest. A pre-call gate previously set a generic budget_hit flag that mapped to time_exhausted always — so a token-budget run reported time_exhausted. A typed BudgetHitCause carries the specific cause and routes each variant to the right ExhaustionReason.
  • Real rejection reasons. Rejected patches no longer all report "no edits made". Each rejection site emits a specific rejection_reason: budget-exhausted | parse-failure | regression-rejected | empty-edits | protocol-failure | llm-error | user-declined.
  • --interactive now actually prompts. Was declared and documented but never read. Now: before applying each accepted patch, a unified diff of {status, headers, template} is printed to stderr followed by apply this patch? [y/N]:. y / yes accepts; anything else (including EOF / empty line) rejects with rejection_reason: user-declined. Stdout JSON envelope stays clean. Works in both agentic and --no-agentic modes.
  • Lib tests: 2890 → 2953 (+63 across the release).
  • 11 generate-related bones closed across 4 review passes; zero open bugs at cut.
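
Two of the rules above lend themselves to a short worked sketch: the per-call token cap and the single exhaustion-reason precedence. The constants and the precedence order come from the notes; `plan_call`, `resolve_exhaustion`, the floor division in the estimate, and the choice to skip when nothing is left for the completion are assumptions made for illustration.

```python
COMPLETION_CAP = 8192

# Documented precedence, highest priority first.
PRECEDENCE = [
    "error",
    "panic",
    "killed",
    "time_exhausted",
    "budget_exhausted",
    "iterations_exhausted",
    "completed",
]


def estimate_prompt_tokens(prompt: str) -> int:
    """Rough chars/4 estimate (floor division is an assumption)."""
    return len(prompt) // 4


def plan_call(prompt: str, tokens_remaining: int):
    """Return max_tokens for the next call, or None to skip the call."""
    after_prompt = tokens_remaining - estimate_prompt_tokens(prompt)
    if after_prompt <= 0:
        return None  # the prompt alone uses up the budget
    return min(COMPLETION_CAP, after_prompt)


def resolve_exhaustion(reasons):
    """Pick the single reason to report when several apply."""
    return min(reasons, key=PRECEDENCE.index)
```

Using one ordered list for both the envelope and the audit is what guarantees that the two surfaces cannot disagree.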

Brutal-review shakedown. 14 review passes, 70+ fixes, zero open bugs at cut. New wire-mode conformance, new wraith install, principled PII machinery.

  • wraith install <pack.wraith> — inverse of wraith pack. Extracts a packaged twin into a usable workspace. Verifies per-artifact digests before writing any files. Defense-in-depth PII rescrub on extraction. --name, --into, --force, --no-verify, --rescrub.
  • wraith check --wire — wire-mode conformance. Spawns the real serve on a loopback port and replays recorded requests through it. Catches protocol-level bugs the in-memory check is blind to (header stripping, scrub layer mismatch, status code drift). Emits a separate wire_fidelity_bp score with the same partial-credit formula as the replay score.
  • wraith check --upstream without --target or --in-memory now defaults to in-memory replay (previously silently no-op’d). Emits info advice noting the implicit choice.
  • wraith reduce strategies are distinct. coverage uses greedy set cover; diversity uses farthest-point-first by Jaccard distance; recency ranks by timestamp. Invalid --target-size (e.g. abc, bare 50 without %) now exits non-zero with a hint instead of silently no-op’ing.
  • Error-severity divergences count against the score. wraith check no longer reports 100% conformance while emitting thousands of severity=error divergences. Any error-severity divergence on an exchange zeros the affected component score.
  • drift_type classifier refined. New numeric_drift, host_rewrite, url_drift, value_drift. enum_expansion reserved for real string-enum cases.
  • upstream_fidelity_bp — separate score answering “does the twin look like the live upstream right now?” Network failures degrade gracefully.
  • 404 on unknown IDs for Read endpoints when both 2xx and 4xx variants are present. GET /v1/customers/cus_FAKE → 404 instead of 200 with empty body.
  • POST /:id classified as Update, not Create. Matches Stripe convention. Sub-resource POSTs (/cancel, /capture) still classify as Action.
  • DELETE preserves pre-mutation membership — first delete returns 200, second returns 404. Was: first delete returned 404 with deleted:true body (status/body mismatch).
  • List endpoints honor pagination: ?limit, ?offset, ?page+per_page, ?starting_after, ?ending_before, ?cursor. has_more is set when the template carries the field. Stripe, PostgREST, page-style, and Google-style conventions covered.
  • List handler is O(limit), not O(N). ?limit=10 against 10k entities: 70ms → 7ms. 1000 parallel ?limit=10: 66s → 0.7s.
  • Idempotency-Key honored on POST (opt-in via [serve.idempotency]). Per-namespace (route, key) → cached response.
  • REST and GraphQL malformed bodies return 400. Empty body, primitives, shape-mismatched arrays all rejected with a structured invalid_request_error envelope. Default fallback when no recorded 4xx variant exists.
  • URL normalization at request entry. /v1/customers/. and /v1/customers/.. are rejected with 400; /v1/customers// collapses to the list route. RFC 3986 dot-segment handling.
  • Seen IDs serve recordings verbatim. When the request path matches a recorded URL exactly, serve the recorded body bit-for-bit. The new hash-based variation only fires for unseen IDs.
  • Path collapser preserves collection roots. /v1/balance, /v1/charges, /v1/payment_intents, etc. stay as specific routes; only ID-shaped segments become :param. No more spurious /v1/:param catch-alls.
  • Numeric path segments collapse to :param after N distinct values (was N=∞). /pokemon/{1,4,25} → /pokemon/:param. Was: 3 separate routes; unseen IDs returned 501.
  • Array length distribution preserved. Synthesized responses render arrays at the median observed length, cycling through up to 8 representative elements, instead of folding to a single placeholder.
  • Cardinality-detected per-twin enum_paths. A new synth-time analyzer marks low-cardinality high-repetition kebab/snake-case fields as enum. The PII walker skips them. No more hardcoded list of “pokeapi.ability.name” / etc. entries in source — a new API (Discord, Salesforce, anything) gets the same treatment automatically.
  • Per-request hash-seeded representative selection. Same path → same response (deterministic). Different paths → different response content drawn from observed representatives.
  • Lua handler errors return 500. Previously silently fell through to template rendering with a random muxemwxu-shaped id, making test failures invisible.
  • Lua handlers resolve by filename convention when no explicit hook is set in the model. Was: synthesis never populated vm.lua_hook, so handlers loaded but never ran; template rendering clobbered computed values (total: 134.34 → template constant).
  • Form-encoded numeric scalars coerce to recorded type. Stripe amount=8888 now renders as Value::Number(8888) (was "8888").
  • Clock holes resolve per-request. New [serve.clock] mode = real | deterministic | fixed. Default is real wallclock; deterministic uses a seeded monotonic counter.
  • URL rewrite on outbound responses. Absolute URLs at the recorded upstream host are rewritten to point at the twin. Third-party URLs (GitHub raw, CDNs) preserved verbatim (was being replaced with UUID placeholders).
  • Vendor headers stripped on serve by default (Cf-Ray, X-Cache, Server, etc.). Configurable via [serve] strip_headers.
  • Default scrub rules cover email, phone, name, SSN, git author blobs. Git commit metadata in GitHub recordings is tokenized at write time.
  • Doctor scans recordings + model bodies for PII. New --allow-pii flag downgrades findings to info. wraith export openapi github and wraith pack both re-scrub before emit so legacy twins don’t ship raw PII.
  • [pii] scrub.toml section. detect toggle, allowlist for legitimate non-PII paths, default_action, fields.always for explicit overrides. Suffix-matching on *_name / *_email catches customer_name, employee_name, author_email.
  • pseudonymize scrub action — deterministic user_<base62> replacement keyed by HMAC. Stable across recordings/exports/packs for the same input.
  • wraith pack archives are byte-stable with [serve.clock] mode = "deterministic". Two consecutive packs produce identical sha256 hashes.
  • wraith verify-pack reports PII findings alongside the digest check. --strict flips warnings to failures.
  • Confidence-based outbound scrub on live serve. Enum values (bulbasaur, grass, razor-wind) preserved; real person names (including short ones like bob) tokenized. Cardinality detection distinguishes thing-with-a-label-name entities (preserve .name) from person-with-a-personal-name entities (scrub).
  • grpc-status in HTTP/2 trailers for non-empty bodies. Was in initial headers — a spec violation that grpcurl, tonic, gRPC-Go, gRPC-Java, and official Python gRPC all reject. Empty-body errors still use the spec-permitted Trailers-Only form.
  • UTF-8-safe common_prefix. Synthesis no longer panics on multi-byte UTF-8 (Japanese, Cyrillic, accented Latin, emoji). API twins for internationalized APIs (anything with localized strings) build successfully.
  • Lib test count: 2403 → 2890 (+487).
  • 14 brutal-review passes; ended with zero open bugs.
  • 70+ feature/fix commits since v0.5.2.
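
The pseudonymize action in the list above is described as a deterministic `user_<base62>` replacement keyed by HMAC. A minimal sketch of that idea follows. The hash function (SHA-256), the number of digest bytes used, the token length, and the base62 alphabet order are all assumptions; only the overall shape comes from the notes.

```python
import hashlib
import hmac

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet."""
    if n == 0:
        return BASE62[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(BASE62[rem])
    return "".join(reversed(digits))


def pseudonymize(value: str, key: bytes, prefix: str = "user", length: int = 10) -> str:
    """Map the same (key, value) pair to the same token every time."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    token = base62(int.from_bytes(digest[:8], "big"))
    return f"{prefix}_{token[:length]}"
```

Because the token depends only on the key and the input, the same person maps to the same pseudonym across recordings, exports, and packs, while a different key yields unrelated tokens.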

Streaming and capture fidelity. Three new fixture twins.

  • wraith record survives SIGTERM mid-stream. Long SSE/gRPC streams cut by SIGTERM (or wraith record stop, vessel, systemd) now persist their WREC and session manifest with truncated=true instead of vanishing silently. The forward proxy now also handles SIGTERM; previously only Ctrl-C was caught.
  • In-flight streams pin sessions against the idle timeout. A long SSE stream (e.g. an LLM streaming for >30s on CPU) no longer fragments surrounding exchanges into separate sessions in wraith inspect. Sessions close when the activity actually stops, not when the next exchange happens to start.
  • gRPC replay is byte-faithful for fixed-length arrays. Fixed-position event slots in a recorded stream now render with the correct per-slot template instead of position 0’s. No more ghost proto3 default values on the wire.
  • Synthesized 429 bodies match the route’s recorded 4xx shape. Stripe gets {error: {type, code, message}}, GitHub gets {message, documentation_url}, Twilio and GraphQL likewise. Fallback when no 4xx is recorded is a structured {status, code, message, retry_after} - friendlier to clients deserializing into typed error structs.
  • Volatile response headers freshly emitted at serve time. Date, Server, X-Request-Id, Cf-Ray, Etag are dropped at synth time and synthesized at serve time so 200s and 429s carry the same wallclock Date source - important for HMAC signers and freshness checks.
  • Header presence as a guard. When a single route records both authed (200) and unauthed (401) shapes, wraith synth infers HeaderPresent / HeaderAbsent guards on the discriminating header (e.g. Authorization). At serve and check time, requests route to the matching variant. Header-name-agnostic - any consistently-present-vs-absent header qualifies.

twin.wir.json is the documented portable twin artifact. It used to silently drop several pieces of metadata that wraith serve already supported via the in-memory model. Now round-tripped:

  • Per-route binary content type and body (HTML, plain text, opaque binary endpoints)
  • Per-route gRPC marker
  • Per-variant Lua hook handler
  • Per-route symbol table
  • Per-variant header programs and optional-field lists

All additions are backward-compatible - existing twin.wir.json files load unchanged.

  • Exercise scripts force a session boundary (POST /__wraith/new-session) between recording iterations. Multi-session runs now produce real session boundaries instead of one giant session.
  • wraith inspect surfaces refresh probe recordings (recordings/refresh/<run_id>/sessions/) alongside regular ones.

Three streaming-fixture twins for contributors to replay end to end:

  • mercure - pure SSE hub. Infinite-stream regression target.
  • caddy-sse - minimal controlled SSE fixture with configurable event count, cadence, and payload shape.
  • qdrant - vector DB gRPC twin. Validates the unary gRPC + protobuf-descriptor pipeline.

v0.4 shakedown follow-ups. Twin-quality fixes + lifecycle commands.

  • DELETE replay matches recorded shape. wraith serve now renders the variant body template on DELETE instead of substituting a hardcoded {deleted, id} body. Literal fields like object: "coupon" survive.
  • Numeric epoch fields stay numeric. Fields like Stripe’s created (Unix epoch seconds, integer) are no longer overlaid with ISO 8601 strings. The classified clock unit (epoch_sec / epoch_ms / iso_string) drives output, not the field name.
  • No more $hole_* placeholder leaks. Unfilled holes can never reach the wire under any classification. The hole classifier learns ID shape from observations: prefix, length, and character class. Stripe-shaped IDs (cus_<14 base62>) and short token fields (e.g. 7-char uppercase alnum) are generated correctly.
  • /__wraith/ready returns 200 once the listener is bound. Previously it returned 503 forever, breaking wraith up’s ready poll and wraith status’s ready probe.
  • wraith coverage reports real session counts. Previously every route showed sessions=0.
  • Trace ring buffer captures non-200 responses. --trace now records 429s, fault-injected 5xx, throttle, drop, and timeout responses - exactly the responses you want with --chaos-seed --trace.
  • wraith down: stops twins started by wraith up. SIGTERM with SIGKILL escalation. Idempotent.
  • wraith status: per-twin alive + ready report. Polls /__wraith/ready for each running twin.
  • wraith env: emits WRAITH_<NAME>_PORT and WRAITH_<NAME>_BASE_URL for each twin in the project manifest. Pasteable into a shell or consumed via --format json.

Manifest plumbs simulation flags through wraith up


Project manifests can now drive the v0.4 simulation layers per twin:

[twins.stripe]
path = "twins/stripe"
port = 8181
chaos_seed = 42
latency_mode = "auto"
trace = true
trace_capacity = 500
rate_limit = true
rate_limit_override = ["GET /v1/foo=5/1sec"]
debug = false
listen = "0.0.0.0:8181"
fidelity = "synth"

All fields optional; existing manifests parse unchanged.

SSE and gRPC server-streaming. Record, synthesize, serve, and conformance-check streaming APIs end to end. See the Streaming guide.

  • SSE (text/event-stream): wraith record captures live without buffering - long-lived streams no longer deadlock the recorder. wraith serve emits realistic streams with per-event timing and rotating per-event content (an LLM twin emits the recorded token sequence, not one repeated character).
  • gRPC server-streaming: wraith record forwards frames live with HTTP/2 trailers preserved. wraith serve emits frame-correct length-prefixed protobuf with grpc-status trailers - gRPC clients connect and stream without Internal: missing trailers.
  • Long-lived bidi streams (cancelled by client deadline, no trailers received) classify as truncated; replay matches.
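
For readers unfamiliar with the "length-prefixed protobuf" framing mentioned above: in the gRPC wire format, each message is preceded by a 1-byte compressed flag and a 4-byte big-endian length. The sketch below encodes and parses such frames. The function names are hypothetical, and the protobuf payload is treated as opaque bytes.

```python
import struct

_HEADER = ">BI"  # 1-byte compressed flag + 4-byte big-endian message length
_HEADER_LEN = 5


def encode_frame(message: bytes, compressed: bool = False) -> bytes:
    """Prefix a serialized message with the gRPC frame header."""
    return struct.pack(_HEADER, 1 if compressed else 0, len(message)) + message


def parse_frames(buf: bytes):
    """Split a buffer into complete frames; return (frames, leftover_bytes)."""
    frames, offset = [], 0
    while offset + _HEADER_LEN <= len(buf):
        flag, length = struct.unpack_from(_HEADER, buf, offset)
        end = offset + _HEADER_LEN + length
        if end > len(buf):
            break  # incomplete frame: keep the bytes until more data arrives
        frames.append((bool(flag), buf[offset + _HEADER_LEN:end]))
        offset = end
    return frames, buf[offset:]
```

Returning the leftover bytes lets a streaming reader forward each complete frame immediately and carry a partial frame over to the next read.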

wraith check now scores streaming exchanges under dedicated PASS criteria:

  • Event count must match the recording.
  • Per-event structural shape (keys, types, constants) must match.
  • Hole-marked fields (variable LLM token text, etcd event keys) tolerate value variance.
  • Termination shape and gRPC trailers must match.

Previously, streaming exchanges rolled up into the unary scorer where streaming-specific divergences could be diluted into a passing score. New behavior: a streaming Error-severity divergence fails the session.

Suppression rules in wraith.toml are applied before scoring, so a suppressed divergence no longer counts against the conformance score. Previously [[diff.suppress]] filtered the report only.

wraith synth infers body-field guards on routes whose variants are discriminated by request-body string fields. Glob paths like messages[*].content are supported. At serve time, when multiple variants’ guards match a request, wraith serve picks the most-specific variant - so a request that matches both a loose 200 catch-all and a tight 404 error variant routes to the 404.
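
The most-specific-variant rule can be illustrated with a toy matcher. In this sketch a guard is an exact match on a top-level string field and specificity is the number of guards a variant carries. Both are simplifying assumptions: glob paths such as `messages[*].content` are not handled, and `pick_variant` is a hypothetical name.

```python
def matches(guards: dict, body: dict) -> bool:
    """A variant matches when every guarded field has the expected value."""
    return all(body.get(field) == expected for field, expected in guards.items())


def pick_variant(variants, body):
    """Among matching variants, prefer the one with the most guards."""
    candidates = [v for v in variants if matches(v["guards"], body)]
    if not candidates:
        return None
    return max(candidates, key=lambda v: len(v["guards"]))
```

A guard-free 200 variant matches every request, so without the specificity tie-break it could shadow the tighter 404 variant.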

A single route can mix streaming and non-streaming variants. The 200 SSE variant serves a stream; the sibling 404 invalid-model JSON variant serves a normal response.

  • ollama - twins the OpenAI-compat /v1/chat/completions endpoint with stream: true for any local Ollama model.
  • etcd-streaming - extends the etcd twin with KV.Watch, the canonical server-streaming RPC.

Both ship with podman fixtures so contributors can replay end-to-end.

Faulty-service simulation + OpenAPI seed + trace endpoints. Six orphan subsystems wired into the CLI.

See the Simulation guide for the fault/latency/rate-limit story end to end.

  • Fault injection (--fault-profile <path>, --chaos-seed <u64>): six fault types (Error / Delay / Timeout / Drop / Throttle / Partial), deterministic seeded RNG, route globs, header matching, percentage rolls, per-rule trigger caps. generate_chaos_profile builds a realistic mix from the loaded WIR when given just a seed.
  • Latency simulation (--latency-mode <fixed|uniform|recorded|normal|percentile> + aux flags): per-route overrides, seeded ChaCha RNG for deterministic replay. When a fault Delay rule fires, it replaces the latency simulator’s contribution for that request (no compounding).
  • Rate-limit simulation (--rate-limit, --rate-limit-override "METHOD /path=N/Wsec"): FixedWindow and SlidingWindow algorithms, standard X-RateLimit-* + Retry-After headers, shared 429-response builder for fault Throttle and the rate-limit gate.
  • Evaluation order: rate-limit -> fault -> latency -> dispatch. All three layers are Option<Arc<...>> - zero overhead when their flags are absent.
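
As an illustration of the rate-limit layer, the sketch below parses an override in the "METHOD /path=N/Wsec" form shown above and applies a fixed-window counter. The regular expression, the `parse_override` and `FixedWindow` names, and the window alignment (windows start at multiples of the window size) are assumptions; the real parser and algorithm may behave differently.

```python
import re

_OVERRIDE = re.compile(
    r"^(?P<method>[A-Z]+) (?P<path>\S+)=(?P<limit>\d+)/(?P<window>\d+)sec$"
)


def parse_override(spec: str):
    """Parse 'GET /v1/foo=5/1sec' into (method, path, limit, window_sec)."""
    m = _OVERRIDE.match(spec)
    if m is None:
        raise ValueError(f"invalid rate-limit override: {spec!r}")
    return m["method"], m["path"], int(m["limit"]), int(m["window"])


class FixedWindow:
    """Allow at most `limit` requests per aligned window of `window_sec` seconds."""

    def __init__(self, limit: int, window_sec: int):
        self.limit = limit
        self.window_sec = window_sec
        self.window = None
        self.count = 0

    def allow(self, now: float) -> bool:
        window = int(now // self.window_sec)
        if window != self.window:
            self.window, self.count = window, 0  # new window: reset the counter
        if self.count >= self.limit:
            return False  # caller would respond with 429 here
        self.count += 1
        return True
```

Passing `now` explicitly instead of reading the clock inside `allow` keeps the limiter deterministic in tests.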

Trace endpoints (--trace [--trace-capacity N])

  • GET /__wraith/trace/log returns the ring buffer in reverse-chronological order.
  • GET /__wraith/trace/<id> fetches a single trace by id.
  • POST /__wraith/trace/reset clears the buffer.
  • Bounded ring buffer with FIFO eviction. Same control-plane auth policy as the existing /__wraith/* surface. Disabled by default.
  • Each divergence gets a stable drift_id (fingerprint) and a DriftType classification (schema-change / field-removed / status-shift / etc.).
  • JSON envelope adds a drifts[] summary grouping divergences by drift_id, and per-divergence drift_id + drift_type. Additive only - existing consumers see the old shape when skip_serializing_if suppresses empty fields.
  • twins/<name>/drift.toml (sibling of scrub.toml) supports [[suppress]] and [[reclassify]] rules matched by glob on drift_id / route / path / drift_type. Absent file is a silent no-op.
  • Refresh integration deferred until refresh’s probe-execution path lands.
  • New wraith explore --from-openapi <spec.yaml> [--against <url>]: parses OpenAPI 3.x (YAML or JSON), generates scenario plans, optionally executes them against a live URL and reports per-step match/mismatch/error counts. Auth via repeated --header flags.
  • wraith coverage --openapi <spec> extends coverage to report spec-vs-recordings gaps (covered_count, total_count, uncovered_operations).
  • Additive JSON envelope fields - no breaking changes to existing coverage consumers.
  • Router backtracking: a literal subtree that matches the path but not the method no longer blocks backtracking to param subtrees.
  • Scrub null handling: null JSON values no longer get tokenized.
  • Header allowlist: headers opted in via with_extra_compare_headers are no longer overridden by the blanket x-* filter.
  • Sync conformance replay: query params now carried through.
  • VCR base64 handling: case-insensitive base64 detection.
  • Async CRUD handlers: error-variant short-circuit restored across Update / Delete.
  • Async handle_list: array-key detection + totalItems / totalPages parity with sync path.
  • Async/sync drift eliminated: async CRUD handlers now delegate to sync dispatch (-561 LOC of duplicate logic).
  • Clock holes carry unit info: ClockUnit::{EpochSec, EpochMs, IsoString} with serde-compatible migration.
  • 1991 lib tests passing (+43 vs v0.3.0). 40+ new integration tests across e2e_fault, e2e_latency, e2e_rate_limit, e2e_serve trace suite, explore_openapi.
  • cli/up.rs, cli/refresh.rs, synth-side rate-limit / latency auto-population remain TODO for v0.4.x or v0.5.
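
For the trace endpoints listed at the top of this section, the buffer semantics (bounded, FIFO eviction, reverse-chronological log, lookup by id, reset) can be sketched like this. The class and field names are assumptions made for illustration; the notes do not describe the real trace record shape or how ids are assigned.

```python
from collections import deque
from itertools import count


class TraceBuffer:
    """Hypothetical bounded trace store with FIFO eviction."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._entries = deque(maxlen=capacity)
        self._ids = count(1)

    def record(self, method, path, status):
        entry = {"id": next(self._ids), "method": method,
                 "path": path, "status": status}
        self._entries.append(entry)
        return entry["id"]

    def log(self):
        # Newest first, as GET /__wraith/trace/log is described.
        return list(reversed(self._entries))

    def get(self, trace_id):
        # Returns None for unknown or already-evicted ids.
        return next((e for e in self._entries if e["id"] == trace_id), None)

    def reset(self):
        self._entries.clear()
```

One consequence worth noting: once an entry has been evicted, a lookup by its id misses, so a client that polls the log and then fetches individual traces should size the capacity for the expected request rate.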

18 twins (REST + GraphQL + gRPC). All PASS. Honest conformance with granular suppression.

  • Protobuf codec: decode (wire->JSON) and encode (JSON->wire) via prost-reflect. 14 tests.
  • gRPC framing: detect, parse, encode length-prefixed frames, extract trailers. 21 tests.
  • HTTP/2 proxy: h2c listener (auto-detects h1/h2), hyper-based upstream client with trailer forwarding, GrpcProxyBody for proper trailer delivery.
  • Synth detection: is_grpc_endpoint(), method-name state op inference (Create/Get/List/Update/Delete), grpc flag on RouteModel. 22 tests.
  • Serve handler: GrpcConfig loads proto descriptors, decodes protobuf requests, encodes protobuf responses. Trailers-only format for unary RPCs.
  • Codec wired into pipeline: synth decodes protobuf bodies to JSON before anti-unification; check decodes recorded protobuf before diffing. Real templates, not echo fallback.
  • X-Wraith-Format: json: debug header bypasses protobuf encoding, returns raw JSON from synth handler.
  • X-Wraith-* headers stripped before forwarding to upstream during recording.
  • Go test service: 6 RPCs (CRUD + streaming), all proto types (nested, enum, oneof, map, repeated, timestamps). Dockerfile for podman.
  • Validated on etcd: real-world gRPC KV service, 3 routes, 0 divergences.
  • Granular list-body suppression: suppress only array contents, not entire envelope. Scalar envelope fields (count, summary, pagination) compared normally.
  • Numeric value comparison: 50 and 50.0 treated as equal (f64 comparison).
  • Empty-string ID mapping fix: prevented path corruption during conformance replay. Fixed Stripe (95->0) and PocketBase (168->0, FAIL->PASS).
  • User field classifications override all auto-detection, including list-body suppression.
  • check --in-memory loads Lua with state: handle_request_sync now calls invoke_handler_with_state. Lua handlers get full state.* and clock.* access.
  • OrderLedger stress test: 5 patterns (computed totals, conditional shapes, list aggregates, state machine, cross-entity joins). 7 handlers. 2 divergences with Lua vs 185 without.
  • POST /__wraith/new-session: force recording session boundary without restarting proxy.
  • Cross-session re-recording: Cloudflare, GitHub, Odoo, Stripe, Linear re-recorded with 2+ sessions each.
  • GitHub GraphQL v4: 16 operations (fragments, anonymous queries, inline fragments, deep nesting, mutations).
  • Updated docs: twin-lifecycle.md rewritten, configuration.md expanded, quickstart updated.
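
The length-prefixed framing mentioned in the gRPC items above follows the standard gRPC wire format: a 1-byte compressed flag, a 4-byte big-endian length, then the message bytes. A minimal Python sketch of encoding and parsing such frames is shown below; the function names are illustrative and are not Wraith's API.

```python
import struct

HEADER = ">BI"   # 1-byte compressed flag + 4-byte big-endian length
HEADER_LEN = 5


def encode_frame(message: bytes, compressed: bool = False) -> bytes:
    """Prefix a serialized message with the 5-byte gRPC frame header."""
    return struct.pack(HEADER, 1 if compressed else 0, len(message)) + message


def parse_frames(data: bytes):
    """Split a byte string into (compressed, payload) tuples.

    Raises ValueError if a header or payload is cut short.
    """
    frames = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < HEADER_LEN:
            raise ValueError("truncated gRPC frame header")
        flag, length = struct.unpack_from(HEADER, data, offset)
        offset += HEADER_LEN
        if len(data) - offset < length:
            raise ValueError("truncated gRPC frame payload")
        frames.append((bool(flag), data[offset:offset + length]))
        offset += length
    return frames
```

The payload of each frame is still protobuf on the wire; decoding it to JSON is a separate step that needs the message descriptors.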

15 APIs at zero divergences. 53/53 sessions passing.

REST (13): Cloudflare, Forgejo, Gitea, GitHub, GitLab, Keycloak, Mattermost, Notion, Odoo, PocketBase, Stripe, Supabase, Twilio. GraphQL (2): Linear (19 ops), Saleor (16 ops, anonymous queries).

  • GraphQL operation routing: Detects GraphQL endpoints, splits single POST /graphql route into per-operation variants with guards. Handles both named operations (operationName field) and anonymous queries (parsed root field). New QueryRootField guard predicate.
  • Header allowlist: Replaced 40+ entry blocklist with 3-entry allowlist (content-type, www-authenticate, proxy-authenticate). Opt-in via with_extra_compare_headers().
  • Divergence suppression: [[diff.suppress]] in wraith.toml for user-declared suppression rules with glob patterns. --show-suppressed flag lists distinct suppressed paths with reasons.
  • Transparent heuristics: Hex color normalization, search/list-like body classification, scalar clobber guard - all reported as suppressed, not hidden.
  • Session tagging: wraith record --tag + wraith synth --tag for selective synthesis.
  • Recording control plane: /__wraith/health, /__wraith/ready, /__wraith/info endpoints during recording.
  • Agentic route fixer: 5 modules, 12 tools, text-based TOOL_CALL protocol. Verified end-to-end.
  • Lua handler sandbox: Full state API (get/put/delete/list/query/count/counter + clock), hot reload, doctor validation.
  • Synth default changed to synth fidelity (was strict).
  • Scalar clobber guard: don’t overlay entity scalar onto template compound type
  • Search/list-like classification: POST search + bare array -> Generated body
  • Hex color heuristic: #e11d48 vs e11d48 suppressed
  • Variant routing guards (PathSegmentEquals, PathSegmentPrefix, FieldEquals, QueryRootField)
  • Dynamic-key object map suppression
  • Order-independent array matching
  • Heuristic timestamp/counter suppression
  • Empty-body response handling
  • Non-JSON content echo (binary/HTML/text strict replay)
  • Gzip decompression in conformance normalizer
  • 30+ additional deterministic fixes across 5 days
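
The GraphQL operation routing described above keys each request on its operationName, falling back to the root field for anonymous queries. A rough Python sketch of that decision follows. The regex is a deliberately naive stand-in for a real GraphQL parser, and `variant_key` and the `operation:` / `root:` prefixes are hypothetical.

```python
import re

# Naive: first selection after the first "{", skipping an optional alias.
# A real implementation would parse the query properly.
_ROOT_FIELD = re.compile(r"\{\s*(?:[A-Za-z_]\w*\s*:\s*)?([A-Za-z_]\w*)")


def variant_key(body: dict) -> str:
    """Pick a per-operation variant key for a POST /graphql request body."""
    name = body.get("operationName")
    if name:
        return f"operation:{name}"
    match = _ROOT_FIELD.search(body.get("query", ""))
    if match is None:
        raise ValueError("no root field found in query")
    return f"root:{match.group(1)}"
```

For example, `{"query": "{ viewer { id } }"}` maps to `root:viewer`, while a body carrying `"operationName": "GetIssue"` maps to `operation:GetIssue` regardless of the query text.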