Evaluate streaming LLM output as it arrives

Evaluate text chunks via Server-Sent Events (SSE). The endpoint accepts a list of text chunks and streams back SSE events as each window is evaluated.

Events:

- ``chunk``: forwarded text chunk
- ``guardrail``: window evaluation result (non-blocking)
- ``block``: window triggered a BLOCK decision
- ``done``: final summary with aggregate metrics

Runtime token
scope: evaluate + bundle fetch
Subject to per-plan eval quota
operation_id: runtime.evaluateStream

Authentication

Create a runtime token via POST /v1/orgs/{org_id}/tokens/runtime. Each token is scoped to one project and one environment.

SDK install

pip install znyx-sdk
npm install @znyx/sdk

Header parameters

Name	Type	Required	Description
X-API-Key	string | null	optional	—
authorization	string | null	optional	—

Request body (required)

Field	Type	Required	Description
request_id	string	optional	—
tenant_id	string	optional	—
app_id	string	optional	—
context	string	optional	input or output
chunks	string[]	required	Text chunks to evaluate in order
policy	object | null	optional	Inline policy (optional)
window_size	integer	optional	—
overlap	integer	optional	—

Responses

Status	Description
200	Successful Response
422	Validation Error

Response schema

any

Errors & what triggers them

Code	Trigger	Fix
401	Missing or invalid X-API-Key / Authorization header.	Check the token is still active — rotated tokens return 401 after the grace period ends.
403	Token does not have the `evaluate` scope.	Use a runtime token (POST /v1/orgs/{org_id}/tokens/runtime).
422	Request body failed Pydantic validation (missing chunks, bad context, etc.).	—
429	Monthly evaluation quota hit for your plan.	Upgrade via POST /v1/billing/checkout, or wait for the next monthly reset.
500	Detector crashed or resolver timed out. Typically transient.	Retry with backoff. If it persists, check Traces for the request_id.
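
The "retry with backoff" advice for 500 responses can be sketched as follows. This is a minimal illustration, not part of the SDK: `call_with_backoff`, `RETRYABLE_STATUSES`, the attempt count, and the delay values are all assumptions, and `call` stands in for whatever function actually sends the request and returns a `(status, body)` pair.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: only 500 is treated as transient, matching the table above.
# 401, 403, 422 and 429 are returned to the caller without a retry.
RETRYABLE_STATUSES = {500}


def call_with_backoff(
    call: Callable[[], Tuple[int, object]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, object]:
    """Invoke `call` until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = call()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff plus jitter so concurrent clients spread out.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    # Every attempt returned a retryable status; hand back the last result.
    return status, body
```

If the final attempt still returns 500, the caller receives that status and can look up the request_id in Traces, as the table suggests.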

Notes & examples

When to use this

Use the streaming endpoint when you want to block a response while it is still being generated — not after the whole response is already in the user's hands. Typical cases:

Token-streaming chat UIs (OpenAI / Anthropic style).
Long-form generation where waiting for evaluate/output after the full response would mean the user has already seen unsafe text.
Multi-agent pipelines where a tool-call argument needs to be screened before the next tool fires.

Sliding window model

The request is a list of chunks (one per stream event). Internally, the engine concatenates them into a rolling buffer and evaluates every time the buffer reaches window_size characters, with overlap characters carried forward so phrases spanning two chunks still match.

The defaults (window_size=200, overlap=40) are tuned for Latin-script chat traffic. Increase them if detectors produce false positives at chunk boundaries.
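
To make the window/overlap interaction concrete, here is a minimal client-side sketch of the buffering described above. It is an approximation for building intuition, not the engine's actual code: the function name `iter_windows`, the handling of the trailing partial window, and the exact slicing are assumptions.

```python
from typing import Iterable, Iterator


def iter_windows(
    chunks: Iterable[str], window_size: int = 200, overlap: int = 40
) -> Iterator[str]:
    """Yield evaluation windows from a stream of text chunks."""
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    step = window_size - overlap  # new characters consumed per window
    buffer = ""
    emitted_any = False
    for chunk in chunks:
        buffer += chunk
        while len(buffer) >= window_size:
            yield buffer[:window_size]
            # Keep the last `overlap` characters of the window for the next one.
            buffer = buffer[step:]
            emitted_any = True
    # Assumption: leftover text that no window has covered yet is flushed
    # as a final, shorter window.
    if len(buffer) > (overlap if emitted_any else 0):
        yield buffer
```

With window_size=4 and overlap=2, the chunks "abcdef" and "ghij" produce the windows "abcd", "cdef", "efgh", "ghij": each window repeats the last two characters of the previous one, which is what lets a phrase that straddles a boundary still appear whole in one window.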

Server-Sent Events

The response is text/event-stream with four event types:

event: chunk
data: {"text": "Hello, let me help..."}

event: guardrail
data: {"window_index": 0, "decision": "ALLOW", "risk_score": 12}

event: block
data: {"detector": "pii", "window_index": 3, "risk_score": 92, "message": "Email redacted"}

event: done
data: {"total_windows": 5, "blocked": true, "aggregate_decision": "BLOCK"}

Your client should treat block as terminal: stop forwarding chunks to the user the moment you see one.
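
A client-side handler for this stream might look like the sketch below. It uses only the Python standard library and parses the `event:` / `data:` framing shown above; `parse_sse` and `forward_until_block` are hypothetical helper names, not SDK functions, and the payload field names are taken from the sample events.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Turn raw SSE lines into (event_name, decoded_json_payload) pairs."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if data:  # stream ended without a trailing blank line
        yield event, json.loads("\n".join(data))


def forward_until_block(
    events: Iterable[Tuple[str, dict]],
) -> Tuple[List[str], Optional[dict]]:
    """Collect chunk text until a block event arrives, then stop."""
    forwarded: List[str] = []
    for name, payload in events:
        if name == "chunk":
            forwarded.append(payload["text"])
        elif name == "block":
            return forwarded, payload  # terminal: forward nothing further
    return forwarded, None
```

In a real client, `forwarded.append(...)` would be replaced by whatever writes text to the user, and the returned block payload would drive the fallback message.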

Common pitfalls

chunks takes strings, not raw token IDs. Decode upstream from whatever your LLM client yields.
If you don't set a policy inline, the engine resolves by tenant_id / app_id — same as /evaluate/output. Cache the bundle locally and pass policy directly in latency-sensitive deployments.
This endpoint lives on the runtime, not the control plane. Point your client at your runtime's hostname, not api.znyx.ai.

Related endpoints

POST /v1/evaluate/output: non-streaming equivalent, simpler to wire up.
POST /v1/evaluate/input: screen user input before the LLM sees it.

Request

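# ZNYX_RUNTIME_URL is a placeholder for your runtime's base URL (not api.znyx.ai).
# -N disables curl's output buffering so events print as they arrive.
curl -N -X POST "$ZNYX_RUNTIME_URL/v1/evaluate/stream" \
  -H "Authorization: Bearer $ZNYX_TOKEN" \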
  -H 'Content-Type: application/json' \
  -d '{
  "request_id": "stream-0",
  "tenant_id": "default",
  "app_id": "default",
  "context": "output",
  "chunks": [
    "string"
  ],
  "policy": null,
  "window_size": 200,
  "overlap": 40
}'
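
The same request can be assembled in Python with the standard library alone. This is a sketch under stated assumptions: `build_stream_request` is a hypothetical helper, the base URL is whatever hostname your runtime is served from, and the Bearer scheme is copied from the curl example. The optional request_id, tenant_id, and app_id fields are left out for brevity.

```python
import json
import urllib.request
from typing import List, Optional


def build_stream_request(
    base_url: str,
    token: str,
    chunks: List[str],
    context: str = "output",
    window_size: int = 200,
    overlap: int = 40,
    policy: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/evaluate/stream."""
    payload = {
        "context": context,
        "chunks": chunks,  # decoded strings, not token IDs
        "policy": policy,
        "window_size": window_size,
        "overlap": overlap,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/evaluate/stream",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` returns a response object that can be read line by line, which is enough to consume the event stream incrementally.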

Response

text/event-stream

Successful Response

null

Schema: any