Runtime API

POST /v1/evaluate/stream

Evaluate streaming LLM output as it arrives

Evaluate text chunks via Server-Sent Events. Accepts a list of text chunks and streams back SSE events as each window is evaluated. Events:

  • chunk: forwarded text chunk
  • guardrail: window evaluation result (non-blocking)
  • block: window triggered a BLOCK decision
  • done: final summary with aggregate metrics

Runtime token
scope: evaluate + bundle fetch
Subject to per-plan eval quota
operation_id: runtime.evaluateStream

Authentication

Create a runtime token via POST /v1/orgs/{org_id}/tokens/runtime. Each token is scoped to one project and one environment.

SDK install

pip install znyx-sdk
npm install @znyx/sdk

Header parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| X-API-Key (header) | string or null | optional | |
| authorization (header) | string or null | optional | |

Request body (required)

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| request_id | string | optional | |
| tenant_id | string | optional | |
| app_id | string | optional | |
| context | string | optional | input or output |
| chunks | string[] | required | Text chunks to evaluate in order |
| policy | object or null | optional | Inline policy (optional) |
| window_size | integer | optional | |
| overlap | integer | optional | |

Responses

| Status | Description |
| --- | --- |
| 200 | Successful Response |
| 422 | Validation Error |

Response schema

any

Errors & what triggers them

| Code | Trigger | Fix |
| --- | --- | --- |
| 401 | Missing or invalid X-API-Key / Authorization header. | Check the token is still active; rotated tokens return 401 after the grace period ends. |
| 403 | Token does not have the `evaluate` scope. | Use a runtime token (POST /v1/orgs/{org_id}/tokens/runtime). |
| 422 | Request body failed Pydantic validation (missing tenant_id, bad context, etc.). | |
| 429 | Monthly evaluation quota hit for your plan. | Upgrade via POST /v1/billing/checkout, or wait for the next monthly reset. |
| 500 | Detector crashed or resolver timed out. Typically transient. | Retry with backoff. If it persists, check Traces for the request_id. |
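
The "retry with backoff" advice for 500 responses can be sketched as follows. This is only an illustration under stated assumptions: `send` is a hypothetical callable that performs the request and returns `(status, body)`, and the delay values, attempt count, and jitter are arbitrary choices, not documented defaults. Only 500 is retried, since the other codes in the table need a fix on the caller's side (or, for 429, a quota reset).

```python
import random
import time
from typing import Callable, List, Tuple

# Per the error table, only 500 is described as transient.
RETRYABLE = {500}


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    status, body = send()
    for delay in backoff_delays(attempts - 1):
        if status not in RETRYABLE:
            break
        # Add a little random jitter so many clients do not retry in lockstep.
        sleep(delay + random.uniform(0, delay / 2))
        status, body = send()
    return status, body
```

If the final status is still 500 after all attempts, the table's advice applies: look up the request_id in Traces.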

Notes & examples

When to use this

Use the streaming endpoint when you want to block a response while it is still being generated — not after the whole response is already in the user's hands. Typical cases:

  • Token-streaming chat UIs (OpenAI / Anthropic style).
  • Long-form generation where waiting for evaluate/output after the full response would mean the user has already seen unsafe text.
  • Multi-agent pipelines where a tool-call argument needs to be screened before the next tool fires.

Sliding window model

The request is a list of chunks (one per stream event). Internally, the engine concatenates them into a rolling buffer and evaluates every time the buffer reaches window_size characters, with overlap characters carried forward so phrases spanning two chunks still match.

The defaults (window_size=200, overlap=40) are tuned for Latin-script chat use. Increase them if detectors fire false positives at chunk boundaries.
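
To make the windowing concrete, here is a minimal sketch of the rolling buffer as described above. It is not the engine's implementation: the function name `sliding_windows` is hypothetical, and the handling of the leftover tail at end of stream (evaluated as one final short window if it contains unseen text) is an assumption.

```python
from typing import Iterable, Iterator


def sliding_windows(
    chunks: Iterable[str], window_size: int = 200, overlap: int = 40
) -> Iterator[str]:
    """Yield evaluation windows from a stream of text chunks."""
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be >= 0 and smaller than window_size")
    buffer = ""
    emitted = False
    for chunk in chunks:
        buffer += chunk
        # Emit a window each time the buffer holds at least window_size characters.
        while len(buffer) >= window_size:
            yield buffer[:window_size]
            emitted = True
            # Carry the last `overlap` characters of that window forward.
            buffer = buffer[window_size - overlap:]
    # Assumption: a leftover tail is evaluated once at end of stream, but only
    # if it holds text beyond the overlap that was already evaluated.
    tail_has_new_text = len(buffer) > overlap if emitted else bool(buffer)
    if tail_has_new_text:
        yield buffer


# Small sizes make the overlap visible:
print(list(sliding_windows(["hello ", "world"], window_size=8, overlap=3)))
# prints ['hello wo', ' world']
```

In the example, the 3 carried characters (" wo") appear in both windows, which is what lets a phrase that straddles a window boundary still be matched.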

Server-Sent Events

The response is text/event-stream with four event types:

event: chunk
data: {"text": "Hello, let me help..."}

event: guardrail
data: {"window_index": 0, "decision": "ALLOW", "risk_score": 12}

event: block
data: {"detector": "pii", "window_index": 3, "risk_score": 92, "message": "Email redacted"}

event: done
data: {"total_windows": 5, "blocked": true, "aggregate_decision": "BLOCK"}

Your client should treat a block event as terminal: stop forwarding chunks to the user the moment you see one.
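
A client-side sketch of that rule, assuming the stream arrives as an iterable of decoded text lines in the event/data layout shown above. The helper names `parse_sse` and `forward_until_block` are hypothetical and not part of the SDK; the parser handles only the `event:` and `data:` fields used by this endpoint.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from text/event-stream lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates one event.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    # Lenient choice: also emit an event left unterminated at end of stream.
    if data:
        yield event, json.loads("\n".join(data))


def forward_until_block(lines: Iterable[str]) -> Tuple[List[str], Optional[dict]]:
    """Collect chunk text until a block event; return (forwarded_text, block_payload)."""
    forwarded: List[str] = []
    for event, payload in parse_sse(lines):
        if event == "chunk":
            forwarded.append(payload["text"])
        elif event == "block":
            # Terminal: nothing after this point is forwarded.
            return forwarded, payload
    return forwarded, None
```

In a real UI, the `forwarded.append(...)` line would be the place to push text to the user; returning on `block` guarantees no later chunk is shown.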

Common pitfalls

  • chunks takes strings, not raw token IDs. Decode upstream from whatever your LLM client yields.
  • If you don't set a policy inline, the engine resolves by tenant_id / app_id — same as /evaluate/output. Cache the bundle locally and pass policy directly in latency-sensitive deployments.
  • This endpoint lives on the runtime, not the control plane. Point your client at your runtime's hostname, not api.znyx.ai.

Related endpoints

  • POST /v1/evaluate/output — non-streaming equivalent, simpler to wire up.
  • POST /v1/evaluate/input — screen user input before the LLM sees it.

Request

curl -N -X POST 'https://<your-runtime-host>/v1/evaluate/stream' \
  -H "Authorization: Bearer $ZNYX_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
  "request_id": "stream-0",
  "tenant_id": "default",
  "app_id": "default",
  "context": "output",
  "chunks": [
    "string"
  ],
  "policy": null,
  "window_size": 200,
  "overlap": 40
}'
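
For reference, the same request can be assembled in Python with only the standard library. This is a sketch, not the SDK's interface: `build_stream_request` is a hypothetical helper, `runtime.example.internal` is a placeholder hostname, and the `Accept` header is an assumption. The field names and default values mirror the curl body above.

```python
import json
import urllib.request
from typing import List, Optional


def build_stream_request(
    base_url: str,
    token: str,
    chunks: List[str],
    *,
    request_id: str = "stream-0",
    tenant_id: str = "default",
    app_id: str = "default",
    context: str = "output",
    policy: Optional[dict] = None,
    window_size: int = 200,
    overlap: int = 40,
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/evaluate/stream request."""
    if context not in ("input", "output"):
        raise ValueError("context must be 'input' or 'output'")
    if not all(isinstance(c, str) for c in chunks):
        # chunks must be decoded text, not raw token IDs.
        raise TypeError("chunks must be a list of strings")
    body = {
        "request_id": request_id,
        "tenant_id": tenant_id,
        "app_id": app_id,
        "context": context,
        "chunks": chunks,
        "policy": policy,
        "window_size": window_size,
        "overlap": overlap,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/evaluate/stream",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "text/event-stream",  # assumption: not stated in the reference
        },
    )


# Usage (hypothetical hostname; the token comes from your environment):
#   req = build_stream_request("https://runtime.example.internal", token, ["Hello, "])
#   with urllib.request.urlopen(req) as resp:
#       for raw in resp:               # iterate the event stream line by line
#           print(raw.decode("utf-8"), end="")
```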

Response

application/json

Successful Response

null

Schema: any