Runtime API
/v1/evaluate/output
Evaluate an LLM response before returning it to the user
Evaluate model output before it is returned to the user. The request body matches that of `/v1/evaluate/input`, but the output context applies output-side detectors (e.g. hallucination, exfiltration, quality scoring). Use this in the same pipeline, after the LLM call.
Authentication
Create a runtime token via POST /v1/orgs/{org_id}/tokens/runtime. Each token is scoped to one project and one environment.
SDK install
pip install znyx-sdk
npm install @znyx/sdk
Request body (required)
| Field | Type | Required | Description |
|---|---|---|---|
| request_id | string | required | Unique identifier for this request |
| tenant_id | string | required | Tenant identifier |
| app_id | string | required | Application identifier |
| agent_id | string | optional | Agent identifier |
| env | string | optional | Environment (prod, staging, dev) |
| text | string | required | Text to evaluate |
| metadata | object \| null | optional | Optional metadata |
| trace_id | string \| null | optional | Distributed trace ID for correlation |
| session_id | string \| null | optional | Session/conversation ID for grouping |
| span_id | string \| null | optional | Span ID within a trace |
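As a rough illustration of the table above, the following Python sketch assembles a request body and wraps it in a `urllib.request.Request` without sending it. The helper name `build_output_request` is hypothetical and is not part of the SDK. The URL and the Bearer header format are copied from the curl example on this page, and generating `request_id` with a UUID is an assumption (any unique string should work).

```python
import json
import urllib.request
import uuid

# URL taken from the curl example on this page.
API_URL = "https://api.znyx.ai/v1/evaluate/output"

# Fields marked "required" in the request body table.
REQUIRED_FIELDS = ("request_id", "tenant_id", "app_id", "text")


def build_output_request(token, text, tenant_id, app_id, **optional):
    """Build (but do not send) a POST request for /v1/evaluate/output.

    Hypothetical helper for illustration only.
    """
    body = {
        "request_id": str(uuid.uuid4()),  # assumption: a UUID is an acceptable unique ID
        "tenant_id": tenant_id,
        "app_id": app_id,
        "text": text,
    }
    # Optional fields from the table: agent_id, env, metadata,
    # trace_id, session_id, span_id. Unset (None) values are omitted.
    body.update({k: v for k, v in optional.items() if v is not None})

    missing = [f for f in REQUIRED_FIELDS if body.get(f) is None]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Checking required fields locally is only a convenience that catches an obvious 422 before the round trip.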
Responses
| Status | Description |
|---|---|
| 200 | Successful Response |
| 422 | Validation Error |
Response schema
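| Field | Description |
|---|---|
| risk_score | Risk score from 0-100 |
| sanitized_text | Sanitized text if REDACT/TRANSFORM |
| sanitized_tool_args | Sanitized tool args (for tool evaluation) |
| user_message | Safe message to show end-user when blocked |
| developer_message | Developer-facing explanation |
| latency_ms | Total evaluation latency in milliseconds |
| trace_id | Trace ID for distributed tracing correlation |
| session_id | Session/conversation ID echoed from request |
| span_id | Span ID within a trace echoed from request |
| detector_results | Per-detector timing breakdown |
| quality | Response quality scores (output context only) |
| field_errors | Field-level errors from structured output validation |
| remediation | Remediation action applied after detector decision |
| pending_review_id | Human review queue ID if ask_human remediation was triggered |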
Errors & what triggers them
| Code | Trigger | Fix |
|---|---|---|
| 401 | Missing or invalid X-API-Key / Authorization header. | Check the token is still active — rotated tokens return 401 after the grace period ends. |
| 403 | Token does not have the `evaluate` scope. | Use a runtime token (POST /v1/orgs/{org_id}/tokens/runtime). |
| 422 | Request body failed Pydantic validation (missing tenant_id, bad context, etc.). | — |
| 429 | Monthly evaluation quota hit for your plan. | Upgrade via POST /v1/billing/checkout, or wait for the next monthly reset. |
| 500 | Detector crashed or resolver timed out. Typically transient. | Retry with backoff. If it persists, check Traces for the request_id. |
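The "retry with backoff" advice for 500 responses can be sketched roughly as follows. This is a minimal illustration, not SDK behavior: `call_with_retry` and the `send` callable are hypothetical, and the attempt count and delays are arbitrary choices. Only 500 is retried, since the other codes in the table need a token, payload, or plan change rather than another attempt.

```python
import time

# Per the table, only 500 is described as "typically transient".
RETRYABLE = {500}


def call_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical zero-argument callable returning (status, body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff: 0.5s, 1s, 2s, ... with the defaults.
            sleep(base_delay * (2 ** attempt))
    # Still failing after the last attempt: hand the final result to the caller.
    return status, body
```

If the final status is still 500, the caller can log the `request_id` so it can be looked up in Traces.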
Notes & examples
When to use this
Call /v1/evaluate/output after the LLM responds but before you send the response to the user. Output-context detectors check things input-context detectors cannot:
- Hallucination — does the response cite sources that don't exist?
- Exfiltration — is the model leaking parts of the system prompt?
- Output PII — did the model regurgitate training-data PII?
- Quality scoring — 7-dimension scoring (helpfulness, groundedness, etc.) for Growth+ plans.
Minimum viable pipeline
user_input → evaluate/input → (if ALLOW) → LLM → evaluate/output → (if ALLOW) → return to user
Add tool-call guardrails by inserting evaluate/tool between the LLM and the tool dispatcher.
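The pipeline above might be wired together along these lines. This is a sketch under stated assumptions: `guarded_reply` and its three callables are hypothetical stand-ins for your own client code, both evaluators are assumed to return a response dict with a `decision` key, and the input endpoint is assumed to return `user_message` in the same shape as the output endpoint.

```python
def guarded_reply(user_input, evaluate_input, call_llm, evaluate_output,
                  fallback="Sorry, I can't help with that."):
    """Run input check -> LLM -> output check; return model text only on ALLOW.

    Hypothetical callables:
      evaluate_input(text)  -> response dict from /v1/evaluate/input
      call_llm(text)        -> model reply as a string
      evaluate_output(text) -> response dict from /v1/evaluate/output
    """
    verdict_in = evaluate_input(user_input)
    if verdict_in.get("decision") != "ALLOW":
        # Stop before the LLM call; prefer the safe end-user message if present.
        return verdict_in.get("user_message") or fallback

    draft = call_llm(user_input)

    verdict_out = evaluate_output(draft)
    if verdict_out.get("decision") != "ALLOW":
        return verdict_out.get("user_message") or fallback

    return draft
```

This version treats every non-ALLOW decision as a stop, which is the strictest reading of the diagram. A pipeline that uses sanitized text would branch on the decision instead.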
Common pitfalls
- Output detectors cost more than input detectors — hallucination and quality scoring both invoke a judge model. If p99 latency matters, disable quality scoring on the hot path and run it async via traces.
- If you're using TRANSFORM on output (e.g. PII redaction on a customer-support bot), return the transformed text to the user, not the original.
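To make the last pitfall concrete, here is a small sketch of choosing which text reaches the user. The function name `text_for_user` is hypothetical. The decision strings `REDACT` and `TRANSFORM` are inferred from the `sanitized_text` description in the response schema, and treating every other non-ALLOW decision as blocked is an assumption.

```python
def text_for_user(original, response, fallback="This response was withheld."):
    """Pick the text to show the end user from an evaluate/output response dict."""
    decision = response.get("decision")

    if decision == "ALLOW":
        return original

    if decision in ("REDACT", "TRANSFORM"):  # assumed decision values
        sanitized = response.get("sanitized_text")
        if sanitized is not None:
            # Return the sanitized text, never the original.
            return sanitized

    # Blocked, unknown decision, or sanitized text missing: fail closed.
    return response.get("user_message") or fallback
```

Failing closed when `sanitized_text` is missing avoids leaking the unredacted original by accident.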
Related
- POST /v1/evaluate/input
- POST /v1/evaluate/stream: for streaming LLMs (evaluate tokens as they arrive).
Request
curl -X POST 'https://api.znyx.ai/v1/evaluate/output' \
-H "Authorization: Bearer $ZNYX_TOKEN" \
-H 'Content-Type: application/json' \
-d '{
"request_id": "string",
"tenant_id": "string",
"app_id": "string",
"agent_id": "default",
"env": "prod",
"text": "string",
"metadata": null,
"trace_id": null,
"session_id": null,
"span_id": null
}'
Response
Successful Response
{
"request_id": "string",
"decision": "ALLOW",
"risk_score": 0,
"policy_version": "string",
"rule_hits": [
{
"rule_id": "string",
"severity": "low",
"message": "string"
}
],
"sanitized_text": null,
"sanitized_tool_args": null,
"user_message": null,
"developer_message": null,
"latency_ms": null,
"trace_id": null,
"session_id": null,
"span_id": null,
"detector_results": [
{
"detector_name": "string",
"decision": null,
"risk_score": 0,
"latency_ms": 0,
"rule_hits": [
{
"rule_id": "string",
"severity": "low",
"message": "string"
}
],
"transformed": false
}
],
"quality": null,
"field_errors": [
{
"path": "string",
"message": "string",
"expected": null,
"actual": null
}
],
"remediation": null,
"pending_review_id": null
}
Schema: object
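Since output detectors can dominate latency, it may help to rank them using the `detector_results` array shown in the response above. The sketch below is illustrative only: `slowest_detectors` is a hypothetical helper, and it assumes each entry carries `detector_name` and `latency_ms` as in the sample response, with a null latency counted as 0.

```python
def slowest_detectors(response, top=3):
    """Return up to `top` (detector_name, latency_ms) pairs, slowest first."""
    results = response.get("detector_results") or []
    timed = [
        (r.get("detector_name"), r.get("latency_ms") or 0)  # null latency counted as 0
        for r in results
    ]
    return sorted(timed, key=lambda pair: pair[1], reverse=True)[:top]
```

Logging this per request is one way to decide which detectors to move off the hot path.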