Skip to main content

Runtime API

POST/v1/evaluate/output

Evaluate an LLM response before returning it to the user

Evaluate model output before it is returned to the user. The request body matches ``/v1/evaluate/input`` but the output context applies output-side detectors (e.g. hallucination, exfiltration, quality scoring). Use this in the same pipeline after the LLM call.

Runtime tokenscope: evaluate + bundle fetchSubject to per-plan eval quotaoperation_id: runtime.evaluateOutput

Authentication

Create via POST /v1/orgs/{org_id}/tokens/runtime. Scoped to one project + environment.

SDK install

pip install znyx-sdknpm install @znyx/sdk

Request bodyrequired

FieldTypeRequiredDescription
request_idstringrequiredUnique identifier for this request
tenant_idstringrequiredTenant identifier
app_idstringrequiredApplication identifier
agent_idstringoptionalAgent identifier
envstringoptionalEnvironment (prod, staging, dev)
textstringrequiredText to evaluate
metadataobject | nulloptionalOptional metadata
trace_idstring | nulloptionalDistributed trace ID for correlation
session_idstring | nulloptionalSession/conversation ID for grouping
span_idstring | nulloptionalSpan ID within a trace

Responses

StatusDescription
200Successful Response
422Validation Error

Response schema

request_idrequiredstring
decisionrequiredDecision
risk_scorerequiredinteger

Risk score from 0-100

policy_versionrequiredstring
rule_hits
sanitized_textstring | null

Sanitized text if REDACT/TRANSFORM

sanitized_tool_argsobject | null

Sanitized tool args (for tool evaluation)

user_messagestring | null

Safe message to show end-user when blocked

developer_messagestring | null

Developer-facing explanation

latency_msinteger | null

Total evaluation latency in milliseconds

trace_idstring | null

Trace ID for distributed tracing correlation

session_idstring | null

Session/conversation ID echoed from request

span_idstring | null

Span ID within a trace echoed from request

detector_results

Per-detector timing breakdown

qualityQualityReport | null

Response quality scores (output context only)

field_errors

Field-level errors from structured output validation

remediationRemediationResult | null

Remediation action applied after detector decision

pending_review_idstring | null

Human review queue ID if ask_human remediation was triggered

Errors & what triggers them

CodeTriggerFix
401Missing or invalid X-API-Key / Authorization header.Check the token is still active — rotated tokens return 401 after the grace period ends.
403Token does not have the `evaluate` scope.Use a runtime token (POST /v1/orgs/{org_id}/tokens/runtime).
422Request body failed Pydantic validation (missing tenant_id, bad context, etc.).
429Monthly evaluation quota hit for your plan.Upgrade via POST /v1/billing/checkout, or wait for the next monthly reset.
500Detector crashed or resolver timed out. Typically transient.Retry with backoff. If it persists, check Traces for the request_id.

Notes & examples

When to use this

Call /v1/evaluate/output after the LLM responds but before you send the response to the user. Output-context detectors check things input-context detectors cannot:

  • Hallucination — does the response cite sources that don't exist?
  • Exfiltration — is the model leaking parts of the system prompt?
  • Output PII — did the model regurgitate training-data PII?
  • Quality scoring — 7-dimension scoring (helpfulness, groundedness, etc.) for Growth+ plans.

Minimum viable pipeline

user_input → evaluate/input → (if ALLOW) → LLM → evaluate/output → (if ALLOW) → return to user

Add tool-call guardrails by inserting evaluate/tool between the LLM and the tool dispatcher.

Common pitfalls

  • Output detectors cost more than input detectors — hallucination and quality scoring both invoke a judge model. If p99 latency matters, disable quality scoring on the hot path and run it async via traces.
  • If you're using TRANSFORM on output (e.g. PII redaction on a customer-support bot), return the transformed text to the user, not the original.
  • POST /v1/evaluate/input
  • POST /v1/evaluate/stream — for streaming LLMs (evaluate tokens as they arrive).

Request

curl -X POST 'https://api.znyx.ai/v1/evaluate/output' \
  -H 'Authorization: Bearer $ZNYX_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
  "request_id": "string",
  "tenant_id": "string",
  "app_id": "string",
  "agent_id": "default",
  "env": "prod",
  "text": "string",
  "metadata": null,
  "trace_id": null,
  "session_id": null,
  "span_id": null
}'

Response

application/json

Successful Response

{
  "request_id": "string",
  "decision": "ALLOW",
  "risk_score": 0,
  "policy_version": "string",
  "rule_hits": [
    {
      "rule_id": "string",
      "severity": "low",
      "message": "string"
    }
  ],
  "sanitized_text": null,
  "sanitized_tool_args": null,
  "user_message": null,
  "developer_message": null,
  "latency_ms": null,
  "trace_id": null,
  "session_id": null,
  "span_id": null,
  "detector_results": [
    {
      "detector_name": "string",
      "decision": null,
      "risk_score": 0,
      "latency_ms": 0,
      "rule_hits": [
        {
          "rule_id": "string",
          "severity": "low",
          "message": "string"
        }
      ],
      "transformed": false
    }
  ],
  "quality": null,
  "field_errors": [
    {
      "path": "string",
      "message": "string",
      "expected": null,
      "actual": null
    }
  ],
  "remediation": null,
  "pending_review_id": null
}

Schema: object