Get Benchmark Results

Get per-sample results for a benchmark run.

Runtime, CI or Admin token
scope: read
operation_id: benchmarks.getResults

Authentication

Any bearer token belonging to the org can read this endpoint.

SDK install

pip install znyx-sdk
npm install @znyx/sdk

Path parameters

Name	Type	Required	Description
org_id	string	required	—
run_id	string	required	—

Query parameters

Name	Type	Required	Description
limit	integer	optional	—
offset	integer	optional	—

Header parameters

Name	Type	Required	Description
X-API-Key	string | null	optional	—
authorization	string | null	optional	—

Responses

Status	Description
200	Successful Response
422	Validation Error

Response schema

Name	Type	Required
total	integer	required
limit	integer	required
offset	integer	required
results	array of result rows	required

Errors & what triggers them

Code	Trigger	Fix
401	Missing Authorization header, or an expired token.	—
403	Token does not have org access (wrong org_id, or insufficient role).	—
404	Resource does not exist in this org.	—

Notes & examples

When to use this

After a benchmark run completes (status == completed), call this endpoint to get the per-sample detail that the summary from GET /v1/orgs/{org_id}/benchmarks/{run_id} leaves out. It returns one row per evaluated sample. Typical uses:

Triage regressions — filter for is_correct=false to find where the new policy disagrees with the dataset's expected decision.
Latency drill-down — sort by latency_ms to find the slow detector hits.
Generate retraining data — export false-positive rows directly into an annotated dataset via POST /v1/orgs/{org_id}/annotations/export.
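The three uses above can be sketched as plain client-side helpers over the rows in results. This is a minimal sketch, not part of the SDK: the function names are hypothetical, the endpoint documents no server-side filter or sort parameters so everything here runs on rows you have already fetched, and the set of decision values that count as "flagged" is left to the caller because this page does not list the possible decisions.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]  # one element of the "results" array


def failed_rows(rows: Iterable[Row]) -> List[Row]:
    """Rows where the run disagreed with the dataset's expected decision."""
    # is_correct may be null in the response, so compare against False explicitly.
    return [r for r in rows if r.get("is_correct") is False]


def slowest_rows(rows: Iterable[Row], n: int = 10) -> List[Row]:
    """The n rows with the highest latency_ms (missing latency sorts as 0)."""
    return sorted(rows, key=lambda r: r.get("latency_ms") or 0, reverse=True)[:n]


def false_positives(rows: Iterable[Row], flagged: Iterable[str]) -> List[Row]:
    """Rows the runtime flagged although the dataset expected no flag.

    `flagged` is the caller's own set of decision values that count as a
    positive; the values themselves are an assumption, not documented here.
    """
    flagged_set = set(flagged)
    return [
        r for r in rows
        if r.get("actual_decision") in flagged_set
        and r.get("expected_decision") not in flagged_set
    ]
```

The output of false_positives is the kind of row list you would then pass on to the annotations export endpoint; the request body for that endpoint is not described on this page, so it is not sketched here.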

Response row shape

Each result row carries:

input_text / context — the sample's original input, for grepping.
expected_decision / expected_rule_hits — what the dataset says should happen.
actual_decision / actual_risk_score / actual_rule_hits — what the runtime actually did.
detector_results — per-detector breakdown, same shape as the Traces page.
is_correct — boolean: matched the expected decision.
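For typed client code, the row can be described with a TypedDict. This is a sketch under assumptions: the field names come from the example response on this page, but the example shows most values as null, so the concrete types below (strings for decisions, lists for rule hits) are guesses rather than the published schema. BenchmarkResultRow and mismatch_summary are hypothetical names.

```python
from typing import Any, List, Optional, TypedDict


class BenchmarkResultRow(TypedDict):
    id: str
    sample_id: str
    input_text: Optional[str]            # assumed string when present
    context: Optional[Any]               # shape not documented here
    expected_decision: Optional[str]     # assumed string when present
    expected_rule_hits: Optional[List[Any]]
    actual_decision: Optional[str]
    actual_risk_score: float
    actual_rule_hits: Optional[List[Any]]
    latency_ms: float
    is_correct: Optional[bool]
    detector_results: Optional[Any]      # same shape as the Traces page


def mismatch_summary(row: BenchmarkResultRow) -> str:
    """One-line description of expected vs. actual decision for a row."""
    return (
        f"{row['sample_id']}: expected {row['expected_decision']!r}, "
        f"got {row['actual_decision']!r}"
    )
```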

Pagination

limit is capped at 1000. For datasets larger than that, page through the results with offset and concatenate the pages client-side.
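A minimal sketch of that offset loop follows. It assumes a caller-supplied fetch_page(limit, offset) function (hypothetical; it could wrap the SDK or a plain HTTP call) that returns the parsed response body with the total and results fields shown in the response schema.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]


def iter_results(
    fetch_page: Callable[[int, int], Page],
    limit: int = 1000,
) -> Iterator[Dict[str, Any]]:
    """Yield every result row of a run, one page at a time."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        rows = page["results"]
        yield from rows
        offset += len(rows)
        # Stop when everything has been read, or when the server returns an
        # empty page (guards against looping forever if total is stale).
        if not rows or offset >= page["total"]:
            break
```

Calling list(iter_results(fetch_page)) does the client-side concatenation; iterating lazily instead keeps memory use flat for large runs.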

Related endpoints

GET /benchmarks/compare?a=…&b=… — diff two runs side-by-side without pulling per-sample rows yourself.
GET /v1/orgs/{org_id}/benchmarks/{run_id} — run summary + aggregate pass/fail counts.

Request

curl -X GET 'https://api.znyx.ai/v1/orgs/00000000-0000-0000-0000-000000000000/benchmarks/00000000-0000-0000-0000-000000000000/results' \
  -H "Authorization: Bearer $ZNYX_TOKEN"
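
The same request can be sketched in Python with only the standard library. This deliberately does not use the znyx-sdk package, whose client API is not shown on this page; build_results_request and get_results are hypothetical helper names, and reading the token from ZNYX_TOKEN mirrors the curl example.

```python
import json
import os
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.znyx.ai"


def build_results_request(org_id, run_id, token, limit=None, offset=None):
    """Build (but do not send) the GET request for a run's per-sample results."""
    url = f"{BASE_URL}/v1/orgs/{org_id}/benchmarks/{run_id}/results"
    # Only send the optional query parameters that were actually provided.
    query = {k: v for k, v in (("limit", limit), ("offset", offset)) if v is not None}
    if query:
        url += "?" + urlencode(query)
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


def get_results(org_id, run_id, limit=None, offset=None):
    """Send the request and return the parsed JSON body."""
    token = os.environ["ZNYX_TOKEN"]
    req = build_results_request(org_id, run_id, token, limit, offset)
    # urlopen raises urllib.error.HTTPError for 401/403/404/422 responses.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```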

Response

application/json

Successful Response

{
  "total": 0,
  "limit": 0,
  "offset": 0,
  "results": [
    {
      "id": "string",
      "sample_id": "string",
      "input_text": null,
      "context": null,
      "expected_decision": null,
      "expected_rule_hits": null,
      "actual_decision": null,
      "actual_risk_score": 0,
      "actual_rule_hits": null,
      "latency_ms": 0,
      "is_correct": null,
      "detector_results": null
    }
  ]
}

Schema: object