Benchmarks

POST /v1/orgs/{org_id}/benchmarks

Start Benchmark

Starts a benchmark run that evaluates every sample in a dataset against the given policy.

CI or Admin token · scope: write · operation_id: benchmarks.start

Authentication

Requires a CI- or admin-level token. Runtime tokens are rejected for mutations.

SDK install

pip install znyx-sdk
npm install @znyx/sdk

Path parameters

Name	Type	Required	Description
org_id	string	required	—

Header parameters

Name	Type	Required	Description
X-API-Key	string | null	optional	—
authorization	string | null	optional	—

Request body (required)

Field	Type	Required	Description
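dataset_id	string	required	ID of the dataset whose samples are evaluated.
policy	object	required	Policy JSON to benchmark inline (for ad-hoc testing).
policy_version	string | null	optional	—
bundle_id	string | null	optional	ID of an existing published bundle to benchmark instead of an inline policy.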

Responses

Status	Description
200	Successful Response
422	Validation Error

Response schema

Field	Type	Required
id	string	required
dataset_id	string	required
policy_version	string | null	optional
bundle_id	string | null	optional
status	string	required
total_samples	integer	optional
completed_samples	integer	optional
results_summary	object | null	optional
started_at	string | null	optional
completed_at	string | null	optional
created_at	string | null	optional
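If you call this endpoint without the SDK, a small typed wrapper over the response schema keeps field access explicit. This is a hypothetical sketch: `BenchmarkRun`, `from_json`, and `progress` are illustrative names, not part of the znyx SDK, and the defaults for the optional fields (including 0 for the sample counters) are assumptions.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class BenchmarkRun:
    # Fields the response schema marks as required.
    id: str
    dataset_id: str
    status: str
    # Optional / nullable fields; these defaults are this sketch's assumption.
    policy_version: Optional[str] = None
    bundle_id: Optional[str] = None
    total_samples: int = 0
    completed_samples: int = 0
    results_summary: Optional[dict[str, Any]] = None
    started_at: Optional[str] = None
    completed_at: Optional[str] = None
    created_at: Optional[str] = None

    @classmethod
    def from_json(cls, data: dict[str, Any]) -> "BenchmarkRun":
        """Build a run from a decoded JSON object, ignoring unknown keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

    @property
    def progress(self) -> float:
        """Fraction of samples evaluated so far (0.0 when the total is unknown)."""
        if self.total_samples <= 0:
            return 0.0
        return self.completed_samples / self.total_samples


# Hypothetical IDs; the extra key is dropped by from_json.
run = BenchmarkRun.from_json(
    {"id": "run_1", "dataset_id": "ds_1", "status": "running",
     "total_samples": 200, "completed_samples": 50, "extra": True}
)
```

Filtering unknown keys means a new field added to the response later will not break parsing.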

Errors & what triggers them

Code	Trigger	Fix
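401	Missing or expired token in the Authorization header.	Send a valid, unexpired token.
403	Token does not have the required role (this endpoint needs a CI or admin token).	Use a CI or admin token; runtime tokens are rejected for mutations.
404	Target resource does not exist in this org.	Check org_id and the dataset_id or bundle_id in the body.
422	Request body failed validation.	Fix the body so it matches the request body schema.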
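A client can turn these status codes into errors that say what went wrong. The sketch below is illustrative only: `BenchmarkAPIError`, `check_status`, and the hint strings are hypothetical names that paraphrase the table, not an SDK API, and the response body is passed through as raw text because its error format is not documented here.

```python
# Hints paraphrase the errors table; the wording is this sketch's own.
ERROR_HINTS = {
    401: "missing or expired token",
    403: "token lacks the required role; use a CI or admin token",
    404: "target resource does not exist in this org",
    422: "request body failed validation",
}


class BenchmarkAPIError(Exception):
    """Raised for a non-2xx response from the benchmarks endpoint."""

    def __init__(self, status: int, body: str = "") -> None:
        self.status = status
        self.body = body
        hint = ERROR_HINTS.get(status, "unexpected error")
        detail = f": {body}" if body else ""
        super().__init__(f"HTTP {status} ({hint}){detail}")


def check_status(status: int, body: str = "") -> None:
    """Do nothing for 2xx; otherwise raise BenchmarkAPIError with a hint."""
    if not 200 <= status < 300:
        raise BenchmarkAPIError(status, body)
```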

Notes & examples

When to use this

Before publishing a new policy to prod, run it against your regression dataset. Benchmarks catch:

False positives: samples you expect to ALLOW that the policy returns as BLOCK.
False negatives: samples you expect to BLOCK that the policy returns as ALLOW.
Latency regressions: avg / p99 latency drift vs. baseline.
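To see how those three checks reduce to arithmetic, here is a minimal local recomputation from per-sample rows. The `SampleResult` row shape is a hypothetical assumption (the per-sample results format is not specified on this page), and p99 uses the nearest-rank definition; the API's own `results_summary` may compute these differently.

```python
from dataclasses import dataclass


@dataclass
class SampleResult:
    """Hypothetical per-sample row: expected vs. actual decision plus latency."""
    expected: str      # "ALLOW" or "BLOCK"
    actual: str        # decision the benchmarked policy returned
    latency_ms: float


def summarize(rows: list[SampleResult]) -> dict[str, float]:
    """Count false positives/negatives and compute avg and p99 latency."""
    false_pos = sum(r.expected == "ALLOW" and r.actual == "BLOCK" for r in rows)
    false_neg = sum(r.expected == "BLOCK" and r.actual == "ALLOW" for r in rows)
    latencies = sorted(r.latency_ms for r in rows)
    n = len(latencies)
    avg = sum(latencies) / n if n else 0.0
    # Nearest-rank p99: the ceil(0.99 * n)-th smallest value, using integer math.
    p99 = latencies[-(-99 * n // 100) - 1] if n else 0.0
    return {
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "avg_latency_ms": avg,
        "p99_latency_ms": p99,
    }
```

Comparing these numbers for two runs gives the drift against a baseline.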

Two ways to specify the policy

Either pass a full policy JSON inline (for ad-hoc testing) or reference an existing bundle_id to benchmark a specific published bundle. Don't pass both.
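A request-body builder can enforce the either/or rule before anything is sent. This is a sketch under stated assumptions: `build_benchmark_body` and the IDs are hypothetical, and because the schema marks `policy` as required, the sketch sends an empty object when benchmarking a bundle, as the request example on this page does. Confirm that behavior against the live API.

```python
from typing import Any, Optional


def build_benchmark_body(
    dataset_id: str,
    policy: Optional[dict[str, Any]] = None,
    bundle_id: Optional[str] = None,
    policy_version: Optional[str] = None,
) -> dict[str, Any]:
    """Build the JSON body for POST /v1/orgs/{org_id}/benchmarks."""
    has_policy = bool(policy)  # None and {} both count as "no inline policy"
    if has_policy and bundle_id is not None:
        raise ValueError("pass an inline policy or a bundle_id, not both")
    if not has_policy and bundle_id is None:
        raise ValueError("pass either an inline policy or a bundle_id")
    return {
        "dataset_id": dataset_id,
        # Assumption: `policy` is schema-required, so send {} when using a bundle.
        "policy": policy if has_policy else {},
        "policy_version": policy_version,
        "bundle_id": bundle_id,
    }


# Hypothetical IDs: benchmark a published bundle against a regression dataset.
body = build_benchmark_body("ds_regression", bundle_id="bundle_123")
```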

Running in CI

Typical flow:

1. Publish bundle to dev on every PR.
2. Benchmark the new bundle against your regression dataset.
3. Fail the PR check if block_rate_delta > threshold vs. the previous bundle (use GET /benchmarks/compare).
4. Promote to staging on green.
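Step 3 boils down to one comparison that a CI job can turn into an exit code. A minimal sketch, assuming the compare response exposes `block_rate_delta` as a top-level number (its exact shape is not documented on this page); `gate` is a hypothetical helper name.

```python
import sys
from typing import Any


def gate(compare: dict[str, Any], threshold: float) -> int:
    """Return an exit code for the PR check: 1 fails it, 0 passes it."""
    # Assumption: the compare response carries `block_rate_delta` at the top level.
    delta = float(compare["block_rate_delta"])
    if delta > threshold:
        print(f"FAIL: block_rate_delta {delta:+.4f} > threshold {threshold:.4f}", file=sys.stderr)
        return 1
    print(f"OK: block_rate_delta {delta:+.4f} <= threshold {threshold:.4f}")
    return 0
```

In a CI job you would decode the compare response and finish with `sys.exit(gate(compare, 0.02))`; the 0.02 threshold is only an example value.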

Reading results

After you start a run, its status moves pending → running → completed. Poll GET /benchmarks/{run_id} (the id field of the start response) until status is completed, then fetch the per-sample rows from GET /benchmarks/{run_id}/results.
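The polling step can be written as a loop with a deadline. In this sketch, `wait_for_completion` is a hypothetical helper. `fetch_run` is a callable you supply that performs the GET and returns the decoded run object, and `sleep` is injectable so the loop can be tested without waiting. Only pending, running, and completed are documented, so the sketch stops on any other status instead of polling until the timeout.

```python
import time
from typing import Any, Callable


def wait_for_completion(
    fetch_run: Callable[[], dict[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch_run() until the run reports status "completed"."""
    deadline = time.monotonic() + timeout_s
    while True:
        run = fetch_run()  # e.g. decoded GET /v1/orgs/{org_id}/benchmarks/{run_id}
        status = run["status"]
        if status == "completed":
            return run
        if status not in ("pending", "running"):
            # Undocumented status: stop rather than poll until the timeout.
            raise RuntimeError(f"run {run.get('id')} entered unexpected status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run.get('id')} still {status!r} after {timeout_s}s")
        sleep(interval_s)
```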

GET /v1/orgs/{org_id}/benchmarks/compare?a=...&b=... — diff two runs.
GET /v1/orgs/{org_id}/benchmarks/{run_id}/results — per-sample pass/fail.
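Building these two URLs by hand is easy to get subtly wrong, so percent-encoding the path segments and query helps. A minimal sketch: the host comes from the request example below, `compare_url` and `results_url` are hypothetical helpers, and it assumes `a` and `b` take run IDs (which side acts as the baseline is not specified here).

```python
from urllib.parse import quote, urlencode

# Host taken from the curl example on this page.
BASE_URL = "https://api.znyx.ai/v1"


def compare_url(org_id: str, run_a: str, run_b: str) -> str:
    """GET URL that diffs benchmark run `run_a` against `run_b`."""
    query = urlencode({"a": run_a, "b": run_b})
    return f"{BASE_URL}/orgs/{quote(org_id, safe='')}/benchmarks/compare?{query}"


def results_url(org_id: str, run_id: str) -> str:
    """GET URL for the per-sample pass/fail rows of one run."""
    return (
        f"{BASE_URL}/orgs/{quote(org_id, safe='')}"
        f"/benchmarks/{quote(run_id, safe='')}/results"
    )
```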

Request

curl -X POST 'https://api.znyx.ai/v1/orgs/00000000-0000-0000-0000-000000000000/benchmarks' \
  -H "Authorization: Bearer $ZNYX_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
  "dataset_id": "string",
  "policy": {},
  "policy_version": null,
  "bundle_id": null
}'
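The same request in Python, using only the standard library. The SDK's method names are not documented on this page, so this sketch uses `urllib.request` instead; `start_benchmark_request` and `start_benchmark` are hypothetical helper names, and the token is read from the same `ZNYX_TOKEN` variable as the curl example. Building the request separately from sending it makes the headers and body easy to inspect.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.znyx.ai/v1"


def start_benchmark_request(org_id: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    return urllib.request.Request(
        f"{BASE_URL}/orgs/{org_id}/benchmarks",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_benchmark(org_id: str, token: str, body: dict) -> dict:
    """Send the request and return the decoded run object.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses (401, 403, 404, 422).
    """
    req = start_benchmark_request(org_id, token, body)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


req = start_benchmark_request(
    "00000000-0000-0000-0000-000000000000",
    os.environ.get("ZNYX_TOKEN", ""),
    {"dataset_id": "string", "policy": {}, "policy_version": None, "bundle_id": None},
)
```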

Response

application/json

Successful Response

{
  "id": "string",
  "dataset_id": "string",
  "policy_version": null,
  "bundle_id": null,
  "status": "string",
  "total_samples": 0,
  "completed_samples": 0,
  "results_summary": null,
  "started_at": null,
  "completed_at": null,
  "created_at": null
}
