Benchmarks

POST /v1/orgs/{org_id}/benchmarks

Start Benchmark

Starts a benchmark run that evaluates all samples in a dataset against the given policy.

CI or Admin token | scope: write | operation_id: benchmarks.start

Authentication

Requires a CI- or admin-level token. Runtime tokens are rejected for mutations.

SDK install

pip install znyx-sdk
npm install @znyx/sdk

Path parameters

| Name | Type | Required | Description |
|---|---|---|---|
| org_id | string | required | |

Header parameters

| Name | Type | Required | Description |
|---|---|---|---|
| X-API-Key | string \| null | optional | |
| authorization | string \| null | optional | |

Request body (required)

| Field | Type | Required | Description |
|---|---|---|---|
| dataset_id | string | required | |
| policy | object | required | |
| policy_version | string \| null | optional | |
| bundle_id | string \| null | optional | |

Responses

| Status | Description |
|---|---|
| 200 | Successful Response |
| 422 | Validation Error |

Response schema

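| Field | Type | Required |
|---|---|---|
| id | string | required |
| dataset_id | string | required |
| policy_version | string \| null | |
| bundle_id | string \| null | |
| status | string | required |
| total_samples | integer | |
| completed_samples | integer | |
| results_summary | object \| null | |
| started_at | string \| null | |
| completed_at | string \| null | |
| created_at | string \| null | |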

Errors & what triggers them

| Code | Trigger | Fix |
|---|---|---|
| 401 | Missing or expired Authorization header. | |
| 403 | Token does not have the required role (admin / editor). | |
| 404 | Target resource does not exist in this org. | |
| 422 | Request body failed validation. | |

Notes & examples

When to use this

Before publishing a new policy to prod, run it against your regression dataset. Benchmarks catch:

  • False positives — samples you expect ALLOW that get BLOCK.
  • False negatives — samples you expect BLOCK that get ALLOW.
  • Latency regressions — avg / p99 latency drift vs. baseline.
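
The first two categories can be counted from per-sample rows once a run finishes. The following is a minimal sketch, assuming each row is a dict with hypothetical `expected` and `actual` fields holding `ALLOW` or `BLOCK`. The actual row shape returned by the results endpoint is not documented on this page, so adjust the field names to match.

```python
def classify_rows(rows):
    """Count false positives and false negatives in per-sample rows.

    Assumes each row is a dict with hypothetical "expected" and "actual"
    keys whose values are "ALLOW" or "BLOCK".
    """
    counts = {"false_positives": 0, "false_negatives": 0, "matches": 0}
    for row in rows:
        expected, actual = row["expected"], row["actual"]
        if expected == "ALLOW" and actual == "BLOCK":
            # Expected ALLOW but the policy blocked it.
            counts["false_positives"] += 1
        elif expected == "BLOCK" and actual == "ALLOW":
            # Expected BLOCK but the policy let it through.
            counts["false_negatives"] += 1
        elif expected == actual:
            counts["matches"] += 1
    return counts
```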

Two ways to specify the policy

Either pass a full policy JSON inline (for ad-hoc testing) or reference an existing bundle_id to benchmark a specific published bundle. Don't pass both.
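
The either/or rule can be enforced client-side before the request is sent. This is a minimal sketch; `build_benchmark_body` is a hypothetical helper, not part of the SDK. Because the request-body table lists `policy` as required, the sketch assumes an empty `policy` object accompanies `bundle_id`. Verify that assumption against the API before relying on it.

```python
def build_benchmark_body(dataset_id, policy=None, bundle_id=None, policy_version=None):
    """Build the request body from exactly one policy source.

    Hypothetical helper: raises ValueError if both or neither of
    policy and bundle_id are supplied.
    """
    if (policy is None) == (bundle_id is None):
        raise ValueError("pass exactly one of policy or bundle_id")
    return {
        "dataset_id": dataset_id,
        # Assumption: the schema marks policy as required, so an empty
        # object is sent when benchmarking a published bundle.
        "policy": policy if policy is not None else {},
        "policy_version": policy_version,
        "bundle_id": bundle_id,
    }
```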

Running in CI

Typical flow:

1. Publish bundle to dev on every PR.
2. Benchmark the new bundle against your regression dataset.
3. Fail the PR check if block_rate_delta > threshold vs. the previous bundle (use GET /benchmarks/compare).
4. Promote to staging on green.
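
Step 3 reduces to a threshold check. The sketch below assumes the compare response has already been parsed into a dict that exposes a numeric `block_rate_delta` at the top level. The field name comes from the flow above, but its location in the response is an assumption, and `gate_exit_code` is a hypothetical helper.

```python
def gate_exit_code(compare_result, threshold):
    """Return 1 (fail the check) if block_rate_delta exceeds threshold, else 0.

    Assumes compare_result is a dict with a top-level "block_rate_delta"
    number; the real response shape may differ.
    """
    delta = compare_result["block_rate_delta"]
    return 1 if delta > threshold else 0
```

In a CI job, the returned value can be passed to `sys.exit` so that a regression fails the PR check.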

Reading results

After start, the run status transitions pending → running → completed. Poll GET /benchmarks/{id} until status == completed, then pull GET /benchmarks/{id}/results for per-sample rows.

  • GET /v1/orgs/{org_id}/benchmarks/compare?a=...&b=... — diff two runs.
  • GET /v1/orgs/{org_id}/benchmarks/{run_id}/results — per-sample pass/fail.
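
The polling step can be sketched as a small loop. `fetch_run` is a hypothetical callable that performs the GET and returns the parsed run object; it is injected so the loop stays independent of any HTTP client. Only the three statuses listed above are documented here, so the sketch relies on a timeout rather than guessing at failure states.

```python
import time

def wait_for_completion(fetch_run, interval=5.0, timeout=600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_run() until the run reports status "completed".

    fetch_run is a hypothetical zero-argument callable returning a dict
    with a "status" key. Raises TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        run = fetch_run()
        if run["status"] == "completed":
            return run
        if clock() >= deadline:
            raise TimeoutError(f"run still {run['status']!r} after {timeout}s")
        sleep(interval)
```

The default interval and timeout are illustrative values; tune them to the size of the dataset.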

Request

curl -X POST 'https://api.znyx.ai/v1/orgs/00000000-0000-0000-0000-000000000000/benchmarks' \
  -H "Authorization: Bearer $ZNYX_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
  "dataset_id": "string",
  "policy": {},
  "policy_version": null,
  "bundle_id": null
}'
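
The same request can be issued from Python without the SDK. This sketch uses only the standard library and mirrors the curl call above. `build_start_request` and `start_benchmark` are hypothetical helper names, and the SDK may offer its own client instead.

```python
import json
import urllib.request

API_BASE = "https://api.znyx.ai"

def build_start_request(org_id, body, token):
    """Build the POST request for starting a benchmark run."""
    return urllib.request.Request(
        f"{API_BASE}/v1/orgs/{org_id}/benchmarks",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def start_benchmark(org_id, body, token):
    """Send the request and return the parsed JSON response."""
    request = build_start_request(org_id, body, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the first function easy to inspect in tests without touching the network.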

Response

application/json

Successful Response

{
  "id": "string",
  "dataset_id": "string",
  "policy_version": null,
  "bundle_id": null,
  "status": "string",
  "total_samples": 0,
  "completed_samples": 0,
  "results_summary": null,
  "started_at": null,
  "completed_at": null,
  "created_at": null
}

Schema: object