Benchmarks
Start Benchmark
POST /v1/orgs/{org_id}/benchmarks
Start a benchmark run that evaluates every sample in a dataset against the given policy.
Authentication
Requires a CI- or admin-level token. Runtime tokens are rejected for mutations.
SDK install
pip install znyx-sdk
npm install @znyx/sdk
Path parameters
| Name | Type | Required | Description |
|---|---|---|---|
| org_id | string | required | Organization ID. |
Header parameters
| Name | Type | Required | Description |
|---|---|---|---|
| X-API-Key | string or null | optional | — |
| authorization | string or null | optional | Bearer token in the form Bearer <token>. |
Request body (required)
| Field | Type | Required | Description |
|---|---|---|---|
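| dataset_id | string | required | ID of the dataset whose samples are evaluated. |
| policy | object | required | Full policy JSON, for inline (ad-hoc) benchmarking. |
| policy_version | string or null | optional | — |
| bundle_id | string or null | optional | ID of a published bundle to benchmark instead of an inline policy. |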
Responses
| Status | Description |
|---|---|
| 200 | Successful Response |
| 422 | Validation Error |
Errors & what triggers them
| Code | Trigger | Fix |
|---|---|---|
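| 401 | Missing or expired Authorization header. | Send a valid, unexpired token in the Authorization header. |
| 403 | Token does not have the required role (admin / editor). | Use a token whose role permits this mutation. |
| 404 | Target resource does not exist in this org. | Check org_id and that the referenced dataset or bundle exists in this org. |
| 422 | Request body failed validation. | Check the body fields and types against the Request body table. |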
Notes & examples
When to use this
Before publishing a new policy to prod, run it against your regression dataset. Benchmarks catch:
- False positives: samples you expect to ALLOW that get BLOCK.
- False negatives: samples you expect to BLOCK that get ALLOW.
- Latency regressions: average or p99 latency drifting from the baseline.
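The three checks above can be sketched as a summary over per-sample result rows. This is a minimal sketch, not the service's own scoring: it assumes each row is a dict with hypothetical expected and actual fields holding "ALLOW" or "BLOCK" and a hypothetical latency_ms number, since the results row schema isn't documented on this page. The p99 uses the nearest-rank method.

```python
import math
from typing import Iterable, Mapping


def summarize(rows: Iterable[Mapping]) -> dict:
    """Count false positives/negatives and compute avg and p99 latency.

    Assumes hypothetical row fields "expected", "actual" ("ALLOW"/"BLOCK")
    and "latency_ms" (number); adjust to the real results schema.
    """
    rows = list(rows)
    # False positive: expected ALLOW, got BLOCK.
    fp = sum(1 for r in rows if r["expected"] == "ALLOW" and r["actual"] == "BLOCK")
    # False negative: expected BLOCK, got ALLOW.
    fn = sum(1 for r in rows if r["expected"] == "BLOCK" and r["actual"] == "ALLOW")
    latencies = sorted(r["latency_ms"] for r in rows)
    if latencies:
        avg = sum(latencies) / len(latencies)
        # Nearest-rank percentile: the ceil(0.99 * n)-th smallest value.
        p99 = latencies[math.ceil(0.99 * len(latencies)) - 1]
    else:
        avg = p99 = None
    return {
        "false_positives": fp,
        "false_negatives": fn,
        "avg_latency_ms": avg,
        "p99_latency_ms": p99,
    }
```

Comparing these numbers against the same summary for a baseline run is what surfaces a latency regression.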
Two ways to specify the policy
Either pass a full policy JSON inline (for ad-hoc testing) or reference an existing bundle_id to benchmark a specific published bundle. Don't pass both.
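A client-side guard for the "don't pass both" rule might look like this sketch. One assumption is flagged loudly: the request-body table marks policy as required, so when benchmarking a bundle this sketch sends an empty object, the same placeholder the request example uses. Confirm that behavior against the live API before relying on it.

```python
from typing import Any, Optional


def build_start_body(dataset_id: str,
                     policy: Optional[dict] = None,
                     bundle_id: Optional[str] = None,
                     policy_version: Optional[str] = None) -> dict[str, Any]:
    """Build a Start Benchmark body with exactly one policy source."""
    # Reject both-set and neither-set: exactly one source must be given.
    if (policy is None) == (bundle_id is None):
        raise ValueError("pass exactly one of policy or bundle_id")
    return {
        "dataset_id": dataset_id,
        # ASSUMPTION: `policy` is schema-required, so a bundle run sends {}
        # as a placeholder. Verify against the API.
        "policy": policy if policy is not None else {},
        "policy_version": policy_version,
        "bundle_id": bundle_id,
    }
```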
Running in CI
Typical flow:
1. Publish the bundle to dev on every PR.
2. Benchmark the new bundle against your regression dataset.
3. Fail the PR check if block_rate_delta exceeds your threshold relative to the previous bundle (use GET /benchmarks/compare).
4. Promote to staging once the check is green.
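Step 3 of that flow, the PR gate, could be sketched as follows. It assumes the compare response has already been fetched and parsed into a dict with a top-level numeric block_rate_delta field; the field name comes from the flow above, but its location in the response is an assumption. The threshold is a value your team picks.

```python
import sys


def gate(compare: dict, threshold: float) -> int:
    """Return a process exit code: 1 if block_rate_delta exceeds threshold.

    Assumes a hypothetical top-level numeric "block_rate_delta" field
    in the parsed compare response.
    """
    delta = compare["block_rate_delta"]
    if delta > threshold:
        print(f"block_rate_delta {delta:+.4f} exceeds threshold {threshold}",
              file=sys.stderr)
        return 1
    return 0
```

In a CI script this would typically end with sys.exit(gate(compare_json, threshold=0.02)), where 0.02 is an illustrative value. Gating on abs(delta) instead would also fail the check when the block rate drops sharply, which can indicate new false negatives.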
Reading results
After you start a run, its status moves pending → running → completed. Poll GET /benchmarks/{run_id} until status is completed, then fetch GET /benchmarks/{run_id}/results for the per-sample rows.
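The polling step can be sketched as a loop with a deadline, so a stuck run cannot hang a CI job forever. This sketch assumes you supply fetch_run, a function that performs the GET call for the run and returns its parsed JSON; the interval and timeout defaults are illustrative. Only pending, running, and completed are documented here, so if the API has a failure status, a real client should stop on that too.

```python
import time
from typing import Callable


def wait_for_completion(fetch_run: Callable[[], dict],
                        poll_interval_s: float = 5.0,
                        timeout_s: float = 1800.0,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll a benchmark run until status == "completed" or time runs out.

    fetch_run: caller-supplied function returning the run's current JSON.
    sleep/clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        run = fetch_run()
        if run["status"] == "completed":
            return run
        if clock() >= deadline:
            raise TimeoutError(
                f"run {run.get('id')} still {run['status']!r} after {timeout_s}s")
        sleep(poll_interval_s)
```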
Related
- Diff two runs: GET /v1/orgs/{org_id}/benchmarks/compare?a=...&b=...
- Per-sample pass/fail: GET /v1/orgs/{org_id}/benchmarks/{run_id}/results
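Building these URLs safely means percent-encoding path segments and query values. A minimal sketch, assuming the a and b query parameters take benchmark run IDs (the page elides their values); the host comes from the request example.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.znyx.ai"  # host taken from the request example


def compare_url(org_id: str, run_a: str, run_b: str) -> str:
    """URL that diffs two runs; assumes a and b are run IDs."""
    query = urlencode({"a": run_a, "b": run_b})
    return f"{BASE_URL}/v1/orgs/{quote(org_id, safe='')}/benchmarks/compare?{query}"


def results_url(org_id: str, run_id: str) -> str:
    """URL for the per-sample pass/fail rows of one run."""
    return (f"{BASE_URL}/v1/orgs/{quote(org_id, safe='')}"
            f"/benchmarks/{quote(run_id, safe='')}/results")
```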
Request
curl -X POST 'https://api.znyx.ai/v1/orgs/00000000-0000-0000-0000-000000000000/benchmarks' \
-H "Authorization: Bearer $ZNYX_TOKEN" \
-H 'Content-Type: application/json' \
-d '{
"dataset_id": "string",
"policy": {},
"policy_version": null,
"bundle_id": null
}'
Response
Successful Response
{
"id": "string",
"dataset_id": "string",
"policy_version": null,
"bundle_id": null,
"status": "string",
"total_samples": 0,
"completed_samples": 0,
"results_summary": null,
"started_at": null,
"completed_at": null,
"created_at": null
}
Schema: object
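For scripts that can't shell out to curl, the same POST can be built with the Python standard library. This sketch constructs the request without sending it and adds a small helper that turns total_samples and completed_samples from the response into a progress fraction. It uses urllib rather than the znyx-sdk package, whose API isn't documented on this page.

```python
import json
import urllib.request


def start_benchmark_request(org_id: str, body: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST shown in the curl example."""
    return urllib.request.Request(
        url=f"https://api.znyx.ai/v1/orgs/{org_id}/benchmarks",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def progress(run: dict) -> float:
    """Fraction of samples evaluated; 0.0 while total_samples is 0."""
    total = run["total_samples"]
    return run["completed_samples"] / total if total else 0.0
```

To send it, pass the request to urllib.request.urlopen and parse the reply with json.load; the token should be a CI- or admin-level token, for example read from the ZNYX_TOKEN environment variable.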