korrel / open-source Python SDK

Write your agent test once.
CI today, RL fine-tune tomorrow.

Define a multi-turn agent scenario one time: a user-simulator persona, programmable mock tools, and a scoring rubric. Gate it in pytest now. Export the same definition as a verifiers or OpenEnv reinforcement-learning environment when you train.

$ pip install korrel

Scenario

├─ korrel run pytest CI gate

├─ korrel export --to verifiers RL environment

└─ korrel export --to openenv RL environment

An evaluation and a reinforcement-learning environment are the same object: a dataset, a harness, and a rubric. The CI buyer and the RL buyer want the same definition with a different runtime. Korrel is the authoring layer that carries it across both.

A single Korrel scenario reproduces tau2-bench's deterministic reward identically across the pytest gate, the verifiers environment, and the OpenEnv server. 80 frozen transcripts, exact float equality, zero drift.

Read the benchmark, rerun it yourself →

Bring your own keys. Korrel calls your model provider with your key for both the agent under test and the user-simulator, so each run spends your own provider credits at your provider's rate. The cost lands on your account, never through Korrel, and no key is stored. Point it at a hosted API or a local model.
MIT licensed. Self-host the whole thing.
Built on the spec, not the runner. Korrel targets the open verifiers and OpenEnv environments, so your tests are not tied to one trainer.

Write your agent test once.CI today, RL fine-tune tomorrow.

Write your agent test once.
CI today, RL fine-tune tomorrow.