korrel / open-source Python SDK
Write your agent test once.
CI today, RL fine-tune tomorrow.
Define a multi-turn agent scenario one time: a user-simulator persona, programmable mock tools, and a scoring rubric. Gate it in pytest now. Export the same definition as a verifiers or OpenEnv reinforcement-learning environment when you train.
An evaluation and a reinforcement-learning environment are the same object: a dataset, a harness, and a rubric. The CI buyer and the RL buyer want the same definition with a different runtime. Korrel is the authoring layer that carries it across both.
A single Korrel scenario reproduces tau2-bench's deterministic reward identically across the pytest gate, the verifiers environment, and the OpenEnv server. 80 frozen transcripts, exact float equality, zero drift.
- Bring your own keys. Korrel calls your model provider with your key for both the agent under test and the user-simulator, so each run spends your own provider credits at your provider's rate. The cost lands on your account, never through Korrel, and no key is stored. Point it at a hosted API or a local model.
- MIT licensed. Self-host the whole thing.
- Built on the spec, not the runner. Korrel targets the open verifiers and OpenEnv environments, so your tests are not tied to one trainer.