AI Runtime + Evals Engineer

Sytrex

Software Engineering, Data Science

Remote

Posted on Jul 3, 2026

Braven is agentic infrastructure for reinsurance. Our agents run complete workflows: they ingest submissions and documents, classify risk against appetite, and generate quotes and bordereaux autonomously, with a 100% audit trail on every decision. As a senior engineer on the Platform team, you own the agent runtime and the evaluation system around it. Paid in USD. Early-stage equity with real ownership. Remote, with in-person hackathons and offsites.

What you'll do

  • Own the agent runtime: tool calling, context management, streaming, and graceful recovery when a run fails.
  • Build resilience across multiple model providers (e.g. Bedrock, Azure OpenAI): fallback by error type, retries, cancellation, and cost control.
  • Own the evaluation system: offline datasets, automated graders, CI regression gates, and online evaluation over real production traces, tracking task success, cost and latency, with eval sets per client.
  • Keep the agent predictable in production by hardening it against non-deterministic failures: malformed tool schemas, context overflow, provider-specific edge cases.
  • Treat evals as a ship gate, not decoration. Every change ships with its eval and a green regression gate.
  • Make context management token-accurate (offload, summarization, budget caps) instead of relying on fragile heuristics.

What we're looking for

  • You have run agents in production, not demos: tool calling, context management, streaming, failure recovery.
  • You have evaluated agentic systems: designed eval sets, graders, metrics and CI gates, using tooling like LangSmith.
  • You can debug non-determinism: probabilistic, provider-specific, intermittent failures do not scare you.
  • Strong general engineering. We are a TypeScript shop (Bun, Fastify), but strong engineers from other stacks pick it up fast.
  • Strong written and spoken English, our working language.
  • Nice to have: LangGraph / LangChain or similar agent frameworks, low-level experience with a major model provider, and an eye for inference cost and security.

Interested in this role?

Send an email to:

javier@sytrex.io