OpenInterpretability

Developer Tools

Open-source toolkit to audit what your LLM knows

About

The first mechanistic interpretability toolkit that runs directly inside Claude Code, Cursor, and Cline via MCP. It ships production probes, including FabricationGuard and agent-probe-guard, that detect hallucinations and agent failures. The suite also includes the ProbeBench leaderboard and SAE (sparse autoencoder) training that scales from a free 30-minute Colab notebook up to research-grade runs. Released under the Apache-2.0 license.

Launched

May 14, 2026 · Week 10

Builder
Builder