Play with the models from "Building Better Activation Oracles"

  1. Ask a question. The model generates a chain of thought; we record its internal activations while it thinks.
  2. Select tokens. Click or drag across the chain of thought to pick which positions the oracle gets to peek at.
  3. Ask the oracle. Type a question about the model's reasoning (e.g. "is it confident?", "what answer is it heading toward?"). Both oracles, Adam Original and Ours, answer using only the selected activations; neither has access to the CoT text.
Generate a CoT to populate the selectable activation text.
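
The three steps above can be sketched in a few lines. This is a minimal illustration and not the demo's actual implementation: the function names, the request layout, and the toy activation vectors are all assumptions made for the example. It shows how click/drag spans might be turned into token positions, and how a request to the oracle can carry only the selected activations and the question, never the CoT text.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def positions_from_spans(spans: Sequence[Tuple[int, int]], num_tokens: int) -> List[int]:
    """Expand click/drag spans into sorted, unique token positions.

    Each span is (start, end), both inclusive. A single click is (i, i).
    Hypothetical helper: the real demo's selection logic is not documented here.
    """
    picked = set()
    for start, end in spans:
        lo, hi = min(start, end), max(start, end)  # a drag may run right-to-left
        # Clamp to the valid token range so out-of-bounds drags are ignored.
        for pos in range(max(lo, 0), min(hi, num_tokens - 1) + 1):
            picked.add(pos)
    return sorted(picked)


def build_oracle_input(
    activations: Sequence[Vector],
    spans: Sequence[Tuple[int, int]],
    question: str,
) -> Dict[str, object]:
    """Assemble what the oracle is allowed to see (assumed layout).

    Only the question and the selected activation vectors are included;
    the chain-of-thought tokens themselves are deliberately left out.
    """
    positions = positions_from_spans(spans, len(activations))
    return {
        "question": question,
        "positions": positions,
        "activations": [list(activations[p]) for p in positions],
    }


# Toy stand-ins: one small vector per CoT token (real activations would be
# recorded from the model while it generates).
cot_tokens = ["The", "answer", "is", "probably", "4"]
activations = [[float(i), float(i) * 0.5] for i in range(len(cot_tokens))]

# One right-to-left drag over tokens 3..1, plus a single click on token 4.
request = build_oracle_input(activations, [(3, 1), (4, 4)], "is it confident?")
print(request["positions"])
```

The same request would be sent to each oracle, so any difference between their answers comes from the oracles themselves and not from what they were shown.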

2. Chain of Thought — select tokens

Click or drag across highlighted tokens.

Adam Original

Run the oracle to see output.

Ours

Run the oracle to see output.