AI@MIT Reading Group

Seminar Series

October 08

Output Supervision Can Obfuscate the Chain of Thought

Alex Turner

Google DeepMind

Part Of

AI@MIT Reading Group

6:00P - 7:00P

Location

56-167

2025-10-08, 18:00 to 19:00 (America/New_York)

RSVP here

Recently, OpenAI showed that training against a chain of thought (CoT) monitor can cause obfuscated CoTs, which contain bad behavior the monitor cannot detect. They proposed to keep CoTs monitorable by training only against output monitors that do not have access to the CoT. We show that such training can still cause obfuscated CoTs via two mechanisms. First, when a model is trained to produce a safe-looking output, that model may generalize to making its CoTs look safe. Second, since later tokens are conditioned on earlier ones, safe-looking CoTs may increase the likelihood of safe outputs, causing safe-looking CoTs to be reinforced. We introduce two mitigations to address these two issues, which achieve a Pareto improvement in terms of monitorability and task performance compared to regular training. To our knowledge, we are the first to identify and mitigate these problems. Our work implies that preserving CoT monitorability is more difficult than previously thought; we suggest practical guidelines for AI developers to maintain monitorable CoTs.

Initial research note: https://www.alignmentforum.org/posts/CM7AsQoBxDW4vhkP3/optimizing-the-final-output-can-obfuscate-cot-research-note?_ga=2.185282604.447175782.1759771884-9535815.1758307660

October 01

AI Reasoning at Scale with Search

Jonathan Li

Visiting Student, Caltech

Part Of

AI@MIT Reading Group

6:00P - 7:00P

Location

56-167

2025-10-01, 18:00 to 19:00 (America/New_York)

RSVP here: https://forms.gle/iztowTvkndwa9nBM8

Large Language Models (LLMs) have shown impressive generalization across a wide range of tasks, yet they still struggle with complex reasoning and out-of-distribution problem solving. Rather than simply memorizing patterns from pretraining, we seek LLMs that can innovate, generating novel solutions in unfamiliar domains. In this talk, I present a common framework for integrating search-based techniques with LLMs to push the boundaries of their reasoning capabilities. By shifting computational effort from training time to inference time, we enable a new paradigm of inference-time scaling, where search becomes a mechanism for exploration, deliberation, and improvement. Unlike classical search over symbolic states or action spaces, LLM-guided search must operate over open-ended text, requiring novel approaches that are language-centric and model-aware. Through applications in strategy games, code generation, and mathematical problem solving, I will illustrate how these search-augmented methods unlock human-level performance in challenging, unfamiliar environments, paving the way toward more general and superhuman AI systems.

September 18

Understanding AI Systems at Scale: Applied Interpretability, Agent Robustness, and the Science of Model Behaviors

Neil Chowdhury & Vincent Huang

Transluce

Part Of

AI@MIT Reading Group

6:00P - 7:00P

Location

4-370

2025-09-18, 18:00 to 19:00 (America/New_York)

RSVP here: https://forms.gle/XpirPb17Q9HgoshP6

Join two Transluce researchers as they discuss their latest work and research vision. Transluce is a company building the public tech stack for understanding AI systems. Topics will include applied interpretability, scalable oversight, reinforcement learning, and discovering rare behaviors in language models.

September 17

ML for drug discovery at Genesis Therapeutics

Christina Ji, Pranav Murugan, David Williams

Genesis Therapeutics

Part Of

AI@MIT Reading Group

6:00P - 7:00P

Location

6-120

2025-09-17, 18:00 to 19:00 (America/New_York)

RSVP here: https://forms.gle/V3YRySh1urjvCMSt5

Genesis Therapeutics is an industry-leading start-up in the ML-driven drug discovery space. At Genesis, we are integrating ML into many phases of the drug discovery process: from generating new molecules, to sampling protein-ligand conformations, to predicting properties such as potency and ADME. Genesis has built a state-of-the-art denoising diffusion model for protein-ligand structure prediction. Genesis has also developed ML models for molecular property prediction to accelerate the virtual screening process and de-risk drug discovery programs. Genesis Therapeutics has raised over $300M in funding from top technology and biotech investors.

We are hiring for internship and full-time positions in ML research, software engineering, and computational chemistry at https://genesistherapeutics.ai/careers/