Embodied Intelligence (EI) Joint Seminar Presentation
Hongyin Luo, Yung-Sung Chuang, and Philip Schroeder
MIT CSAIL
Thursday, April 24, 2025, 4:00–5:00 PM (America/New_York)
There will be a joint presentation this week by three members of the Spoken Language Systems group at MIT CSAIL.

Title: Quantifying Generalization Complexity for Large Language Models

Abstract: LLMs have shown remarkable performance on a range of complex tasks, but how well do they generalize beyond their training data distribution, and how can we measure that generalization quantitatively? This talk presents our recent ICLR work on SCYLLA, an evaluation framework that disentangles generalization from memorization in LLMs. Using a dynamic evaluation approach, SCYLLA quantifies the generalization capabilities of LLMs across complexity levels, revealing key insights into the performance gaps between in-distribution (ID) and out-of-distribution (OOD) data. We will explore findings such as the generalization valley, a non-monotonic relationship between task complexity and performance that suggests a critical threshold at which LLMs' reliance on non-generalizable behavior peaks. We will also discuss critical complexity, which shifts as model size increases, suggesting that larger models can tackle more complex reasoning tasks before they begin to over-rely on memorization. The talk will also cover our benchmarking results across 28 popular LLMs, including both open-source models (e.g., LLaMA, Qwen) and closed models (e.g., Claude, GPT). The aim is to provide a clearer understanding of their generalization capabilities and to foster more robust methods for evaluating and augmenting LLMs.

Bio: Hongyin Luo is a research scientist at MIT CSAIL, working with Dr. James Glass. Hongyin focuses on improving the efficiency and transparency of language model reasoning with structured and symbolic inference frameworks.

Title: Reducing Hallucinations in LLMs via Decoding, Detection, and Citation

Abstract: Large language models (LLMs) often produce hallucinations: content that is factually incorrect or unsupported by real-world facts or the input context. This talk presents three approaches that address this challenge from complementary perspectives (illustrative code sketches of all three follow this talk's bio):

1. DoLa, a decoding method that improves truthfulness by contrasting output distributions from earlier and final transformer layers, leveraging observations about the layer-wise localization of factual knowledge (https://arxiv.org/abs/2309.03883).
2. Lookback Lens, which detects contextual hallucinations using only information from attention maps, and transfers well across tasks and model sizes (https://arxiv.org/abs/2407.07071).
3. SelfCite, a self-supervised framework for aligning LLMs to generate fine-grained citations, using context ablation to provide a simple but effective reward for the necessity and sufficiency of a citation, achieving performance comparable to Claude Citations with only an 8B model (https://arxiv.org/abs/2502.09604).

Together, these techniques offer lightweight, scalable solutions for improving the factual reliability and verifiability of LLM outputs.

Bio: Yung-Sung Chuang is a fourth-year PhD student at MIT CSAIL, working with Dr. James Glass. His research focuses on improving the reliability and factuality of large language models.
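To make the DoLa idea concrete, here is a minimal sketch of layer-contrastive decoding. It uses GPT-2 as a stand-in model and a fixed premature layer; the paper works with larger models and selects the premature layer dynamically, so the layer index and plausibility threshold below are illustrative assumptions, not the paper's settings.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def dola_next_token_logits(input_ids, premature_layer=6, alpha=0.1):
    out = model(input_ids, output_hidden_states=True)
    # Early exit: apply the final layer norm and LM head to an intermediate
    # layer's hidden state to get the "premature" distribution.
    h_early = model.transformer.ln_f(out.hidden_states[premature_layer][:, -1])
    log_p_early = model.lm_head(h_early).log_softmax(-1)
    log_p_final = out.logits[:, -1].log_softmax(-1)
    # Keep only tokens that are plausible under the mature (final) layer,
    # then contrast the mature and premature log-probabilities.
    plausible = log_p_final >= log_p_final.max(-1, keepdim=True).values \
        + torch.log(torch.tensor(alpha))
    contrast = log_p_final - log_p_early
    return torch.where(plausible, contrast, torch.full_like(contrast, float("-inf")))

ids = tok("The capital of France is", return_tensors="pt").input_ids
print(tok.decode(dola_next_token_logits(ids).argmax(-1)))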
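Lookback Lens builds on a simple per-head feature: the share of attention mass that each newly generated token places on the input context versus on the tokens generated so far. The sketch below extracts that feature, again with GPT-2 as a stand-in; the detector itself (a linear classifier trained over these ratios, per the paper) is omitted, and the function names are mine.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def lookback_ratios(context, continuation):
    # Assumes the continuation starts with a space so that tokenizing the
    # concatenation leaves the context token boundaries unchanged.
    ctx_len = tok(context, return_tensors="pt").input_ids.shape[1]
    ids = tok(context + continuation, return_tensors="pt").input_ids
    attn = model(ids, output_attentions=True).attentions  # per layer: (1, heads, T, T)
    feats = []
    for t in range(ctx_len, ids.shape[1]):          # each generated position
        per_head = []
        for layer_attn in attn:
            a = layer_attn[0, :, t, :]              # (heads, T) attention row
            on_ctx = a[:, :ctx_len].sum(-1)         # mass on the input context
            on_new = a[:, ctx_len:t + 1].sum(-1)    # mass on generated tokens
            per_head.append(on_ctx / (on_ctx + on_new))
        feats.append(torch.cat(per_head))           # one (layers*heads,) vector per step
    return torch.stack(feats)

# Feature matrix that a hallucination classifier would be trained on:
print(lookback_ratios("The sky looks blue because", " of Rayleigh scattering.").shape)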
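SelfCite's context-ablation reward can be paraphrased as two log-probability differences: necessity (removing the cited sentences should hurt the response) and sufficiency (the cited sentences alone should support it). The sketch below scores both with GPT-2 standing in for the actual model; the helper names and the exact reward normalization are my assumptions, so see the paper for the precise formulation.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def log_prob(response, context):
    # Summed log p(response tokens | context) under the LM. Assumes the
    # response starts with a space so token boundaries stay aligned.
    n_ctx = tok(context, return_tensors="pt").input_ids.shape[1]
    full = tok(context + response, return_tensors="pt").input_ids
    logits = model(full).logits[0, :-1].log_softmax(-1)  # row i predicts token i+1
    targets = full[0, 1:]
    return logits[n_ctx - 1:].gather(-1, targets[n_ctx - 1:, None]).sum().item()

def citation_reward(response, sentences, cited):
    # Necessity: probability drop when the cited sentences are ablated.
    # Sufficiency: probability retained when only the cited sentences remain.
    full_ctx = " ".join(sentences)
    without_cited = " ".join(s for i, s in enumerate(sentences) if i not in cited)
    only_cited = " ".join(sentences[i] for i in sorted(cited))
    base = log_prob(response, full_ctx)
    necessity = base - log_prob(response, without_cited)
    sufficiency = log_prob(response, only_cited) - base
    return necessity + sufficiency

docs = ["Paris is the capital of France.",
        "The Eiffel Tower is in Paris.",
        "Berlin is the capital of Germany."]
print(citation_reward(" The French capital is Paris.", docs, cited={0}))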
Title: THREAD: Thinking Deeper with Recursive Spawning

Abstract: Large language models have shown impressive capabilities across diverse settings, but they still struggle as the length and complexity of the context increase. To address this challenge, we introduce a new framework: Thinking Recursively and Dynamically (ThReaD). THREAD frames model generation as a thread of execution that, based on the context, can run to completion or dynamically spawn new threads in a recursive fashion. By spawning, threads can offload work (e.g., reasoning, retrieving information, analyzing data) to child threads, which return only the tokens the parent thread needs to do its work. We show significant performance gains with THREAD in LLM task solving and question answering, where dynamic threading allows the model to recursively decompose the given task or question into progressively simpler sub-problems that can be solved by separate child threads. In an extension of this work, we also demonstrate how a THREAD-based framework can improve reasoning over videos with vision-language models. (A toy sketch of the THREAD control flow follows the bio below.)

Bio: Philip Schroeder is a PhD student in the Spoken Language Systems group at MIT CSAIL, advised by Dr. Jim Glass. His work focuses on advancing the reasoning capabilities of LLMs and VLMs through embodied interaction with external environments, both virtual and real.
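The ThReaD control flow can be illustrated with a toy interpreter: a thread generates until it emits a spawn marker, a child thread solves the sub-problem recursively, and only the child's answer tokens flow back into the parent's context. Everything below is a stand-in: the SPAWN{...} marker and the rule-based generate function replace real LLM calls and the paper's actual prompt format.

import re

def generate(prompt: str) -> str:
    # Deterministic stand-in for an LLM call so the sketch runs on its own:
    # delegate any innermost parenthesized sub-expression to a child thread,
    # otherwise evaluate the remaining flat arithmetic (toy input only).
    m = re.search(r"\(([^()]*)\)", prompt)
    if m:
        return prompt[:m.start()] + "SPAWN{" + m.group(1) + "}" + prompt[m.end():]
    return str(eval(prompt))

def run_thread(prompt: str, depth: int = 0, max_depth: int = 10) -> str:
    # A thread runs to completion or spawns child threads; each child returns
    # only the tokens the parent needs, and the parent then resumes.
    output = generate(prompt)
    m = re.search(r"SPAWN\{([^}]*)\}", output)
    while m and depth < max_depth:
        child_answer = run_thread(m.group(1), depth + 1, max_depth)
        output = generate(output[:m.start()] + child_answer + output[m.end():])
        m = re.search(r"SPAWN\{([^}]*)\}", output)
    return output

print(run_thread("2 * (3 + (4 - 1))"))  # recursively decomposes to 12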
Location: TBD