December 10

December 10, 2025, 4:00–5:00 PM (America/New_York)
Which Algorithms Have Tight Generalization Bounds?

Generalization measures (GMs) are a central tool for providing performance guarantees and informing algorithmic design. Yet, most such bounds are known to be loose (or even vacuous) in practice. In a series of works [1, 2], we focus on GMs in the overparametrized supervised learning setting, where we show that this looseness is unavoidable due to fundamental lower bounds. Notably, these lower bounds hold on average over finite collections of distributions, and with numerically appreciable values.

For GMs that are computed solely from the training sample but depend on neither the algorithm nor the distribution, we show non-tightness across a large fraction of (algorithm, distribution) combinations. For GMs that can also depend on the algorithm, we show that there can be a trade-off between the algorithm’s ability to learn and the ability to verify the algorithm’s learning success with a GM. Next, we study algorithm-dependent GMs for algorithms that admit a natural notion of algorithmic implicit bias. There, non-tightness of GMs provably occurs whenever the underlying distribution class is rich enough, which is the case, for example, when learning VC-classes. Lastly, we show that a certain notion of algorithmic stability is sufficient for the existence of tight GMs.

Joint work with Ido Nachum (University of Haifa), Jonathan Shafer (MIT), and Michael Gastpar (EPFL).

[1]: ICLR 2024, see https://arxiv.org/abs/2309.13658
[2]: NeurIPS 2025 (Spotlight), see https://arxiv.org/abs/2410.01969

TBD
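As background for readers less familiar with the terminology, one standard way to formalize these objects is the following (the precise definitions used in [1, 2] may differ in their details). For an algorithm $A$, sample $S \sim D^n$, and loss $\ell$,
\[
  \mathrm{gap}(A, S, D) \;=\; L_D\bigl(A(S)\bigr) - L_S\bigl(A(S)\bigr),
  \qquad
  L_D(h) = \mathbb{E}_{(x,y)\sim D}\bigl[\ell(h(x),y)\bigr],\quad
  L_S(h) = \frac{1}{|S|}\sum_{(x,y)\in S}\ell(h(x),y),
\]
and a generalization measure $g$ (a function of the sample $S$, and possibly of the algorithm $A$) is tight for a pair $(A, D)$ if, with high probability over $S \sim D^n$,
\[
  \mathrm{gap}(A, S, D) \;\le\; g(S, A) \;\le\; \mathrm{gap}(A, S, D) + \varepsilon
\]
for small $\varepsilon$.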

December 03

November 19

November 19, 2025, 4:00–5:00 PM (America/New_York)
Fast Algorithms for Graph Arboricity and Related Problems

We give an algorithm for finding the arboricity of a weighted, undirected graph, defined as the minimum number of spanning forests that cover all edges of the graph, in $\sqrt{n}\, m^{1+o(1)}$ time. This improves on the previous best bound of O(nm) for weighted graphs and O(m^{3/2}) for unweighted graphs (Gabow 1995) for this problem. The running time of our algorithm is dominated by a logarithmic number of calls to a directed global minimum cut subroutine—if the running time of the latter problem improves to m^{1+o(1)} (thereby matching the running time of maximum flow), the running time of our arboricity algorithm would improve further to m^{1+o(1)}.

As an application, we also give a new algorithm for computing the entire cut hierarchy—laminar multiway cuts with minimum cut ratio in recursively defined induced subgraphs—in m n^{1+o(1)} time. For the cut hierarchy problem, the previous best bound was O(n^2 m) for weighted graphs and O(n m^{3/2}) for unweighted graphs.

This is joint work with Ruoxu Cen, Henry Fleischmann, Jason Li, and Debmalya Panigrahi.

TBD
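As a concrete point of reference for the definition of arboricity (and emphatically not the algorithm from the talk), the following is a minimal brute-force sketch: greedily partitioning the edges into forests, which gives an upper bound on the arboricity but need not attain it. All names and the example graph are illustrative.

```python
# A brute-force point of reference, not the algorithm from the talk: greedily
# partition the edges of an (unweighted) undirected graph into forests. The
# number of forests used upper-bounds the arboricity; it need not equal it.

class DSU:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False        # already connected: the edge would close a cycle
        self.parent[ra] = rb
        return True

def greedy_forest_partition(n, edges):
    """Assign each edge to the first forest in which it does not close a cycle."""
    forests = []                # list of (DSU, edge list) pairs, one per forest
    for u, v in edges:
        for dsu, forest_edges in forests:
            if dsu.union(u, v):
                forest_edges.append((u, v))
                break
        else:                   # the edge closes a cycle in every forest: open a new one
            dsu = DSU(n)
            dsu.union(u, v)
            forests.append((dsu, [(u, v)]))
    return [forest_edges for _, forest_edges in forests]

# Example: K4 has arboricity 2 (Nash-Williams); this edge order happens to use 3 forests.
print(len(greedy_forest_partition(4, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])))
```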

November 12

November 05

November 5, 2025, 4:00–5:00 PM (America/New_York)
Zeroth-order log-concave sampling: Uniform sampling from convex bodies

Since the development of the first randomized polynomial-time algorithm for volume computation by Dyer, Frieze, and Kannan in 1989, convex-body sampling has been a central problem at the intersection of algorithms, geometry, and probability. A major milestone came in 1997, when Kannan, Lovász, and Simonovits analyzed the Ball Walk and formulated the influential KLS conjecture. This was extended to log-concave distributions by Lovász and Vempala in 2006, and further accelerated by Cousins and Vempala in 2015 through warm-start generation techniques.

In this talk, I will present new and principled approaches that help understand, streamline, and improve these advances. First, I propose a simple variant of the proximal sampler that achieves the query complexity with matched Rényi orders between the initial warmness and output guarantee. Then, I introduce a simple annealing scheme that produces a warm start in q-Rényi divergence. To relay a Rényi warmness across the annealing scheme, I establish hypercontractivity under simultaneous heat flow and translate it into an improved mixing guarantee for the proximal sampler under a logarithmic Sobolev inequality. These results extend naturally to general log-concave distributions accessible via evaluation oracles, incurring an additional quadratic number of queries.

TBD
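As background, here is a minimal sketch of the Ball Walk mentioned above, assuming only a zeroth-order membership oracle for the convex body; it is not the proximal sampler or annealing scheme from the talk, and the step radius and iteration count below are arbitrary illustrative choices.

```python
import numpy as np

# A minimal sketch of the Ball Walk (the chain analyzed by Kannan, Lovász, and
# Simonovits): propose a uniform point in a small ball around the current point
# and move there only if it stays inside the body K (lazy move otherwise).
# `membership(x)` is the assumed zeroth-order oracle: True iff x is in K.

def ball_walk(membership, x0, step_radius, num_steps, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    d = x.shape[0]
    for _ in range(num_steps):
        direction = rng.normal(size=d)
        direction /= np.linalg.norm(direction)
        radius = step_radius * rng.uniform() ** (1.0 / d)   # uniform in the small ball
        y = x + radius * direction
        if membership(y):            # stay put if the proposal leaves K
            x = y
    return x

# Example: (approximately) uniform sample from the unit Euclidean ball in R^10.
sample = ball_walk(lambda z: np.linalg.norm(z) <= 1.0, np.zeros(10),
                   step_radius=0.2, num_steps=5000)
print(sample)
```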

October 29

October 29, 2025, 4:00–5:00 PM (America/New_York)
The Mysterious Query Complexity of Tarski Fixed Points

Tarski's fixed point theorem has extensive applications across many fields, including verification, semantics, game theory, and economics. Recently, the complexity of finding a Tarski fixed point has attracted significant attention.

In this talk, I will introduce the problem of computing a Tarski fixed point over a grid $[N]^d$, highlight our recent progress toward a better understanding of it, and discuss the surprising journey and the mysteries surrounding its complexity.

Based on joint work with Xi Chen and Mihalis Yannakakis.

TBD
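For intuition about the problem setup (not the query-efficient algorithms discussed in the talk), here is a brute-force sketch: for a monotone f on the grid [N]^d, iterating f from the bottom element reaches the least fixed point within d(N−1) steps, far more queries than the bounds studied in this line of work. The example map is illustrative.

```python
# Brute-force baseline: for a monotone f mapping [N]^d (componentwise order)
# into itself, the sequence bottom, f(bottom), f(f(bottom)), ... is increasing
# and stabilizes at the least Tarski fixed point after at most d*(N-1) steps.

def least_tarski_fixed_point(f, N, d):
    x = tuple([1] * d)                         # bottom element of the lattice [N]^d
    while True:
        y = tuple(f(x))
        assert all(1 <= yi <= N for yi in y), "f must map [N]^d into itself"
        if y == x:
            return x                           # f(x) = x: a Tarski fixed point
        x = y                                  # monotonicity guarantees y >= x here

# Example on [5]^2: a monotone map whose least fixed point is (3, 4).
f = lambda x: (min(max(x[0], 3), 5), min(max(x[1], 4), 5))
print(least_tarski_fixed_point(f, 5, 2))       # -> (3, 4)
```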

October 22

October 20

October 20, 2025, 4:00–5:00 PM (America/New_York)
Memory as a lens to understand learning and optimization

What is the role of memory in learning and optimization? The optimal convergence rates (measured in terms of the number of oracle queries or samples needed) for various optimization problems are achieved by computationally expensive optimization techniques, such as second-order methods and cutting-plane methods. We will discuss whether simpler, faster, and memory-limited algorithms such as gradient descent can achieve these optimal convergence rates for the prototypical optimization problem of minimizing a convex function with access to a gradient. Our results hint at a perhaps curious dichotomy---it is not possible to significantly improve on the convergence rate of known memory-efficient techniques (which are linear-memory variants of gradient descent for many of these problems) without using substantially more memory (quadratic memory for many of these problems). Therefore memory could be a useful discerning factor to provide a clear separation between 'efficient' and 'expensive' techniques. This perspective can also be applied to understand mechanisms which transformers use to solve certain algorithmic tasks. We will show empirical evidence that transformers learn to achieve second-order convergence rates for solving linear-regression tasks, leveraging some of the theory of optimization and memory-limited learning.

This is mostly based on joint work with Annie Marsden, Aaron Sidford, Gregory Valiant, Deqing Fu, Tian-Qi Chen and Robin Jia.

TBD
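To make the memory contrast concrete, here is a toy comparison (illustrative only, not from the talk) on least squares: gradient descent keeps only an O(d)-size iterate but needs many iterations, while a Newton step stores an O(d²) Hessian and solves this quadratic problem in a single step. Problem sizes, iteration count, and step size are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)

def loss(x):
    return 0.5 * np.mean((A @ x - b) ** 2)

# Gradient descent: O(d) memory (just the iterate), many iterations.
x_gd = np.zeros(d)
step = n / np.linalg.norm(A, 2) ** 2           # 1 / (smoothness constant of the loss)
for _ in range(500):
    grad = A.T @ (A @ x_gd - b) / n
    x_gd -= step * grad

# Newton's method: O(d^2) memory (the Hessian), one step on this quadratic loss.
hessian = A.T @ A / n
grad0 = A.T @ (A @ np.zeros(d) - b) / n
x_newton = -np.linalg.solve(hessian, grad0)

print(loss(x_gd), loss(x_newton))              # the single Newton step reaches the optimum
```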

October 15

October 15, 2025, 4:00–5:00 PM (America/New_York)
Quality Control on Random Graphs in Sublinear Time

Many algorithms are designed to perform well on random instances. However, when running such an algorithm on a specific input, can we trust its output? We define a new class of problems whose formulation allows algorithms to benefit from the randomness of instances while remaining reliable to use on any given input. We call these “quality control” problems, specified by a distribution D and a real-valued quality function Q. We say that an input x is “high quality” if Q(x) is approximately 1, and assume that samples from D are high quality with high probability. The task is to accept x if it is drawn from D and reject x if Q(x) is far from 1.

We study this problem in the sublinear setting, where inputs are graphs and the algorithm has access to adjacency matrix queries. Focusing on the case where D is a (sparse) Erdős–Rényi random graph model and Q is proportional to the number of k-cliques in a graph, we show that quality control can be solved superpolynomially faster than the related task of approximating Q in worst-case graphs in the sublinear access model.

Joint work with Ronitt Rubinfeld (MIT) and Madhu Sudan (Harvard).

TBD
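To make the setup concrete, here is a naive baseline for the special case k = 3 (triangles), written with dense toy parameters and full access to the graph; the talk's algorithms instead work with sparse models, sublinear adjacency-matrix queries, and general k. The acceptance slack of 0.2 is an arbitrary illustrative choice.

```python
import numpy as np
from itertools import combinations

# Naive quality control for triangles: count all triangles and accept iff the
# count is close to its expectation under the Erdős–Rényi model G(n, p).

def triangle_count(adj):
    n = adj.shape[0]
    return sum(adj[i, j] and adj[j, k] and adj[i, k]
               for i, j, k in combinations(range(n), 3))

def quality_control_triangles(adj, p, slack=0.2):
    n = adj.shape[0]
    expected = p ** 3 * n * (n - 1) * (n - 2) / 6    # E[#triangles] in G(n, p)
    quality = triangle_count(adj) / expected         # Q(x) is ~1 for typical inputs
    return abs(quality - 1.0) <= slack

# Example: a sample drawn from G(n, p) should be accepted with high probability.
n, p = 60, 0.3
rng = np.random.default_rng(1)
upper = np.triu(rng.random((n, n)) < p, 1)
adj = upper | upper.T
print(quality_control_triangles(adj, p))
```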

October 08

October 8, 2025, 4:00–5:00 PM (America/New_York)
Approximating High-Dimensional Earth Mover’s Distance as Fast as Closest Pair

We give a reduction from $(1+\epsilon)$-approximate Earth Mover's Distance (EMD) to $(1+\epsilon)$-approximate Closest Pair. As a consequence, we improve the fastest known approximation algorithm for high-dimensional EMD. Here, given $p\in [1, 2]$ and two sets of $n$ points $X,Y \subset (\mathbb{R}^d,\ell_p)$, their EMD is the minimum cost of a perfect matching between $X$ and $Y$, where the cost of matching two vectors is their $\ell_p$ distance. Further, Closest Pair is the basic problem of finding a pair of points realizing $\min_{x \in X, y\in Y} \|x-y\|_p$.

TBD
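For concreteness, the quantity being approximated can be computed exactly (in roughly cubic time, far slower than the algorithms in the talk) as a minimum-cost perfect matching; a minimal sketch using SciPy:

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.optimize import linear_sum_assignment

# Exact EMD between two equal-size point sets as a minimum-cost perfect
# matching under the l_p metric (Hungarian algorithm).

def emd(X, Y, p=2):
    cost = cdist(X, Y, metric="minkowski", p=p)   # pairwise l_p distances
    rows, cols = linear_sum_assignment(cost)      # optimal perfect matching
    return cost[rows, cols].sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))
Y = rng.normal(size=(100, 32))
print(emd(X, Y, p=2))
```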

October 01

October 1, 2025, 4:00–5:00 PM (America/New_York)
Faster Mixing of the Jerrum-Sinclair Chain

We show that the Jerrum-Sinclair Markov chain on matchings mixes in time $\widetilde{O}(\Delta^2 m)$ on any graph with $n$ vertices, $m$ edges, and maximum degree $\Delta$, for any constant edge weight $\lambda>0$. For general graphs with arbitrary, potentially unbounded $\Delta$, this provides the first improvement over the classic $\widetilde{O}(n^2 m)$ mixing time bound of Jerrum and Sinclair (1989) and Sinclair (1992).

To achieve this, we develop a general framework for analyzing mixing times, combining ideas from the classic canonical path method with the "local-to-global" approaches recently developed in high-dimensional expanders, introducing key innovations to both techniques.

Based on joint work with Weiming Feng (The University of Hong Kong); Zhe Ju, Tianshun Miao, Yitong Yin, and Xinyuan Zhang (Nanjing University).

TBD
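For readers who have not seen it, here is a minimal sketch of the up/down/exchange moves of the Jerrum-Sinclair chain with a Metropolis filter for stationary weight λ^{|M|}; the talk concerns a sharper analysis of this chain's mixing time, not a new chain, and details such as the proposal distribution are simplified for illustration.

```python
import random

def jerrum_sinclair_step(M, edges, lam, rng=random):
    """One step of the matching chain: pick a random edge and try an up/down/exchange move."""
    u, v = rng.choice(edges)
    matched = {x for e in M for x in e}
    if (u, v) in M:                                   # "down": remove the edge
        M_new = M - {(u, v)}
    elif u not in matched and v not in matched:       # "up": add the edge
        M_new = M | {(u, v)}
    elif (u in matched) != (v in matched):            # "exchange": slide the matching edge
        covered = u if u in matched else v
        old = next(e for e in M if covered in e)
        M_new = (M - {old}) | {(u, v)}
    else:
        return M                                      # both endpoints covered: stay put
    # Metropolis filter so the stationary distribution is proportional to lam ** |M|.
    if rng.random() < min(1.0, lam ** (len(M_new) - len(M))):
        return M_new
    return M

# Example: run the chain on a 4-cycle with lambda = 1 (uniform over matchings).
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
M = set()
for _ in range(10_000):
    M = jerrum_sinclair_step(M, edges, lam=1.0)
print(M)
```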

September 24

September 24, 2025, 4:00–5:00 PM (America/New_York)
Learning and Incentives in Human–AI Collaboration

As AI systems become more capable, a central challenge is designing them to work effectively with humans. Game theory and online learning provide a natural toolkit for analyzing such interactions, where both humans and algorithms adapt strategically over time.

Consider a doctor who wants to diagnose a patient and can consult an AI. Suppose, to start, that the AI is aligned with the doctor’s goal of accurate diagnosis. Even under this cooperative assumption, the doctor and AI may each hold only partial and incomparable knowledge, and they may not have a shared prior. How can we guarantee that their combined predictions are strictly better than relying on either alone, without assuming shared priors or stochastic data? Importantly, they may face many such cases together over time, and this repeated interaction enables richer forms of collaboration. I will present learning-theoretic results showing that collaboration is possible in a fully distribution-free repeated prediction setting, with protocols that ensure regret bounds against benchmarks defined on the joint information of both the human and the AI.

In the second part, I will return to the alignment assumption itself. What if the AI is not necessarily aligned with the doctor’s goals? For example, what if the AI model is developed by a pharmaceutical company that has a preference for the doctor prescribing its own drugs? To address such misaligned incentives, it is natural to consider a setting where the doctor has access to multiple AI models, each offered by a different provider. Although each provider may be misaligned, under a mild average alignment assumption—that the doctor’s utility lies in the convex hull of the providers’ utilities—we show that in a Nash equilibrium of the competition among providers, the doctor can achieve the same outcomes they would achieve if a perfectly aligned provider were present. The analysis builds on ideas from Bayesian persuasion and information design, adapted to settings with competing AI providers.

This talk is based on the following works: Tractable Agreement Protocols (STOC’25), Collaborative Prediction: Tractable Information Aggregation via Agreement (in submission), and Emergent Alignment via Competition (in submission).

TBD

September 19

September 19, 2025, 2:00–3:00 PM (America/New_York)
Distribution Learning with Advice, by Arnab Bhattacharyya

Abstract: We revisit the problem of distribution learning within the framework of learning-augmented algorithms. In this setting, we explore the scenario where a probability distribution is provided as potentially inaccurate advice on the true, unknown distribution. Our objective is to develop learning algorithms whose sample complexity decreases as the quality of the advice improves, thereby surpassing standard learning lower bounds when the advice is sufficiently accurate. Specifically, we demonstrate that this outcome is achievable for the problem of learning a multivariate Gaussian distribution N(μ, Σ) in the PAC learning setting. Classically, in the advice-free setting, Θ̃(d²/ε²) samples are sufficient and worst-case necessary to learn d-dimensional Gaussians up to TV distance ε with constant probability. When we are additionally given a parameter T as advice, we show that Õ(d²⁻ᵝ/ε²) samples suffice whenever ‖√T⁻¹ Σ √T⁻¹ − I‖₁ < ε d¹⁻ᵝ (where ‖·‖₁ denotes the entrywise ℓ1-norm) for any β, yielding a polynomial improvement over the advice-free setting. We also show similar results for product distributions over the hypercube.

Joint work with Davin Choo (Harvard), Philips George John (NUS), and Themis Gouleakis (NTU).

TBD
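A small numerical illustration of the advice-quality measure appearing in the statement above (not the learning algorithm itself): when the advice T is close to Σ, the entrywise ℓ1 discrepancy ‖√T⁻¹ Σ √T⁻¹ − I‖₁ is small, which is exactly the regime in which fewer samples are needed. All parameter values below are arbitrary.

```python
import numpy as np

def inv_sqrt(T):
    """Inverse square root of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(T)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def advice_quality(Sigma, T):
    """Entrywise l1-norm of sqrt(T)^{-1} Sigma sqrt(T)^{-1} - I (smaller = better advice)."""
    S = inv_sqrt(T)
    return np.abs(S @ Sigma @ S - np.eye(Sigma.shape[0])).sum()

rng = np.random.default_rng(0)
d = 5
A = rng.normal(size=(d, d))
Sigma = A @ A.T + np.eye(d)                 # some ground-truth covariance
print(advice_quality(Sigma, Sigma))         # perfect advice: ~0
print(advice_quality(Sigma, np.eye(d)))     # uninformative advice: much larger
```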

September 17

September 17, 2025, 2:00–3:00 PM (America/New_York)
Metric Embeddings with Outliers

We study the design of embeddings into Euclidean space with outliers. Given a metric space (X, d) and an integer k, the goal is to embed all but k points in X (called the “outliers”) into ℓ2 with the smallest possible distortion c. Finding the optimal distortion c for a given outlier set size k, or alternatively the smallest k for a given target distortion c, are both NP-hard problems. We consider bi-criteria approximations. Our main result is a polynomial time algorithm that approximates the outlier set size to within an O(log^2 k) factor and the distortion to within a constant factor.

We also show that the techniques we use in designing outlier embeddings into Euclidean space have the potential to extend to a wider variety of target embedding spaces. In particular, we consider probabilistic outlier embeddings, which involve probabilistic embeddings and only require that non-outlier pairs have low distortion in expectation over the randomness of the embedding. We show that for probabilistic outlier embeddings into hierarchically separated trees (HSTs), there is a polynomial time algorithm that takes in a finite metric space with a (k, c) probabilistic outlier embedding and produces a probabilistic outlier embedding with O(log^2 k) outliers and O(c) distortion.

TBD
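As a small companion utility (illustrative only, not the approximation algorithm from the talk), one can check the distortion of a candidate embedding once an outlier set is fixed, by taking the ratio of the worst expansion to the worst contraction over non-outlier pairs:

```python
import numpy as np
from itertools import combinations

def outlier_distortion(dist, embedding, outliers):
    """Distortion of `embedding` over non-outlier pairs of the metric `dist`.

    dist: n x n metric matrix; embedding: n x m array of points in l2;
    outliers: set of point indices excluded from the guarantee.
    """
    inliers = [i for i in range(dist.shape[0]) if i not in outliers]
    expansion, contraction = 0.0, np.inf
    for i, j in combinations(inliers, 2):
        ratio = np.linalg.norm(embedding[i] - embedding[j]) / dist[i, j]
        expansion = max(expansion, ratio)
        contraction = min(contraction, ratio)
    return expansion / contraction

# Example: 4 points, with the last point (far from the rest) declared an outlier.
dist = np.array([[0, 1, 2, 9],
                 [1, 0, 1, 8],
                 [2, 1, 0, 7],
                 [9, 8, 7, 0]], dtype=float)
embedding = np.array([[0.0], [1.0], [2.0], [100.0]])
print(outlier_distortion(dist, embedding, outliers={3}))   # -> 1.0
```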