[NLP Seminar] A few investigations into the future of LLM scaling

Speaker

Stanford University

Host

NLP Meetings Seminar Series

Scaling up language models has been a key driver of the recent, dramatic improvements in their capabilities. Despite the significant empirical successes of scaling up pretraining in the past 5 years, the future of this approach has become uncertain: large base models no longer show the same types of jumps in benchmark performance, and new forms of scaling ("test-time scaling") have been proposed to take its place. Does the data inefficiency of pretraining pose fundamental challenges to scaling? Will scaling up inference compute suffice for future capability gains? This talk will cover a few initial investigations into these questions, in the hopes of better understanding whether and how LLM scaling will continue.