How a fake-paper generator tricked scientific journals, 20 years later

Left to right: Dan Aguayo, Max Krohn, and Jeremy Stribling in 2005 (Credits: Frank Dabek).

Twenty years ago, in a pre-ChatGPT world, a fake-paper generator created by three MIT students fooled a major conference so badly that its organizers had to completely reconfigure their reviewing practices.

The prank took aim at predatory publishers who spam researchers with weekly “calls for papers” and then charge excessive “application fees” to make a quick buck. The students’ program, called “SCIgen,” produced nonsensical computer science papers complete with phony graphs, figures, and citations.

SCIgen authored an article that was initially accepted as a non-reviewed paper to the World Multiconference on Systemics, Cybernetics and Informatics (WMSCI), exposing the conference’s low standards. Two decades later, the ability to convincingly mimic human writing is commonplace in systems like ChatGPT, which allow students to use AI to augment (and even replace) academic work.

The project was both innovative and prescient for its time. Rather than learning from data, SCIgen strings together jargon-filled sentences from a hand-written context-free grammar, producing plausible-looking prose long before AI systems could train on massive quantities of text from social media and other websites. It also demonstrated how generated text can blur the line between imitation and real intelligence, well before AI-generated images became ubiquitous.
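For readers curious about the mechanics, the trick needs no training data at all: a hand-written context-free grammar is just a set of templates that expand, recursively and at random, into jargon-filled sentences. The sketch below is a minimal, hypothetical Python illustration of that idea; the grammar symbols and vocabulary are invented for this example and bear no resemblance to SCIgen’s actual (much larger) grammar.

```python
import random

# A toy context-free grammar in the spirit of SCIgen's approach.
# (Illustrative only: these symbols and this vocabulary are invented for
# this sketch; the real SCIgen grammar is far larger and hand-tuned.)
GRAMMAR = {
    "SENTENCE": [
        ["We", "VERB", "that", "NOUN_PHRASE", "is", "ADJECTIVE", "."],
        ["Clearly", ",", "NOUN_PHRASE", "requires", "NOUN_PHRASE", "."],
    ],
    "VERB": [["argue"], ["demonstrate"], ["conclude"]],
    "NOUN_PHRASE": [["the", "ADJECTIVE", "NOUN"], ["our", "NOUN"]],
    "ADJECTIVE": [["robust"], ["extensible"], ["highly-available"], ["Bayesian"]],
    "NOUN": [["methodology"], ["framework"], ["hash table"], ["compiler"]],
}

def expand(symbol: str) -> str:
    """Recursively expand a grammar symbol into a string of words."""
    if symbol not in GRAMMAR:                    # terminal word: return as-is
        return symbol
    production = random.choice(GRAMMAR[symbol])  # pick one production at random
    return " ".join(expand(s) for s in production)

if __name__ == "__main__":
    for _ in range(3):
        # Tidy up spacing around punctuation before printing.
        print(expand("SENTENCE").replace(" ,", ",").replace(" .", "."))
```

Run a few times, this prints lines like “We argue that the robust compiler is Bayesian.” Sentences of roughly that flavor, dressed up with fake figures and citations, are what slipped past WMSCI.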

SCIgen stands as a historical marker of how a largely pre-AI world could be fooled by a program’s ability to string together sentences that appear sensible at first glance. What was yesterday’s elaborate prank is now commonplace and multimodal: convincing articles, images, and videos, all produced by AI.

The technology’s imitation skills are now undeniable, but researchers at MIT continue to investigate whether machine-learning systems are anything more sophisticated at their core: whether SCIgen’s successors merely masquerade as intelligence, or represent an entirely new form of intellect in its infancy.

What appears to be a recent explosion of AI technology is in fact the culmination of decades of research, dating back to the late 1950s at MIT. SCIgen’s nonsensical outputs made it an early case of automated mischief, and in a rapidly advancing field, it is also a precursor to the machine-generated gags and content scattered across our feeds today.