
AI running out of stuff?

November 24, 2022.


Large language models such as GPT-3, which can write articles and even code, are a hot AI research topic. However, according to researchers at Epoch, an AI research and forecasting organization, trouble is looming: we could soon be scraping the bottom of the data barrel to train these models.

Language models are trained on high-quality texts from sources such as Wikipedia, press articles, scientific papers, and books. These models are fed ever-larger amounts of data to make them more powerful, but the supply of such data is finite.
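As a rough illustration of what "quality" means when curating a training corpus, here is a minimal Python sketch of a heuristic text filter. The thresholds and sample documents are invented for the example; they do not come from the article or from any particular training pipeline.

```python
# Illustrative sketch only: a toy heuristic "quality filter" of the kind
# used when curating text for language-model training. All thresholds and
# sample data below are hypothetical.

def looks_high_quality(text: str) -> bool:
    """Crude heuristics: long enough, mostly alphabetic, ends like prose."""
    words = text.split()
    if len(words) < 10:  # too short to be useful prose
        return False
    alpha_ratio = sum(c.isalpha() for c in text) / max(len(text), 1)
    if alpha_ratio < 0.7:  # likely markup, tables, or noise
        return False
    return text.rstrip().endswith((".", "!", "?"))  # sentence-like ending

corpus = [
    "Wikipedia articles, press articles, and scientific papers make good training text for language models.",
    "buy now!!! $$$ http://spam.example",
    "ok",
]
filtered = [doc for doc in corpus if looks_high_quality(doc)]
print(filtered)  # keeps only the first, prose-like document
```

Real pipelines use far more sophisticated filters (deduplication, classifiers trained on reference corpora, and so on), but the principle is the same: only a fraction of available text is worth feeding to a model, which is why high-quality data is the scarcer resource.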

Some researchers, however, believe that quantity is no substitute for quality when it comes to language models. Percy Liang, a computer science professor at Stanford University, explains that they’ve “seen how smaller models that are trained on higher-quality data can outperform larger models trained on lower-quality data.”

MIT Technology Review, Tammy Xu, “We could run out of data to train AI language programs.”
