AI gives compression algorithms a run for their money

September 28, 2023.


Chinchilla. © iStock.

In an arXiv research paper titled “Language Modeling Is Compression”, researchers used Chinchilla 70B, a large language model (LLM) by DeepMind, to losslessly compress images from the ImageNet database. They compressed the images to 43.4% of their original size, beating the PNG algorithm, which compressed the same data to 58.5%. As for audio, Chinchilla compressed samples from the LibriSpeech audio data set to just 16.4% of their raw size, substantially outdoing FLAC’s 30.3%. These results suggest that although Chinchilla 70B was trained primarily on text, it is surprisingly effective at compressing other types of data, often outperforming algorithms specifically designed for the task. This opens the door to new, original applications for large language models.
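The premise behind these results is a classic one in information theory: a good predictor of the next symbol, combined with an entropy coder such as an arithmetic coder, is a good compressor, and the better the predictions, the fewer the bits. The sketch below illustrates the idea with a toy adaptive byte-frequency model standing in for the LLM (the function name and message are illustrative, not from the paper): it computes the ideal coded size of a message as the sum of −log₂ p over each symbol’s predicted probability.

```python
import math
from collections import Counter

def ideal_code_length_bits(data: bytes) -> float:
    """Total bits an ideal arithmetic coder would emit when driven by a
    simple adaptive order-0 byte predictor. The predictor is a toy
    stand-in for an LLM: the sharper its predictions, the fewer bits
    each symbol costs, which is the core idea of the paper."""
    counts = Counter()  # byte frequencies observed so far
    total_bits = 0.0
    for n, b in enumerate(data):
        # Laplace-smoothed probability of this byte under the model
        # (256 possible byte values).
        p = (counts[b] + 1) / (n + 256)
        total_bits += -math.log2(p)  # coding cost of this symbol
        counts[b] += 1               # update the adaptive model
    return total_bits

msg = b"abracadabra abracadabra abracadabra"
bits = ideal_code_length_bits(msg)
print(f"raw: {len(msg) * 8} bits, model-coded: {bits:.1f} bits")
```

Even this crude frequency model beats the raw 8 bits per byte on repetitive input; replace it with a 70-billion-parameter predictor and the same arithmetic-coding machinery yields the compression ratios reported above.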

That being said, an LLM requires far more computing power, energy and memory than lossless compression algorithms such as FLAC or PNG, which were designed to be lightweight and very fast. Chinchilla, with its 70 billion parameters, is not about to replace our trusty algorithms in everyday applications.

Ars Technica, Benj Edwards, “AI language models can exceed PNG and FLAC in lossless compression, says study.”