100 petabytes on the asphalt superhighway: the Snowmobile solution

December 6, 2018.

In the era of Big Data, consider Snowmobile, a most seasonable solution the next time you need to transfer a couple of exabytes to the cloud.

Exabytes are still a rare order of magnitude, but with the explosion of digital data, we’re likely to come across them more and more often. An exabyte is an eye-popping one million terabytes, or one billion gigabytes.

But you don’t need exabytes of data to run into trouble transferring files to the cloud. Even companies with moderately large volumes of data are discovering that the available network capacity simply can’t move massive datasets within reasonable cost and time constraints. For example, moving the video archive of a satellite imagery company over the Internet can take months, if not years, when a 100-megabit-per-second Fast Ethernet connection can transfer at most about 1 terabyte (1,000 GB) every 24 hours.
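
The 1-terabyte-per-day figure follows from simple unit conversion. The Python sketch below assumes decimal units and a link running flat out around the clock; the `efficiency` parameter is a hypothetical knob for protocol overhead and contention, which in practice only make things slower.

```python
# Back-of-the-envelope link throughput, using decimal SI units (1 TB = 10**12 bytes).
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(link_bps: float, efficiency: float = 1.0) -> float:
    """Bytes a link can move in 24 hours.

    `efficiency` is an assumed fraction of the nominal bit rate actually
    achieved; 1.0 is the theoretical best case with no protocol overhead.
    """
    return link_bps * efficiency * SECONDS_PER_DAY / 8  # 8 bits per byte


def days_to_transfer(data_bytes: float, link_bps: float, efficiency: float = 1.0) -> float:
    """Days needed to push `data_bytes` through the link at that rate."""
    return data_bytes / bytes_per_day(link_bps, efficiency)


FAST_ETHERNET_BPS = 100e6  # 100 Mb/s

print(f"{bytes_per_day(FAST_ETHERNET_BPS) / 1e12:.2f} TB per day")            # 1.08 TB per day
print(f"{days_to_transfer(100e12, FAST_ETHERNET_BPS):.0f} days for 100 TB")  # 93 days for 100 TB
```

Real links rarely sustain their nominal rate, so these figures are a lower bound on transfer time.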

In 2016, Amazon offered a solution that looked like a good old-fashioned publicity stunt: the Snowmobile, a 45-foot, 4-tonne container packed with servers and hauled by an 18-wheeler. The behemoth shows up at your door, loads your data and hauls it away, a very concrete illustration of what the “information superhighway” means.

Snowmobile. © Amazon Web Services.

But you’d need no fewer than 10 Snowmobile trips to move one exabyte of data, because each load is a paltry 100 petabytes. Even so, a single Snowmobile could carry two copies of the Internet Archive’s 50-petabyte Petabox, a transfer that would take 28 years and 7 months over a 1,000-megabit-per-second (1 Gb/s) fibre-optic link (and about 510,000 years over a 20th-century 56 kb/s modem)!
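
The durations in this paragraph are straightforward to check. The Python sketch below assumes the 100 petabytes are binary units (pebibytes, 2^50 bytes), the reading that reproduces the 28-years-and-7-months figure, and a link running at its full nominal rate with no overhead.

```python
# Time to move a Snowmobile-sized load over a network link.
# Assumption: "100 petabytes" is read in binary units (pebibytes, 2**50 bytes),
# and the link runs at its full nominal rate with no overhead.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year
PIB = 2**50  # bytes per pebibyte


def transfer_years(data_bytes: float, link_bps: float) -> float:
    """Years needed to push `data_bytes` through a link of `link_bps` bits per second."""
    return data_bytes * 8 / link_bps / SECONDS_PER_YEAR


load = 100 * PIB  # two copies of a 50-petabyte archive

for label, bps in [("1 Gb/s fibre", 1e9), ("56 kb/s modem", 56e3)]:
    years = transfer_years(load, bps)
    print(f"{label}: {years:,.1f} years")
```

On these assumptions, the script gives about 28.5 years at 1 Gb/s (roughly 28 years and 7 months) and roughly 510,000 years at 56 kb/s, since the modem is about 17,857 times slower.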

The Snowmobile is a souped-up version of the Snowball, AWS’s rugged grey storage appliance, which weighs over 20 kilos and holds 50 terabytes of data. One Snowmobile has the storage capacity of 2,000 of these 50-TB Snowballs! If, like me, you thought the Snowmobile was nothing more than an Amazon PR scheme to raise AWS’s profile in the worldwide media, you’d be mistaken. Well, it is that, but much more besides: it’s an actual service used by large corporations to meet a real, if fledgling, need.
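
The capacity comparisons between Snowball and Snowmobile reduce to a ceiling division. A minimal Python sketch, using the nominal capacities quoted in this article and decimal SI units; the helper name `devices_needed` is hypothetical.

```python
# How many device loads (or truck trips) a dataset needs, using the nominal
# capacities quoted in the article and decimal SI units.
TB = 10**12
PB = 10**15
EB = 10**18

SNOWBALL_BYTES = 50 * TB
SNOWMOBILE_BYTES = 100 * PB


def devices_needed(data_bytes: int, device_bytes: int) -> int:
    """Ceiling division: a partly filled device still counts as a whole one."""
    return -(-data_bytes // device_bytes)


print(devices_needed(EB, SNOWMOBILE_BYTES))              # 10 Snowmobile trips per exabyte
print(devices_needed(SNOWMOBILE_BYTES, SNOWBALL_BYTES))  # 2000 Snowballs per Snowmobile
print(devices_needed(120 * TB, SNOWBALL_BYTES))          # 3 Snowballs for 120 TB
```

Since both capacities use the same unit, the ratios hold whether you count in decimal or binary units; usable capacity after formatting may be somewhat lower than the nominal figures.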

Snowball. © Amazon Web Services.

Take for example DigitalGlobe, an American company specializing in satellite imagery. Over its 17 years of operation, it has gathered over 100 petabytes of images of our planet’s surface, and its constellation of commercial satellites adds another 10 petabytes every year. Until recently, DigitalGlobe archived its images on tape and delivered client orders via FTP or on hard drives sent by courier, a cumbersome process that required several hours of handling.

The company decided to upload its enormous library to Amazon’s cloud to provide faster, more competitive service to its customers. So last year, a Snowmobile parked at DigitalGlobe’s headquarters in Colorado and transferred 54 million high-resolution images in just a few weeks, making the entire image library available online. To keep the library up to date, the 80 to 100 terabytes of new data produced every day are uploaded to Amazon S3 over the network.
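
Keeping up with that daily volume online implies a sizeable sustained link. The Python sketch below estimates the average bandwidth required, assuming decimal terabytes and a link kept busy around the clock at its full nominal rate; actual provisioning would need headroom above these averages.

```python
# Average link speed needed to upload a given daily volume within 24 hours.
# Assumptions: decimal units (1 TB = 10**12 bytes) and a link kept busy
# around the clock at its full nominal rate.
SECONDS_PER_DAY = 24 * 60 * 60
TB = 10**12


def required_gbps(bytes_per_day: float) -> float:
    """Sustained throughput, in gigabits per second, to move `bytes_per_day` daily."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e9


for daily_tb in (80, 100):
    print(f"{daily_tb} TB/day: about {required_gbps(daily_tb * TB):.1f} Gb/s sustained")
# 80 TB/day: about 7.4 Gb/s sustained
# 100 TB/day: about 9.3 Gb/s sustained
```

By comparison, a 100 Mb/s Fast Ethernet connection moves only about 1 TB a day, roughly 1% of the upper figure.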

Since then, DigitalGlobe has turned to Amazon SageMaker, a machine learning platform, to identify the images least likely to be requested, which are then automatically moved from S3 to Amazon Glacier. Glacier is a storage service that costs one-fifth as much as S3; the flip side is that retrieving files from Glacier takes 3 to 5 hours. Even so, this approach has allowed DigitalGlobe to halve its cloud-related costs without compromising quality of service, because the images most likely to be ordered remain immediately available on S3. Using artificial intelligence to spread data across storage tiers with different performance and cost characteristics can be a winning strategy for many companies, even with smaller data volumes than DigitalGlobe’s. When storing one petabyte of data on S3 costs 293,568 USD per year, halving the bill makes a real difference.
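
The halving claim can be sanity-checked with a simple blended-cost model. The Python sketch below is not DigitalGlobe’s pipeline or a SageMaker API: `Image`, `plan_tiers`, the per-image access scores and the threshold are hypothetical stand-ins for whatever model actually ranks the images, and Glacier is priced at one-fifth of S3 as stated above. Under those assumptions, halving the bill requires moving roughly 62.5% of the data to Glacier.

```python
from dataclasses import dataclass

# Prices: the S3 figure is the one quoted in the article; Glacier is taken as
# one-fifth of it, as stated in the text. Both are simplifications.
S3_USD_PER_PB_YEAR = 293_568
GLACIER_USD_PER_PB_YEAR = S3_USD_PER_PB_YEAR / 5


@dataclass
class Image:
    size_pb: float             # storage footprint, in petabytes
    access_probability: float  # hypothetical model score in [0, 1]


def plan_tiers(images, threshold):
    """Keep likely-to-be-ordered images on S3; send the rest to Glacier."""
    hot = [img for img in images if img.access_probability >= threshold]
    cold = [img for img in images if img.access_probability < threshold]
    return hot, cold


def yearly_cost(hot, cold):
    """Blended yearly storage bill in USD for a given hot/cold split."""
    hot_pb = sum(img.size_pb for img in hot)
    cold_pb = sum(img.size_pb for img in cold)
    return hot_pb * S3_USD_PER_PB_YEAR + cold_pb * GLACIER_USD_PER_PB_YEAR


def cold_fraction_for_savings(savings, price_ratio=1 / 5):
    """Fraction f of data to move to the cheap tier to cut the bill by `savings`.

    Relative to keeping everything on S3, the blended cost is
    (1 - f) + f * price_ratio, so f = savings / (1 - price_ratio).
    """
    return savings / (1 - price_ratio)


# Illustrative catalogue: four 0.25 PB batches with made-up scores.
images = [Image(0.25, 0.90), Image(0.25, 0.40), Image(0.25, 0.05), Image(0.25, 0.01)]
hot, cold = plan_tiers(images, threshold=0.10)  # hypothetical threshold
all_on_s3 = sum(img.size_pb for img in images) * S3_USD_PER_PB_YEAR
print(round(yearly_cost(hot, cold) / all_on_s3, 3))  # 0.6: moving half the data saves 40%
print(round(cold_fraction_for_savings(0.5), 3))      # 0.625: share to move to halve the bill
```

This model ignores Glacier retrieval fees and the 3-to-5-hour delay; a real tiering policy would weigh those against the storage savings when choosing the threshold.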

The odd thing about AWS’s offering is the yawning gap between Snowball’s 50 TB and Snowmobile’s 100 PB, a factor of 2,000. Let’s hope that sooner or later, an intermediate solution will be offered to serve more companies. Who knows, maybe in the not-so-distant future, driverless cars will be plying the roads between clients and data centres. The asphalt superhighway still seems to have a role to play in the digital economy!

On the road to the cloud! © iStock.