The explosion in online video streaming is filling up the world’s hard drives, saturating its bandwidth, and possibly heading us towards a crisis of storage and pollution. Could the answer be to more effectively shrink the information itself? And what does machine learning modeling have to do with that?
According to a recent study by IDC, the sheer tonnage of global data stored online is set to increase tenfold from current levels to 163 zettabytes, enough data to fill 1630 billion trucks if printed. IBM reports that we generate 2.5 quintillion bytes of data every day, with 90% of the data stored on the internet created in the past two years.
This growing tide of data usage has long-term consequences for the sustainability of global energy networks and for the subsequent environmental impact.
Growing Data Center Pollution in the Age of Consumption
Collectively, data centers are now estimated to have a carbon footprint equivalent to the entire global aviation industry, with a new study characterizing their growing environmental impact as a major concern. Data centers are predicted to consume a fifth of the world’s electricity by 2025, making a significant contribution to global pollution.
Despite notable collective and individual efforts in the green computing space from major players in the network and data center market (including Apple, Google, Facebook, and Amazon), continuing to meet growing demand with growing capacity creates a critical framework: a system operating at such a large scale that small changes inevitably have macroeconomic effects.
Data center optimization and efficient network design have become commercially and politically motivated fields of research—but there simply may not be enough margin for adjustment in data center infrastructure to really change the projected outcomes, given the sheer demand and the rate at which it is growing.
Network Neutrality as a Chilling Effect
The U.S.’s recent repeal of network neutrality laws could have a chilling effect on the unrestrained global data explosion by introducing granular charging and special privilege to those content streams that we now exploit so indiscriminately and at such high volume. However, a business shift of this nature may take time to develop in the United States and would have to pass through a firewall of critical, logistical, and political issues—a process that would need to be repeated if other nations should seek to emulate the new U.S. model.
So we are still left with three options: slowing down network demand, building so many new data centers that we risk saturating available capacity, or shrinking the size of the data itself.
Nowhere is this currently more urgent than in regard to the last ten years’ explosion in online video streaming services. But when online and consumer industries are kept viable through growing consumer expectation of high-bandwidth, high-quality innovations such as 4K, 8K, 360-degree video and bitrate-guzzling virtual and augmented reality systems, where can you make any data saving?
This is where video codec and video compression technologies come in.
Codecs and the Big Picture
High-quality video takes up a lot of space. One hour of uncompressed 4K footage runs to 741 GB, whilst the anticipated equivalent size for 8K video is 7.29 TB. Without some form of compression, video would be uneconomical to store and practically impossible to play back on standard consumer devices.
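The arithmetic behind figures like these is simple to sketch. The example below is only an illustration: the resolution, frame rate, and bits-per-pixel values are assumptions rather than the parameters behind the numbers quoted above, so it will not reproduce them. The point is how quickly raw video grows when every pixel of every frame is stored.

```python
def raw_video_bytes(width, height, fps, seconds, bits_per_pixel):
    """Size in bytes of uncompressed video: every pixel of every frame is stored."""
    bits = width * height * bits_per_pixel * fps * seconds
    return bits // 8


HOUR = 3600  # seconds

# Assumed parameters (not taken from the article): 24 frames per second and
# 12 bits per pixel, which corresponds to 8-bit video with 4:2:0 chroma subsampling.
uhd_4k = raw_video_bytes(3840, 2160, 24, HOUR, 12)
uhd_8k = raw_video_bytes(7680, 4320, 24, HOUR, 12)

print(f"4K: {uhd_4k / 1e12:.2f} TB per hour")
print(f"8K: {uhd_8k / 1e12:.2f} TB per hour")
```

Under these assumptions an hour of 4K comes to roughly 1.07 TB, and 8K, with four times the pixels, to roughly 4.30 TB. Higher frame rates or bit depths scale the result proportionally.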
Video compression codecs (the word is itself compressed from coder/decoder) are methods of transcoding these large video files into more manageable forms by discarding information that the viewer, hopefully, won’t notice.
Traditionally, codecs operate by calculating the difference between set video keyframes instead of presenting every single frame of the video in succession. If the video shows a ten-second shot of a person speaking in front of a grey wall, data will need to be spent on making sure that the moving face is accurately represented, since we are very sensitive to facial information. But less data will be needed to show the unchanging grey wall, since it has more limited detail (and, from the point of view of perceptual video optimization, far less narrative focus).
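The keyframe-and-difference idea can be shown with a toy sketch. This is not how any production codec is implemented: real codecs add motion compensation, transforms, and lossy quantization. The function names, the keyframe interval, and the one-dimensional "frames" are all illustrative assumptions.

```python
def encode(frames, keyframe_interval=5):
    """Store a full keyframe every `keyframe_interval` frames, otherwise only changed pixels."""
    encoded = []
    previous = None
    for index, frame in enumerate(frames):
        if index % keyframe_interval == 0:
            encoded.append(("key", list(frame)))
        else:
            # Record only the positions whose value differs from the previous frame.
            changes = {i: v for i, (v, old) in enumerate(zip(frame, previous)) if v != old}
            encoded.append(("delta", changes))
        previous = frame
    return encoded


def decode(encoded):
    """Rebuild every frame by applying each delta to the frame before it."""
    frames = []
    current = None
    for kind, payload in encoded:
        if kind == "key":
            current = list(payload)
        else:
            current = list(current)
            for i, v in payload.items():
                current[i] = v
        frames.append(current)
    return frames


# A toy "shot": a grey wall (value 128) with one bright pixel moving across it.
WIDTH = 16
frames = []
for t in range(10):
    frame = [128] * WIDTH
    frame[t] = 255
    frames.append(frame)

encoded = encode(frames)
raw_values = sum(len(f) for f in frames)
stored_values = sum(len(payload) for _, payload in encoded)
print(f"raw: {raw_values} values, encoded: {stored_values} values")
```

The unchanging wall costs nothing between keyframes; only the two pixels that change in each frame are stored. A real encoder would also have to pay for the position of each change, which this count ignores.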
Conventional encoding trades off processing resources against the quality and file size of the encoded video—a factor that assumes greater significance at the global scale of streaming networks such as Netflix, Amazon, and YouTube.
Constant Bit Rate (CBR) encoding holds the data rate at a preset level and needs minimal initial computing resources. However, it often results either in larger file sizes, which throw data redundantly at scenes that may not need it, or in more moderately sized encodings, which fail to depict complex events and objects (such as high-speed movement or fog) at an acceptable quality.
Variable Bit Rate (VBR) encoding solves this problem by examining the full-resolution source material at least once before the encoding pass and assigning more data where the image really needs it. The disadvantage of VBR is that it requires at least double the computing resources to encode, and can cause network problems when streaming because of the unexpected variations in data throughput.
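The difference between the two strategies can be sketched as a budgeting problem. The complexity scores below are hypothetical stand-ins for whatever a first analysis pass would measure, and the proportional split is a simplification chosen for illustration, not the rate-control algorithm of any particular encoder.

```python
def cbr_allocation(complexities, total_bits):
    """CBR: every segment gets the same share, regardless of content."""
    share = total_bits / len(complexities)
    return [share for _ in complexities]


def vbr_allocation(complexities, total_bits):
    """Two-pass VBR: split the same budget in proportion to measured complexity."""
    total_complexity = sum(complexities)
    return [total_bits * c / total_complexity for c in complexities]


# Hypothetical first-pass scores for five one-second segments:
# four near-static segments and one burst of fast motion.
complexities = [1, 1, 8, 1, 1]
budget = 12_000_000  # total bits available for the five seconds

cbr = cbr_allocation(complexities, budget)
vbr = vbr_allocation(complexities, budget)

for second, (c, v) in enumerate(zip(cbr, vbr)):
    print(f"second {second}: CBR {c / 1e6:.1f} Mbit, VBR {v / 1e6:.1f} Mbit")
```

Both allocations spend the same total, but the VBR version moves bits from the static segments to the busy one. It also shows the streaming drawback mentioned above: the busy second demands more than three times the average throughput.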
The principles behind encoding have changed relatively little since the first digital video codec was proposed for teleconferencing in 1984. And the way that the global multinational streaming companies are addressing the storage and bandwidth crisis is surprisingly conventional, given that the roster includes some of the biggest AI and machine learning players in the world.
One Codec to Rule Them All
Now the world’s largest media company, Netflix has bought a vanguard place in video streaming research, operating some of the most advanced codec and network development projects in the field, as well as working in cooperation with other tech titans in the Netflix-founded Alliance for Open Media.
AOMedia’s primary focus has been the collaborative development of the open video codec AOMedia Video 1 (AV1), which Netflix has committed to rolling out as its new standard video delivery codec by the end of 2018. AV1 is also intended for widespread adoption by the other major streaming companies contributing to the consortium, including Facebook, Google, IBM, Microsoft, Apple, and Amazon.
Tom Watson, Director of Streaming Standards at Netflix, has promised proof this year of the “great benefits” that AV1 will bring to the network, stating “You will see higher quality at a lower bitrate. Our primary interest is compression efficiency.”
It’s not the first time that Netflix has reviewed its codec needs in order to save resources. In 2016, the company re-encoded its entire catalog for more efficient storage and streaming, achieving a 20% saving in the overall catalog footprint. For a company whose online presence represents over 30% of all network traffic, that’s a significant resource reduction—at least until 8K, further diffusion of 4K or continuing uptake negate it again.
Machine Learning for Better Streaming Distribution
AV1 is reported to offer compression savings somewhere between 20% and 30% and promises to break the internet’s dependence on the proprietary HEVC (H.265) codec, although AV1 itself contains so much legacy core code that it’s considered to be susceptible to later patent disputes.
AOMedia’s wealth of artificial intelligence talent seems to have played a limited role in the development of AV1. In the case of Netflix, machine learning is chiefly used to optimize network distribution by understanding customer and local network node behavior. A great number of connections are dropped or abandoned because streaming providers can’t anticipate what the demands from any one customer will be in the next 15 minutes, which can have a cumulatively negative impact on a network the size and scope of Netflix.
This does illustrate the extent to which we, as demanding consumers, are part of the problem too. If we were required to book the viewing of a TV episode, online clip or streamed movie as little as five minutes in advance, we could solve a great deal of the major streaming companies’ logistical problems, and ease the network load. With knowledge of the target device environment (e.g. a smart TV or a mobile phone) and enough time to anticipate and enact suitable Content Delivery Network provisioning, many of the current delivery crises could all but disappear.
However, since this is an unlikely move in the current consumer environment, it is possible that artificial intelligence could step in to help in a more fundamental way than as an arbiter on busy networks.
Compressing Videos with Machine Learning
AOMedia contributor Google is a significant operator in compression and image technologies. Researchers from the company have recently proposed a novel approach to image compression by favoring triangles over traditional blocks in the standard encoding scheme for bitmap data. Though the initial study was limited to the problem of achieving better optimization for thumbnails, the group was able to achieve significant savings over current compression methods, and is extending the research into larger images.
Two years prior, the company released a more ambitious block-based image compression model driven by recurrent neural networks. Google software engineers Nick Johnston and David Minnen stated at the time: “While today’s commonly used codecs perform well, our work shows that using neural networks to compress images results in a compression scheme with higher quality and smaller file sizes.”
Applying neural network capacity to video encoding is a harder task. The motion compensation capabilities of traditional encoding schemes are difficult to encode into neural networks. However, Zhibo Chen, a researcher at the University of Science and Technology of China, achieved promising results in this area this year via a convolutional neural network called VoxelCNN.
The Sharing Solution
Meanwhile, researchers at the Massachusetts Institute of Technology have proposed a full-fledged encoding system built around a Generative Adversarial Network (GAN), a sub-section of artificial intelligence research which pits two neural networks competitively against each other, and which has gained special traction in image-related fields.
The researchers’ neural codec architecture (NCode) achieves what they describe as “orders-of-magnitude improvement in image and video compression.” It works in part by offloading some of the decoding via a local, all-purpose database intended to reside on the end-user device, effectively shunting part of the work to the client-side device.
Sharing a technological workload with the end user is a principle that blockchain development has brought into recent prominence; but in terms of improving the carbon footprint of video streaming, it’s often a false economy. Peer-to-peer torrent streaming only achieves energy efficiency with high seed rates, whilst Bitcoin-derived blockchain schemes (which now include video streaming startups) have been practically designed to burn extra energy, and are doing so at a frightening rate.
To a certain extent, even traditional codec development can end up passing the problem of energy consumption downstream. In the case of AV1, Google is the first of the major AOMedia contributors to commit to AV1 hardware acceleration at the user end. Initially this will take place in the browser-based web environment where AV1 will first be deployed, but eventually, the codec will need to be incorporated into consumer devices such as Smart TVs, either through software updates or appliance upgrades. Inevitably it will have its own energy usage signature to contribute, even discounting the landfill issue of last-gen hardware that can’t run AV1 content well, or at all.
The Legacy Problem
Though AI-driven encoding techniques are promising, the problem they have to solve is too urgent to wait for them to mature. Netflix’s Director of Video Algorithms Anne Aaron acknowledges the way that incumbent technologies tend to win out against superior new solutions simply because they are familiar and institutional:
“Unfortunately, new techniques are evaluated against the state-of-the-art codec, for which the coding tools have been refined from decades of investment. It is then easy to drop the new technology as ‘not at-par.’ Are we missing on better, more effective techniques by not allowing new tools to mature? How many redundant bits can we squeeze out if we simply stay on the paved path and iterate on the same set of encoding tools?”
There is another possibility: no matter how many resources you throw at a popular challenge, sometimes you just come to the end of the rules that define the problem. Barring unforeseen AI-driven solutions, it may be that further significant compression of video content is simply not possible without accepting new compromises in video quality, or in some way changing our behavior and attitudes around the ways that we create and consume video.