Artificial Intelligence and Machine Learning are the great hope for video compression if data-hungry applications such as 8K freedom-to-roam Virtual Reality are to come to pass. However, such techniques burn through processing power and energy, so one area of investigation is how to make those compute costs more sustainable.
Headway is being made, according to Thierry Fautier, VP of video strategy at Harmonic. A first phase will focus on AI/ML techniques using existing codecs such as AVC, HEVC, AV1, and AVS3. A second phase will focus on newer codecs like VVC and AV2.
Explaining progress to the estimable Chris Chinnock of the 8K Association, Fautier says Harmonic has already deployed the first version of such AI-assisted encoding schemes that it calls Content Aware Encoding (CAE).
The idea is to use AI and the mechanics of the human visual system to “continuously assess video quality in real-time and focus bits where and when they matter most for the viewer.” Exactly how the algorithm works remains confidential, but Fautier says their operator customers see up to a 40% bit rate reduction for comparable quality when implementing CAE.
“There are now over 100 CAE deployments worldwide using AVC and HEVC mostly for OTT services,” noted Fautier, “and we have shown it can reduce bit rates for 8K using HEVC during the French Open trial we did in 2019 with France Televisions.”
Other AI techniques using existing codecs can be put in two categories: implementations that require a big increase in CPU usage, and techniques like Convolutional Neural Networks (CNN) that are being studied in groups like MPEG.
READ MORE: AI for Encoding Coming in Different Phases (8K Association)
According to Chinnock, the focus with CNN solutions is to shift more of the compute to the client side in order to save bandwidth. Researchers are therefore trying to figure out how to balance the load between AI-based algorithms that run on a neural network and the GPU/CPU processing needed for the raw encoding.
“It is important to understand that AI techniques are based on a learning process (supervised or not) where a considerable CPU budget is used,” Chinnock reports. “One must also consider the CPU power used at run time to try to limit its impact when using an AI-based technique. Netflix and some others are using AI to make exhaustive encodes of all the parameter combinations and deduce the best set of resolution-bit rate combinations. This is very accurate but is also very CPU intensive and therefore not applicable to live applications. It is also not very green in terms of carbon footprint or in terms of dollars spent.”
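To make the exhaustive approach Chinnock describes more concrete, here is a minimal Python sketch of picking the best resolution for each bit rate from a set of trial encodes. It is an illustration only, not Netflix's or Harmonic's method: the `Trial` record, the `best_ladder` function and all of the quality scores are hypothetical, and in practice every trial would be a full encode plus a quality measurement, which is where the CPU cost comes from.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    height: int      # encoded resolution in lines (hypothetical field)
    kbps: int        # bit rate of the trial encode
    quality: float   # score from some quality metric, higher is better


def best_ladder(trials):
    """Keep, for each bit rate, the resolution with the highest quality."""
    best = {}
    for t in trials:
        if t.kbps not in best or t.quality > best[t.kbps].quality:
            best[t.kbps] = t
    # Drop rungs that spend more bits without improving quality.
    ladder = []
    for kbps in sorted(best):
        if not ladder or best[kbps].quality > ladder[-1].quality:
            ladder.append(best[kbps])
    return ladder


# Made-up scores, purely for illustration; not measured results.
TRIALS = [
    Trial(1080, 2000, 70.0), Trial(2160, 2000, 62.0),
    Trial(1080, 8000, 88.0), Trial(2160, 8000, 91.0),
    Trial(2160, 16000, 95.0), Trial(4320, 16000, 93.0),
    Trial(2160, 40000, 96.0), Trial(4320, 40000, 98.0),
]

ladder = best_ladder(TRIALS)
for rung in ladder:
    print(f"{rung.kbps} kbps -> {rung.height}p (quality {rung.quality})")
```

Even this toy grid needs eight encodes to produce a four-rung ladder, which hints at why the full search is considered impractical for live content.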
AI Encoding on Existing Codecs
As for directions in AI-assisted encoding being deployed on existing codecs, Fautier says there are three main areas of development: dynamic resolution encoding; dynamic frame rate encoding; and layering.
Dynamic Resolution Encoding (DRE) is an extension to the encoding ladders that OTT content providers use today. With Dynamic Frame Rate Encoding, the idea is to encode only at the frame rate the content needs. That is, talking heads can likely be encoded at 30 fps or lower without loss of detail, whereas live sports will probably need to be encoded at the frame rate at which they are captured. The objective is to reduce the compute load for the encoding process by up to 30%, depending on the content.
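The frame rate decision can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any shipping encoder: motion is estimated as the mean absolute difference between consecutive frames, frames are flat lists of luma values, and the `low_motion_threshold` value and the "halve the rate" rule are both hypothetical.

```python
def mean_abs_diff(a, b):
    """Average absolute luma difference between two equal-size frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def choose_frame_rate(frames, capture_fps, low_motion_threshold=2.0):
    """Keep the capture rate for high-motion content, halve it otherwise.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    diffs = [mean_abs_diff(p, c) for p, c in zip(frames, frames[1:])]
    motion = sum(diffs) / len(diffs) if diffs else 0.0
    return capture_fps if motion >= low_motion_threshold else capture_fps // 2


# A static "talking head" stand-in: every frame is identical.
static_clip = [[100] * 16 for _ in range(4)]
# A high-motion stand-in: every frame differs from the last by 10 levels.
moving_clip = [[i * 10] * 16 for i in range(4)]

print(choose_frame_rate(static_clip, 60))  # low motion, so half rate
print(choose_frame_rate(moving_clip, 60))  # high motion, so capture rate
```

A real system would likely use a perceptual model rather than raw pixel differences, and would switch rates per scene rather than per clip.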
Scalable HEVC, LCEVC, and pre/post-processing pairing are all examples of layering. With this approach, you encode a base layer at 4K resolution along with an enhancement layer that conveys the extra 8K details. These two layers may or may not be transmitted over the same transport system. For example, a 4K signal could be broadcast with an enhancement layer sent over an IP connection. If the receiving TV is 4K, it ignores the enhancement data. But if an 8K TV receives both signals, it can use the enhancement data to decode and reconstruct an 8K signal.
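The base-plus-enhancement idea can be illustrated with a toy Python sketch. It is not an implementation of Scalable HEVC or LCEVC: it works on a one-dimensional list of samples, the function names are hypothetical, and it skips the quantization and entropy coding that real codecs apply to both layers. It only shows how a residual layer lets one receiver stop at the lower resolution while another rebuilds the full signal.

```python
def downsample(signal):
    """Halve resolution by averaging neighbouring sample pairs (even length)."""
    return [(signal[i] + signal[i + 1]) // 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double resolution by repeating each sample."""
    return [s for s in signal for _ in range(2)]


def encode_layers(full_res):
    """Split a full-resolution signal into a base layer and a residual layer."""
    base = downsample(full_res)
    predicted = upsample(base)
    enhancement = [o - p for o, p in zip(full_res, predicted)]
    return base, enhancement


def decode(base, enhancement=None):
    """Lower-resolution receivers ignore the enhancement layer."""
    if enhancement is None:
        return base
    return [p + e for p, e in zip(upsample(base), enhancement)]


full = [10, 12, 50, 54, 200, 190, 0, 7]   # stand-in for the 8K signal
base, enhancement = encode_layers(full)

print(decode(base))                # what the lower-resolution TV shows
print(decode(base, enhancement))   # what the higher-resolution TV rebuilds
```

Because the two layers are separate lists, nothing in the sketch requires them to travel over the same path, which mirrors the broadcast-plus-IP example above.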
Chinnock says that this layering approach can be done today using Scalable HEVC, deployed in the US in ATSC 3.0 with a base layer in HD for mobile and an extension layer for 4K TVs. Scalable VVC and VVC-based LCEVC have been proposed to the TV 3.0 consortium. Also under investigation is the use of LCEVC to create a base layer of legacy HD AVC-encoded content with a UHD extension layer.
One additional challenge with the use of neural networks is in establishing the standards for the interchange of encoding/processing data. MPEG is currently looking into this for its new version of video standards.