Test sequence set

The video sequences listed below are used for educational purposes only. All samples are taken from openly available sources and open video sets, with credit to their creators.

The following sequence set is defined to perform testing:

| Scene type | Source |
|---|---|
| Action | Tears of Steel |
| Animation | Netflix Sol Levante |
| Film | Harmonic Venice Carnival |
| Film | Netflix Food Market |
| Film | Netflix Meridian |
| Film | Netflix Wind and Nature |
| Nature | Harmonic Birds of Prey |
| Nature | Harmonic Monkey Pool |
| Nature | Harmonic Monkey Fur Closeup |
| Nature | Harmonic Waterfall |
| Static scene | ProArtInc Ruby Beach Campfire |
| Teleconference | MixKit Coworkers |

Video samples:

Tears of Steel, action scene
Netflix Sol Levante, animation
Harmonic Venice carnival
Netflix Food market
Netflix Meridian
Netflix Wind and Nature
Harmonic Birds of Prey
Harmonic Snow monkeys, monkey pool scene
Harmonic Snow monkeys, monkey fur closeup scene
Harmonic Snow monkeys, waterfall scene
ProArtInc Campfire on Ruby Beach, static scene
MixKit Coworkers, teleconference

All input video sequences are to be encoded at the respective bitrates for their frame size:

| Frame size | Bitrates (K = kbit/s) |
|---|---|
| 256×144 | 27K, 62K, 97K, 132K, 167K, 203K, 238K, 273K, 308K, 343K |
| 412×232 | 49K, 111K, 174K, 236K, 299K, 361K, 424K, 486K, 549K, 611K |
| 640×360 | 95K, 216K, 338K, 459K, 581K, 702K, 824K, 945K, 1067K, 1188K |
| 852×480 | 148K, 337K, 526K, 714K, 903K, 1092K, 1281K, 1469K, 1658K, 1847K |
| 1280×720 | 260K, 591K, 921K, 1252K, 1583K, 1913K, 2244K, 2575K, 2905K, 3236K |
| 1920×1080 | 461K, 1046K, 1632K, 2217K, 2802K, 3388K, 3973K, 4558K, 5144K, 5729K |
| 2560×1440 | 671K, 1523K, 2374K, 3226K, 4078K, 4929K, 5781K, 6633K, 7484K, 8336K |
| 3840×2160 | 1000K, 2333K, 3667K, 5000K, 6333K, 7667K, 9000K, 10333K, 11667K, 13000K |
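Each row of the bitrate table appears to consist of ten evenly spaced values between the lowest and highest bitrate for that frame size (for example, 27K to 343K in steps of about 35.1K). This is an observation from the numbers, not a rule stated by the methodology. The sketch below regenerates a row under that assumption; the helper name `bitrate_ladder` is hypothetical.

```python
def bitrate_ladder(low_kbps: int, high_kbps: int, steps: int = 10) -> list[str]:
    """Return `steps` evenly spaced bitrates from low to high, formatted like "27K".

    Assumption: the table rows are linear ladders between their endpoints.
    """
    span = high_kbps - low_kbps
    ladder = []
    for i in range(steps):
        # Linear interpolation between the endpoints, rounded to whole kbit/s.
        value = low_kbps + span * i / (steps - 1)
        ladder.append(f"{round(value)}K")
    return ladder


if __name__ == "__main__":
    # Endpoints taken from the 256x144 and 3840x2160 rows of the table.
    print(bitrate_ladder(27, 343))
    print(bitrate_ladder(1000, 13000))
```

Feeding the endpoints of any row back in reproduces that row, which is a quick way to extend the ladder to a new frame size once its endpoints are chosen.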

Quality metrics

PSNR: https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio

SSIM: https://en.wikipedia.org/wiki/Structural_similarity_index_measure

VMAF: https://github.com/Netflix/vmaf
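For reference, PSNR and SSIM can be computed on a single 8-bit plane with a few lines of NumPy. This is a minimal sketch with hypothetical helper names (`psnr`, `ssim`), not the tooling used for these tests. The SSIM here uses a uniform 7x7 window with the usual constants K1 = 0.01 and K2 = 0.03, whereas the reference implementation by Wang et al. uses an 11x11 Gaussian window, so values will differ slightly from FFmpeg or libvmaf output. VMAF is a trained model and is best computed with the Netflix library itself.

```python
import numpy as np


def psnr(ref: np.ndarray, dist: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two planes of equal shape."""
    diff = ref.astype(np.float64) - dist.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical planes
    return float(10.0 * np.log10(max_value ** 2 / mse))


def _box_mean(a: np.ndarray, win: int) -> np.ndarray:
    """Mean over every win x win window (valid positions only), via an integral image."""
    s = np.pad(a, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    total = s[win:, win:] - s[:-win, win:] - s[win:, :-win] + s[:-win, :-win]
    return total / (win * win)


def ssim(ref: np.ndarray, dist: np.ndarray, max_value: float = 255.0, win: int = 7) -> float:
    """Mean SSIM over all win x win windows (uniform window, population statistics)."""
    x = ref.astype(np.float64)
    y = dist.astype(np.float64)
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    mu_x, mu_y = _box_mean(x, win), _box_mean(y, win)
    var_x = _box_mean(x * x, win) - mu_x ** 2
    var_y = _box_mean(y * y, win) - mu_y ** 2
    cov_xy = _box_mean(x * y, win) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(np.mean(num / den))
```

The integral-image box filter keeps memory use proportional to the frame size, which matters at 3840x2160.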

Bitrate range codec comparison

Codec comparison for the bitrate range uses the method of Gisle Bjontegaard: https://www.itu.int/wftp3/av-arch/video-site/0104_Aus/VCEG-M33.doc.

The following channels are used for each metric:

| Metric | Channel |
|---|---|
| VMAF | All |
| PSNR | All |
| SSIM | Y |
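The text does not say how the channels are combined for the "All" entries. As a hedged illustration only, the sketch below assumes raw 8-bit I420 (YUV 4:2:0 planar) frames, splits each frame into its Y, U and V planes, and reports PSNR per plane plus one pooled value; SSIM, per the table, would be computed on the Y plane alone. The function names are hypothetical, and pooling the squared error by sample count is one convention among several.

```python
import numpy as np


def split_i420(frame: bytes, width: int, height: int):
    """Split one 8-bit I420 frame into Y, U and V arrays.

    Assumes even width and height, which holds for every frame size listed above.
    """
    y_size = width * height
    c_w, c_h = width // 2, height // 2
    c_size = c_w * c_h
    buf = np.frombuffer(frame, dtype=np.uint8)
    if buf.size != y_size + 2 * c_size:
        raise ValueError("buffer size does not match an I420 frame of this size")
    y = buf[:y_size].reshape(height, width)
    u = buf[y_size:y_size + c_size].reshape(c_h, c_w)
    v = buf[y_size + c_size:].reshape(c_h, c_w)
    return y, u, v


def _to_psnr(sse: float, count: int, max_value: float) -> float:
    mse = sse / count
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_value ** 2 / mse))


def psnr_by_channel(ref_frame: bytes, dist_frame: bytes, width: int, height: int,
                    max_value: float = 255.0) -> dict:
    """PSNR per plane plus an "All" value pooled over every Y, U and V sample."""
    ref = split_i420(ref_frame, width, height)
    dist = split_i420(dist_frame, width, height)
    # Sum of squared errors and sample count for each plane.
    sse = [float(np.sum((r.astype(np.float64) - d.astype(np.float64)) ** 2))
           for r, d in zip(ref, dist)]
    counts = [r.size for r in ref]
    result = {name: _to_psnr(s, n, max_value) for name, s, n in zip("YUV", sse, counts)}
    # Assumption: "All" pools squared error over all samples; other tools weight planes differently.
    result["All"] = _to_psnr(sum(sse), sum(counts), max_value)
    return result
```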

What is BDBR?

If you have ever engaged in video coding quality analysis and compared different codecs, you have likely used rate-distortion or RD curves.

[Figure: RD curves, bitrate on the X-axis, quality metric on the Y-axis]

The X-axis represents bitrate and the Y-axis a quality metric, so a higher curve means the codec delivers higher quality at the same bitrate. Visually, the curves are quite close to each other, and one might wrongly conclude that the quality difference is insignificant. However, BDBR shows that the codec with the green curve needs, on average, 45% more bitrate to reach the quality of the other codec.

To answer the question "how much more or less bitrate is needed to achieve the quality of the compared codec?", one draws a horizontal line at a given quality level and reads off the bitrates at which it intersects the curves. For this purpose, an "inverted" graph is more suitable, where the X-axis represents quality and the Y-axis represents the required bitrate:

[Figure: Quality-based RD curves, quality on the X-axis, required bitrate on the Y-axis]

Now the graph visually matches the question. A higher curve now means the codec needs more bitrate to match the quality of the other one.

Alright, we have determined that at 90 VMAF the "red" codec requires 5.15 Mbps, while the green one requires 9.89 Mbps. But a single point is not enough to tell how much better one codec is than the other. One could compute the difference at the ends of the range and average the values, then keep adding intermediate points to refine the average ad infinitum. In the limit, this amounts to comparing the areas under the curves.
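Reading a value off the inverted graph amounts to interpolating each codec's measured points at the target quality. The sketch below uses hypothetical helpers that interpolate piecewise-linearly in log10(bitrate), an assumption chosen to match the log-domain averaging used by BDBR; interpolating in plain bitrate is an equally valid reading of the plot. Applied to the two figures from the text, the single-point difference at 90 VMAF is 9.89 / 5.15 - 1, about +92%, and a single point like this can sit far from the average over the whole range.

```python
import numpy as np


def bitrate_at_quality(bitrates_kbps, scores, target: float) -> float:
    """Bitrate at which one codec's RD points reach `target` quality.

    Interpolates piecewise-linearly in log10(bitrate) over the quality axis,
    i.e. the horizontal-line reading of the RD plot.
    """
    q = np.asarray(scores, dtype=float)
    log_rate = np.log10(np.asarray(bitrates_kbps, dtype=float))
    if not q.min() <= target <= q.max():
        raise ValueError("target quality is outside the measured range")
    order = np.argsort(q)  # np.interp expects increasing x values
    return float(10.0 ** np.interp(target, q[order], log_rate[order]))


def pointwise_delta(rate_test: float, rate_ref: float) -> float:
    """Percent bitrate change of the test codec relative to the reference at one quality level."""
    return (rate_test / rate_ref - 1.0) * 100.0


if __name__ == "__main__":
    # Figures from the text at 90 VMAF: green 9.89 Mbps, red 5.15 Mbps.
    print(round(pointwise_delta(9.89, 5.15), 1))  # 92.0
```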

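To compare the areas under the curves, one fits a polynomial describing the dependency of bitrate on quality for each codec (Bjontegaard's method fits the logarithm of the bitrate), determines the quality range covered by both codecs, and integrates both functions over that interval. The difference between the integrals, averaged over the interval, is then converted back into a bitrate percentage.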
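A minimal sketch of the classic computation, as it is commonly implemented from VCEG-M33: fit log10(bitrate) as a third-order polynomial of the quality score for each codec, integrate both fits over the quality range the codecs share, and convert the mean log-rate difference back into a percentage. The function name and argument order are hypothetical, this is not necessarily the script behind these tests, and later variants replace the cubic fit with piecewise cubic interpolation, which can give slightly different results.

```python
import numpy as np


def bd_rate(rates_ref, scores_ref, rates_test, scores_test) -> float:
    """Bjontegaard delta bitrate (percent) of the test codec relative to the reference."""
    q_ref = np.asarray(scores_ref, dtype=float)
    q_test = np.asarray(scores_test, dtype=float)
    lr_ref = np.log10(np.asarray(rates_ref, dtype=float))
    lr_test = np.log10(np.asarray(rates_test, dtype=float))

    # Cubic fits of log-rate as a function of quality (at least 4 points per codec).
    p_ref = np.polyfit(q_ref, lr_ref, 3)
    p_test = np.polyfit(q_test, lr_test, 3)

    # Integrate only over the quality range measured for both codecs.
    lo = max(q_ref.min(), q_test.min())
    hi = min(q_ref.max(), q_test.max())
    if lo >= hi:
        raise ValueError("quality ranges do not overlap")

    int_ref = np.polyint(p_ref)
    int_test = np.polyint(p_test)
    area_ref = np.polyval(int_ref, hi) - np.polyval(int_ref, lo)
    area_test = np.polyval(int_test, hi) - np.polyval(int_test, lo)

    # Mean log10-rate difference, turned back into a percentage change.
    avg_diff = (area_test - area_ref) / (hi - lo)
    return float((10.0 ** avg_diff - 1.0) * 100.0)
```

As a sanity check, if the test codec reaches the same scores while always spending 1.45 times the reference bitrate, `bd_rate` returns about 45.0, i.e. +45%.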

This is precisely what Gisle Bjontegaard proposed in 2001 for comparing codec quality. Technically, 4 points per curve are sufficient, since a third-order polynomial is fitted. In my tests, I use 10. This improves accuracy and reduces, though does not eliminate, the risk of the fitted polynomial deviating wildly between the measured points.

This approach has a limitation: the function must be monotonically increasing. On one hand, this complicates the calculations and requires checks and special methods to "correct" the function. On the other hand, a non-monotonic curve is itself an indicator that the codec behaves unpredictably: if uniformly increasing the bitrate can make the codec lower the quality, its reliability is questionable.
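The text does not specify which checks or "correction" methods are used. As one possible approach, here are two hypothetical checks: one on the measured points themselves, and one on the fitted cubic. Because the derivative of a cubic is a quadratic, the second check can be done exactly through its roots instead of by sampling.

```python
import numpy as np


def points_are_monotonic(rates, scores) -> bool:
    """True if quality strictly rises with bitrate across the measured RD points."""
    order = np.argsort(np.asarray(rates, dtype=float))
    q = np.asarray(scores, dtype=float)[order]
    return bool(np.all(np.diff(q) > 0))


def cubic_fit_is_monotonic(rates, scores) -> bool:
    """True if the cubic fit of log10(rate) over quality rises across the measured range.

    The derivative is a quadratic, so it is positive on [lo, hi] exactly when it is
    positive at both ends and has no real root strictly inside the interval.
    """
    q = np.asarray(scores, dtype=float)
    p = np.polyfit(q, np.log10(np.asarray(rates, dtype=float)), 3)
    d = np.polyder(p)
    lo, hi = q.min(), q.max()
    if np.polyval(d, lo) <= 0 or np.polyval(d, hi) <= 0:
        return False
    roots = np.roots(d)
    real_roots = roots[np.isreal(roots)].real
    return not bool(np.any((real_roots > lo) & (real_roots < hi)))
```

Running both checks before computing BDBR separates codecs whose raw measurements dip from cases where only the polynomial fit overshoots.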

The resulting value is read directly as the percentage by which the bitrate must, on average, be increased (or can be decreased) to achieve the quality of the compared codec. For example, +45% means the bitrate needs to be increased by 45%; -15% means it can be reduced by 15% while encoding with the same quality.

This is also how promotional articles phrase their claims: if AV1 is said to be 30% better than HEVC, the claim is that one can encode with 30% less bitrate while maintaining the same quality as HEVC.
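Because the figure is an average ratio, a few lines of arithmetic show how to apply it and why the direction of the comparison matters. The helper names below are hypothetical, and the inversion assumes the classic Bjontegaard averaging of log bitrate over the shared quality range, where swapping the two codecs simply negates the average.

```python
def invert_bd_rate(bd_percent: float) -> float:
    """Convert the BD-rate of codec A vs. B into the BD-rate of B vs. A (both in percent).

    Swapping the codecs negates the mean log-bitrate difference,
    so the factor 1 + x becomes 1 / (1 + x).
    """
    return (1.0 / (1.0 + bd_percent / 100.0) - 1.0) * 100.0


def matched_bitrate(reference_kbps: float, bd_percent: float) -> float:
    """Average bitrate the compared codec needs to match a reference encode, per its BD-rate."""
    return reference_kbps * (1.0 + bd_percent / 100.0)


if __name__ == "__main__":
    print(round(invert_bd_rate(45.0), 1))   # -31.0
    print(round(invert_bd_rate(-30.0), 1))  # 42.9
    print(matched_bitrate(4000, -15.0))
```

So "AV1 is 30% better than HEVC" (a BD-rate of -30% for AV1 against HEVC) is the same statement as HEVC needing about 43% more bitrate than AV1, not 30% more. Likewise, +45% in one direction corresponds to about -31% in the other.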

BDBR is a relative metric. Under the hood, any "absolute" metric can be used, such as VMAF, PSNR or SSIM; the choice depends on the specific task and evaluation methodology. In my tests, I use all three and plan to add SSIMULACRA2 and CIEDE2000 soon, for a more complete picture and to detect "hacking" of specific metrics.

BDBR stands for Bjontegaard Delta Bitrate.