Global Data Compression Competition 2021

participants from all over the world

universal and image compressors

About GDCC 2021

This competition focuses on the advantages of algorithms and their implementations for universal lossless data compression rather than for compression tailored to particular data types. We test compressors under the following scenarios:

Test 1:
Qualitative-data compression

Filtered Chinese Wikipedia DB data

Test 2:
Quantitative-data compression

16-bit multispectral images, 16-bit integer and 32-bit floating point telemetry data

Test 3:
Mixed-data compression

Preprocessed ARM64 executable files mixed with scientific data containing 32-bit floating-point numbers and 32-bit integers

Test 4:
Small-block-data compression

Mixture of Test 1 and Test 3 data to be compressed independently in 64 KiB blocks but allowing random-access decompression in 8 KiB blocks

Test 5:
Student test

Participants must generate a parameter file for the provided compressor that minimizes the compressed-data size for Test 1 data.

categories

We impose speed limits to separate each of Tests 1 to 4 into three subcategories: rapid compression, balanced compression and high compression ratio (HCR). The result is 12 categories and leaderboards, each with its own prizes.

2021 prize winners

Qualitative data

Rapid
  • 1st place: Peter Thamm, SummerLOD
  • 2nd place: Konstantinos Agiannis, fasolada
  • 3rd place: Reznov, Ijen

Balanced
  • 1st place: Peter Thamm, BoganMoth
  • 2nd place: Xuxiali, fc
  • 3rd place: Ilya Muravyov, BCM2

High Compression Ratio
  • 1st place: Zoltán Gotthardt, paq8px-lite-t1
  • 2nd place: Marcio Pais, theta
  • 3rd place: Konstantin Zaborskikh, unicodeCM_T1

Quantitative data

Rapid
  • 1st place: Andreas Debski, hlx21
  • 2nd place: James Bonfield, dr4j
  • 3rd place: Maksym Kovalchuk, STC

Balanced
  • 1st place: Marcio Pais, gamma
  • 2nd place: Konstantin Zaborskikh, imgrcm_T2_B_v0
  • 3rd place: Frederic Langlet, rex22

High Compression Ratio
  • 1st place: Marcio Pais, delta
  • 2nd place: Kostiantyn Lutsenko, mnk
  • 3rd place: Konstantin Zaborskikh, tarcmi_T2_H_v0

Mixed data

Rapid
  • 1st place: Peter Thamm, Sweet_dreams_ are_made_of_bees
  • 2nd place: Marcio Pais, eta
  • 3rd place: Konstantin Zaborskikh, snail_T3_R

Balanced
  • 1st place: Marcio Pais, alpha
  • 2nd place: Konstantin Zaborskikh, T3cm_B
  • 3rd place: Xuxiali, flz

High Compression Ratio
  • 1st place: Marcio Pais, beta
  • 2nd place: Marcio Pais, alpha Peter Thamm, sao
  • 3rd place: Zoltán Gotthardt, paq8px-lite-t3

Small-block-data

Rapid
  • 1st place: Peter Thamm, GGGB
  • 2nd place: Konstantinos Agiannis, sblock
  • 3rd place: Christian Schneider, RanzTape

Balanced
  • 1st place: Marcio Pais, epsilon
  • 2nd place: Peter Thamm, FAW
  • 3rd place: Konstantinos Agiannis, sblock_balanced

High Compression Ratio
  • 1st place: Marcio Pais, zeta
  • 2nd place: Peter Thamm, FAW
  • 3rd place: Frederic Langlet, librex43

Student
  • 1st place: Kostiantyn Lutsenko
  • 2nd place: Maksym Kovalchuk
  • 3rd place: Zoltán Gotthardt Christian Schneider

Board of experts of GDCC 2021

Jarek Duda

PhD (computer science, physics) working at Jagiellonian University, author of Asymmetric Numeral Systems (used e.g. in Zstandard, LZFSE, JPEG XL), focused on information theory, statistical modelling, data compression.

Shen Jianqiang

Chief algorithm engineer of Huawei IT Product Line Data Management Algorithm Competence Center, with more than 20 years of research and development experience, currently focusing on storage algorithm research.

Dmitriy Vatolin

A PhD, video-codec developer and coauthor of a book on data compression. Supervisor of collaborative video- and image-processing research projects that include Broadcom, Huawei, Intel, RealNetworks, Samsung and other leading companies. Instructs courses on methods of 3D and 2D video and image processing and compression.

Ilya Papiev

More than 10 years of experience in research and development for enterprise storage systems, including data reduction and data tiering. Main research interest: efficient, high-performance lossless data compression.

Alexander Rhatushnyak

A PhD developing data-compression algorithms since the 1990s. Coauthor of a book and patents on data compression, co-creator of the JPEG XL standard, and multiple-time winner of the Hutter Prize and the Calgary Corpus Compression Challenge, the only ongoing competitions (before ours) in lossless data compression.

Eugene Shelwien

Developer of recompression algorithms for Deflate, JPEG, MP3, AAC, proprietary audio codecs and the .pa compression format. Administrator of Encode.su, the biggest international forum covering data-compression algorithms and software.

2021 leaderboards

General Notes

  • The leaderboard tables below contain results for contest submissions and selected publicly available compressors. The names of submitted compressors appear in boldface.
  • See “Ranking” for rules governing how we order the results.
  • When possible, we set compressor options to use just one thread for publicly available compressors. Some programs, however, may (and did) use multiple threads. Because we declined to fine-tune presets to fit the speed limits as tightly as possible, the compressors are not aligned by speed. Therefore, these results SHOULD NOT be used to draw conclusions about publicly available compressors such as “compressor X is better than compressor Y.”
  • HCR stands for “High Compression Ratio”.

Ranking

For the “balanced” and “high compression ratio” categories we rank compressors according to the following metric:

c_full_size = compressed-data size + compressed-decompressor size

First place goes to the compressor with the smallest c_full_size.

We compress decompressors using bzip2 v.1.0.8 with the “-9” setting.
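As a rough illustration of the metric above, the following Python sketch computes c_full_size and orders entries by it. It assumes Python's standard bz2 module at compression level 9 as a stand-in for the bzip2 “-9” command line; the exact byte count may differ from what the organizers' bzip2 v.1.0.8 binary produces. The function names and the sample entries are hypothetical.

```python
import bz2


def c_full_size(compressed_data_size: int, decompressor_bytes: bytes) -> int:
    """Compressed-data size plus the size of the bzip2-compressed decompressor."""
    # compresslevel=9 is assumed to correspond to the "-9" setting.
    packed_decompressor = bz2.compress(decompressor_bytes, compresslevel=9)
    return compressed_data_size + len(packed_decompressor)


def rank_by_full_size(entries):
    """Sort (name, c_full_size) pairs so the smallest c_full_size comes first."""
    return sorted(entries, key=lambda entry: entry[1])


# Hypothetical entries, for illustration only (not contest results).
example = [("A", 1_200_000), ("B", 1_150_000), ("C", 1_300_000)]
ranking = rank_by_full_size(example)
```

Here `ranking[0]` is the entry that would take first place in a balanced or HCR category under this metric.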

For the rapid categories we rank according to the function:

f = c_time + 2·d_time + c_full_size / 10⁶,

where c_time and d_time are, respectively, the compression and decompression times in seconds, and c_full_size is in bytes.

First place goes to the compressor with the smallest value for f.
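The rapid-category function can be sketched in Python as follows. The function name and the sample numbers are hypothetical and serve only to show how the time and size terms combine.

```python
def rapid_score(c_time: float, d_time: float, c_full_size: int) -> float:
    """f = c_time + 2*d_time + c_full_size / 10^6 (times in seconds, size in bytes)."""
    return c_time + 2.0 * d_time + c_full_size / 1e6


# Hypothetical measurements, for illustration only (not contest results):
# (name, compression seconds, decompression seconds, c_full_size in bytes)
measurements = [
    ("A", 10.0, 5.0, 30_000_000),   # f = 10 + 10 + 30 = 50
    ("B", 8.0, 8.0, 20_000_000),    # f = 8 + 16 + 20 = 44
]
scores = {name: rapid_score(c, d, size) for name, c, d, size in measurements}
winner = min(scores, key=scores.get)
```

Because decompression time is weighted twice, one second of decompression costs as much as two million bytes of c_full_size in this score.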

The compressors that fell just short of a given speed category appear at the bottom of the corresponding table. Submissions that failed to fully comply with the rules (in particular, the rule that every compressor must correctly decode the compressed files for all four tests) are also at the bottom.

Charts for leaderboards

General notes

  • The line joining the markers for different compressors on the scatter plot shows the Pareto frontier. That is, for each such compressor, no other analyzed programs in that category achieve better results for both the selected time and compression parameters.
  • The names of submitted compressors appear in boldface.
  • The names of submitted compressors that failed to fully comply with the competition rules appear in strikethrough.
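The Pareto frontier described in the notes above can be computed with a short Python sketch. It assumes that lower is better on both axes (for example, full time and c_full_size); for an axis where higher is better, the values would need to be negated first. The `Result` type and the sample points are hypothetical.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    time: float  # seconds, lower is better
    size: float  # e.g. c_full_size in bytes, lower is better


def pareto_frontier(results):
    """Keep results that no other result beats on both time and size."""
    frontier = [
        r for r in results
        if not any(o.time < r.time and o.size < r.size for o in results)
    ]
    # Sort by time so the points can be joined into a line on a scatter plot.
    return sorted(frontier, key=lambda r: r.time)


# Hypothetical points, for illustration only (not contest results).
sample = [
    Result("A", 10.0, 500.0),
    Result("B", 20.0, 400.0),
    Result("C", 25.0, 450.0),  # B is both faster and smaller, so C is excluded
    Result("D", 40.0, 300.0),
]
frontier_names = [r.name for r in pareto_frontier(sample)]
```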
[Interactive scatter plot: one chart per test (Tests 1 to 4) and speed category (Rapid, Balanced, HCR). Selectable time parameter: full time, compression time or decompression time. Selectable compression parameter: c_full_size; c_full_size, megabytes; compression ratio; compression ratio, bits per byte; or compression degree.]

RANKING OF COMPRESSORS