Matt Mahoney
Last update: Sept 19, 2024. history
This competition ranks lossless data compression programs by the compressed size (including the size of the decompression program) of the first 10^9 bytes of the XML text dump of the English version of Wikipedia on Mar. 3, 2006. About the test data.
The goal of this benchmark is not to find the best overall compression program, but to encourage research in artificial intelligence and natural language processing (NLP). A fundamental problem in both NLP and text compression is modeling: the ability to distinguish between high probability strings like recognize speech and low probability strings like reckon eyes peach. Rationale.
This is an open benchmark. Anyone may contribute results. Please read the rules first.
Open source compression improvements to this benchmark with certain hardware restrictions may be eligible for the Hutter Prize.
Compressors are ranked by the compressed size of enwik9 (10^9 bytes) plus the size of a zip archive containing the decompresser. Options are selected for maximum compression at the cost of speed and memory. Other data in the table does not affect rankings. This benchmark is for informational purposes only. There is no prize money for a top ranking. Notes about the table:
                Compression                      Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options                       enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp   Mem Alg Note
-------           -------                     ----------  -----------  -----------  -----------  ----- -----   --- --- ----
- nncp v3.2 14,915,298 106,632,363 628,955 xd 107,261,318 241871 238670 7600 Tr 88
- cmix v21 -t 14,623,723 107,963,380 281,387 sd 108,244,767 622949 638442 30950 CM 83
- fx-cmix 112,142,259 0 xd 112,142,259 216836 8869 CM 97
- tensorflow-compress v4 15,905,037 113,542,413 55,283 sd 113,597,696 291394 290803 45360 LSTM 94
- cmix-hp 10 Jun 2021 15,957,339 113,712,798 0 xd 113,712,798 189420 194280 6873 CM 89
- fast-cmix 113,746,218 0 xd 113,746,218 121971 8027 CM 99
- starlit 31 May 2021 15,215,107 114,951,433 0 xd 114,951,433 173953 171682 10233 CM 89
- phda9 1.8 15,010,414 116,544,849 42,944 xd 116,587,793 86182 86305 6319 CM 83
- paq8px_v206fix1 -12L 15,849,084 124,696,410 402,949 s 125,099,359 291916 294847 28151 CM 93
- durilca'kingsize -m13000 -o40 -t2 16,209,167 127,377,411 407,477 xd 127,784,888 1398 1797 13000 PPM 31
- cmve 0.2.0 -m2,3,0x7fed7dfd 16,424,248 129,876,858 307,787 x 130,184,645 1140801 19963 CM 81
- paq8hp12any -8 16,230,028 132,045,026 330,700 x 132,375,726 37660 37584 1850 CM 41
- drt|emma 1.23 16,523,517 134,164,521 1,358,251 xd 135,522,772 73006 67097 3800 CM 81
- zpaq 6.42 -m s10.0.5fmax6 17,855,729 142,252,605 4,760 sd 142,257,365 6699 14739 14000 CM 61
- drt|lpaq9m 9 17,964,751 143,943,759 110,579 x 144,054,338 868 898 1542 CM 41
- mcm 0.83 -x11 18,233,295 144,854,575 79,574 s 144,934,149 394 281 5961 CM 72
- nanozip 0.09a -cc -m32g -p1 -t1 -nm 18,594,163 148,545,179 783,642 x 149,328,821 1149 1141 32000 CM 74
- xwrt 3.2 -l14 -b255 -m96 -s -e40000 -f200 18,679,742 151,171,364 52,569 s 151,223,933 2537 2328 1691 CM
- fp8 v3 -8 18,438,169 153,188,176 50,068 s 153,238,244 20605 22593 1192 CM 26
- WinRK 3.03 pwcm +td 800MB SFX 18,612,453 156,291,924 99,665 xd 156,391,589 68555 800 CM 10
- ppmonstr J -m1700 -o16 19,055,092 157,007,383 42,019 x 157,049,402 3574 ~3600 1700 PPM
- zcm 0.93 -m8 -t1 19,572,089 159,135,549 227,659 x 159,363,208 421 411 3100 CM 48
- slim 23d -m1700 -o12 19,077,276 159,772,839 69,453 x 159,842,292 5232 ~5400 1700 PPM
- bsc-m03 0.4.0 -b1000000000 20,293,393 160,258,936 105,456 xd 160,364,392 160 135 13000 BWT 96
- bwmonstr 0.02 20,307,295 160,468,597 69,401 x 160,537,998 331801 156147 590 BWT 30
- nanozipltcb 0.09 20,537,902 161,581,290 133,784 x 161,715,074 64 30 3350 BWT 40
- M03 1.1b 1000000000 20,710,197 163,667,431 50,468 x 163,717,899 457 406 5735 BWT 52
- bcm 2.03 -b1000x- 20,738,630 163,646,387 125,866 x 163,772,253 63 34 4096 BWT 98
- glza 0.10.1 -x -p3 20,356,097 163,768,203 69,935 s 163,838,138 8184 11.9 8205 Dict 67
- bsc 3.25 -b1000 -e2 20,786,794 163,884,462 74,297 xd 163,958,759 23 8 5000 BWT 96
- bbb m1000 20,847,290 164,032,650 11,227 s 164,043,877 4524 2619 1401 BWT
- pcompress 3.1 -c libbsc -l14 -s1000m 20,769,968 163,391,884 1,370,611 x 164,762,495 359 74 3300 BWT 48
- paq9a -9 19,974,112 165,193,368 13,749 s 165,207,117 3997 4021 1585 CM
- uda 0.300 19,393,460 166,272,261 11,264 x 166,283,525 25282 25174 180 CM
- BWTmix v1 c10000 20,608,793 167,852,106 9,565 x 167,861,671 1794 690 5000 BWT 49
- lrzip 0.612 -z -L 9 -p 1 19,847,690 169,318,794 99,363 x 169,418,157 2987 2929 2700 CM 33
- cm4_ext 20,188,048 170,566,799 204,782 x 170,771,581 4123 4130 1906 CM 26
- M1x2 v0.6 7 enwik7.txt 20,723,056 172,212,773 38,467 s 172,251,240 711 715 1051 CM 26
- cmm4 v0.1e 96 20,569,034 172,669,955 31,314 x 172,701,269 2052 2056 1321 CM
- lstm-compress v3 20,318,653 173,874,407 144,567 s 174,018,974 92342 91876 9 LSTM 83
- ccmx 1.30 7 20,857,925 174,142,092 15,014 x 174,157,106 1313 1338 1332 CM
- bit 0.7 -p=5 20,823,204 174,425,039 62,493 x 174,487,532 2050 2100 663 CM 26
- mcomp 2.00 -mw -M320m 21,103,670 174,388,351 172,531 x 174,560,882 473 399 1643 BWT 26
- epmopt|epm r9 -m800 -n20 --fixedorder:12 19,713,502 174,817,424 141,101 x 174,958,525 3179 3376 800 PPM
- WinUDA 2.91 mode 3 (194 MB) 20,332,366 174,975,730 17,203 x 174,992,933 23610 23473 194 CM
- dark 0.51 -b333mf 21,169,819 175,471,417 34,797 x 175,506,214 533 453 1692 BWT
- FreeArc 0.40pre-4 -mppmd:1012m:o13:r1 20,931,605 175,254,732 748,202 x 176,002,934 1175 1216 1046 PPM
- hook v1.4 1700 21,990,502 176,648,663 37,004 x 176,685,667 741 695 1777 DMC 26
- 7zip 4.46a -m0=ppmd:mem=1630m:o=10 ... 21,197,559 178,965,454 0 xd 178,965,454 503 546 1630 PPM 23
- rings 2.5 -m8 -t1 20,873,959 178,747,360 240,523 x 178,987,883 280 163 2518 BWT 48
- pimple2 20,871,457 180,251,530 78,642 x 180,330,172 18474 17992 128 CM
- ash 04a /m700 /o10 19,963,105 180,735,542 11,137 x 180,746,679 6100 5853 700 CM
- bce3 22,729,148 180,732,702 19,889 s 180,752,591 1151 2444 5000 CM 71
- ocamyd LTCB 1.0 -s0 -m3 21,285,121 182,359,986 21,030 x 182,381,016 108960 ~110000 300 DMC 6
- bee 0.79 b0154 -m3 -d8 20,975,994 182,373,904 57,046 x 182,430,950 9295 9285 512 PPM
- uhbc 1.0 -m3 -b100m 20,930,838 182,918,172 56,242 x 182,974,414 1569 809 800 BWT
- smac 1.20 21,781,544 183,190,888 4,356 x 183,195,244 4249 4399 1542 CM 26
- ppmd J1 -m256 -o10 -r1 21,388,296 183,964,915 11,099 s 183,976,014 880 895 256 PPM
- tc 5.2 dev 2 21,481,399 184,939,711 41,112 x 184,980,823 3637 3655 230 CM
- bwtsdc v1 23,414,955 185,709,858 8,421 s 185,718,279 2100 420 5213 BWT 47
- fbc v1.1 333333334 22,554,133 185,975,548 23,576 x 185,999,124 451 415 1647 BWT 55
- ppmvc v1.1 -m256 -o8 -r1 21,484,294 186,208,405 25,241 x 186,233,646 898 913 272 PPM
- chile 0.4 -b=244141 22,218,917 186,979,614 11,530 s 186,991,144 2513 512 1426 BWT
- bwtdisk 0.9.0 -b 2 -m 3500 24,725,277 190,004,306 169,579 s 190,173,885 1124 3500 BWT 48
- CTXf 0.75 pre b1 -me 22,072,783 191,008,871 57,337 x 191,066,298 1112 1037 78 PPM
- m03exp 2005-02-15 32MB blocks 21,948,192 191,250,500 44,593 x 191,295,093 ~4800 ~2100 256 BWT
- Stuffit 12.0.0.17 -m=4 -l=16 -x=30 22,105,654 190,372,707 2,658,122 xd 193,030,829 628 658 1062 PPM
- plzma v3b c2 ... (see below) 24,206,571 193,240,160 101,221 x 193,341,381 8889 55 10110 LZ77 58
- crook v0.1 -m1600 -O8 22,503,627 193,333,159 8,539 s 193,341,698 483 513 1641 PPM 26
- ppmx 0.03 22,572,808 193,643,464 54,964 x 193,698,428 777 784 609 PPM 26
- lzturbo 1.1 -49 -b1000 -p0 24,416,777 194,681,713 110,670 x 194,792,383 1920 9 14700 LZ77 59
- enc 0.15 aq 22,156,982 195,604,166 94,888 x 195,699,054 6843 6868 50 CM
- comprolz 0.11.0-bugfix1 -b250 -f 22,813,215 196,651,379 29,453 x 196,680,832 984 308 688 ROLZ 26
- sbc 0.970r2 -ad -m3 -b63 22,470,539 197,066,203 99,094 xd 197,165,297 1733 313 224 BWT
- xz 5.2.1 --lzma2=preset=9e,dict=1GiB,lc=4,pb=0 24,703,772 197,331,816 36,752 xd 197,368,568 5876 20 6000 LZ77 73
- WinRAR 3.60b3 -mc7:128t+ -sfxWinCon.sfx 22,713,569 198,454,545 0 xd 198,454,545 506 415 128 PPM
- quark v0.95r beta -m1 -d25 -l8 22,988,924 198,600,023 80,264 x 198,680,287 27952 217 534 LZ77
- lzip 1.14-rc3 -9 -s512MiB 24,756,063 199,410,543 21,682 s 199,432,225 2409 21 5632 LZ77 57
- comprox 0.11.0-bugfix1 -b250 -f -m100 23,064,386 199,515,912 34,176 x 199,550,088 917 153 688 LZ77 26
- bssc 0.95 alpha -b16383 23,117,061 201,810,709 45,489 x 201,856,198 578 217 140 BWT 4
- flashzip 1.0.0 -mx7 -b7 23,869,034 202,363,445 123,053 x 202,486,498 1296 122 802 ROLZ 26
- lzham 1.0 -d29 -x 25,002,070 202,237,199 191,600 s 202,428,799 1096 6.6 7800 LZ77 70
- csarc 3.3 -m5 -d1024m 24,516,202 203,995,005 69,848 s 204,064,853 621 22 2463 LZ77 48
- packet 1.9 -mx -b512 -h8 24,968,492 204,195,438 261,967 x 204,457,405 974 14 2824 LZ77 48
- uharc 0.6b -mx -md32768 23,911,123 208,026,696 73,608 xd 208,100,304 1666 1330 50 PPM
- TarsaLZP Jan 29 2012 24,751,389 208,867,187 13,081 s 208,880,268 203 ~2000 LZP 54
- GRZipII 0.2.4 -b8m 23,846,878 208,993,966 41,645 s 209,035,641 312 216 58 BWT
- 4x4 0.2a 4t (grzip:m1:h18) 23,833,244 208,787,642 317,097 x 209,104,739 386 240 269 BWT
- rzm 0.07h 24,361,070 210,126,103 17,667 x 210,143,770 2336 81 160 ROLZ
- pim 2.50 best 24,303,638 210,124,895 330,901 x 210,455,796 764 ~764 88 PPM
- CTW 0.1 -d6 -n16M -f16M 23,670,293 211,995,206 43,247 x 212,038,452 19221 19524 144 CM
- boa 0.58b -m15 24,322,643 213,845,481 55,813 x 213,901,294 3953 ~4100 17 PPM
- yxz 0.11 -m9 -b7 -h6 25,754,856 214,317,684 131,062 x 214,448,746 642 77 1590 LZ 26
- zstd 0.6.0 -22 --ultra 25,405,601 215,674,670 69,687 s 215,744,357 701 2.2 792 LZ77 76
- tornado 0.6 -16 25,768,105 217,749,028 83,694 s 217,832,722 1482 9 1290 LZ77 48
- LZPXj 1.2h 9 25,205,783 217,880,584 4,853 s 217,885,437 783 717 1316 PPM
- scmppm 0.93.3 -l 9 25,198,832 217,867,392 37,043 s 217,904,435 708 644 20 PPM
- acb 2.00c u 25,063,656 218,473,968 38,976 x 218,512,944 10656 10883 16 LZ77 26
- crushm 25,013,576 218,656,416 30,097 x 218,686,513 617 649 39 CM 26
- PX v1.0 24,971,871 219,091,398 3,054 s 219,094,452 1838 1809 66 CM 3
- DGCA 1.10 default+SFX 25,203,248 219,655,072 0 xd 219,655,072 858 270 76
- Squeez 5.20.4600 sqx2.0 32MB Ultra 25,118,441 220,004,873 91,019 xd 220,095,892 2575 116 365
- fpaq2 25,287,775 221,242,386 3,429 s 221,245,815 20183 20186 131 CM
- TinyCM 0.1 9 25,913,605 221,773,542 12,553 x 221,786,095 1342 1330 1083 CM 26
- dmc c 1800000000 25,320,517 222,605,607 2,220 s 222,607,827 676 721 1800 DMC
- lza 0.82b -mx9 -b7 -h7 26,396,613 222,808,457 285,766 x 223,094,223 449 9.7 2000 LZ77 48
- brotli 18-Feb-2016 -q 11 -w 24 25,764,698 223,597,884 542,385 s 224,140,269 3400 5.9 437 LZ77 48
- szip 1.12a -b41o16 26,120,472 227,586,463 31,708 x 227,618,171 1191 289 21 BWT 26
- balz 1.13 ex 26,421,416 228,337,644 49,024 x 228,286,668 3700 190 206 ROLZ
- lzpm 0.11 9 26,501,542 229,083,971 46,824 x 229,130,795 15395 57 740 ROLZ
- qazar 0.0pre5 -l7 -d9 -x7 26,455,170 229,846,871 71,959 x 229,918,830 5738 903 105 LZP
- KuaiZip 2.3.2 x86 25,895,915 227,905,650 3,857,649 x 231,763,299 1061 47 197 LZ77 26
- qc 0.050 -8 26,763,343 232,784,501 46,100 x 232,830,601 8218 1503 151
- ppms J -o5 26,310,248 233,442,414 16,467 x 233,458,881 330 354 1.8 PPM
- dzo beta 26,616,115 235,056,859 618,883 x 235,675,742 1088 159 200 LZ77 26
- comprox_ba 20110929 27,828,189 242,846,243 4,134 s 242,850,377 397 101 226 BWTS 48
- WinTurtle 1.60 512 MB buffer 28,379,612 245,217,944 160,090 x 245,378,034 273 237 583 PPM
- diz 26,545,256 246,679,382 12,945 s 246,692,327 21240 22746 1350 PPM 26
- cabarc 1.00.0601 -m lzx:21 28,465,607 250,756,595 51,917 xd 250,808,853 1619 15 20 LZ77
- sr3 28,926,691 253,031,980 9,399 s 253,054,625 148 160 68 SR 26
- bzip2 1.0.2 -9 29,008,736 253,977,839 30,036 x 254,007,875 379 129 8 BWT
- rh5_x64 -window:27 c6 29,078,552 254,220,469 36,744 x 254,257,213 196 9.4 145 ROLZ 48
- RangeCoderC v1.7 c7 26 28,788,013 254,527,369 7,858 x 254,535,227 2460 2436 1116 CM 26
- quad v1.11 -x 29,110,579 256,145,858 13,387 s 256,159,245 956 116 34 ROLZ
- WinACE -sfx -m5 -d4096 29,481,470 257,237,710 0 xd 257,237,710 1080 77 4
- lzsr 0.01 29,433,834 258,912,605 40,287 x 258,952,892 194 88 6 LZ77 26
- libzling 20160107 e4 29,721,114 259,475,639 35,582 s 259,511,221 83 27 28 ROLZ 48
- xpv5 c2 29,963,217 262,525,246 14,371 x 262,539,617 2359 516 9 ROLZ 26
- sr3c 1.0 29,731,019 266,035,006 7,701 x 266,042,707 160 145 5 SR 26
- lzc v0.08 10 30,611,315 266,565,255 11,364 x 266,576,619 302 63 550 LZ77
- nakamichi 2019-Jul-01 32,917,888 277,293,058 112,899 s 277,405,957 8200000 1.3 302000 LZSS 85
- crush 1.00 cx 31,731,711 279,491,430 2,489 s 279,493,919 948 2.9 148 LZ77 60
- xeloz 0.3.5.3 c889 32,441,272 283,621,211 18,771 s 283,639,982 1079 8 230 LZ77 48
- bzp 0.2 31,563,865 283,908,295 36,808 x 283,945,103 110 120 3 LZP
- lzwg -27 34,423,369 284,356,322 19,828 xd 284,376,150 135 41 1744 LZW 95
- ha 0.98 a2 31,250,524 285,739,328 28,404 x 285,767,732 2010 1800 0.8 PPM
- ulz 0.06 c9 32,945,292 291,028,084 49,450 x 291,077,534 325 1.1 490 LZ77 82
- irolz 33,310,676 292,448,365 4,584 s 292,452,949 274 144 17 ROLZ 26
- lcssr 0.2 -b7 -l9 34,549,048 296,160,661 8,802 x 296,169,463 8186 8281 1184 SR
- zlite 33,975,840 298,470,807 4,880 s 298,475,687 61 28 36 ROLZ 26
- lazy 1.00 5 35,024,082 306,245,949 5,986 s 306,251,935 273 24 96 LZ77 26
- zhuff 0.97 beta -c2 34,907,478 308,530,122 63,209 x 308,593,331 24 3.5 32 LZ77 48
- lzhhf 34,848,933 308,825,079 24,576 xd 308,849,655 392 12 14 LZ77 95
- slug 1.27 35,093,954 309,201,454 6,809 x 309,208,263 32 28 14 ROLZ
- lzuf62 34,960,889 309,837,920 24,576 xd 309,862,496 375 11 14 LZ77 95
- pigz 2.3 -11 35,002,893 309,812,953 52,717 s 309,865,670 2237 13 25 LZ77 48
- kzip May 13 2006 /b1024 35,016,649 310,188,783 29,184 xd 310,217,967 6063 62 121 LZ77 2
- uc2 rev 3 pro -tst 35,384,822 312,767,652 123,031 x 312,890,683 360 63 4 LZ77
- thor 0.95 e4 35,795,184 314,092,324 49,925 x 314,142,249 64 34 16 LZP
- etincelle a3 35,776,971 314,801,710 44,103 x 314,845,813 29 18 976 ROLZ 26
- lz5 1.3.3 -18 36,514,408 319,510,433 138,210 s 319,648,643 10578 3.7 1139 LZ77 48
- gzip124hack 1.2.4 -9 36,273,716 321,050,648 62,653 x 321,113,301 149 19 1 LZ77
- doboz 0.1 36,367,430 322,415,409 83,591 x 322,499,000 533 3.4 1200 LZ77 48
- gzip 1.3.5 -9 36,445,248 322,591,995 38,801 x 322,630,796 101 17 1.6 LZ77
- Info-ZIP 2.3.1 -9 36,445,373 322,592,120 57,583 x 322,649,703 104 35 0.1 LZ77
- pkzip 2.0.4 -ex 36,556,552 323,403,526 29,184 xd 323,432,710 171 50 2.5 LZ77
- jar (Java) 0.98-gcc cvfM 36,520,144 323,747,582 19,054 x 323,766,636 118 95 1.2 LZ77
- PeaZip better, no integrity check 36,580,548 323,884,274 561,079 x 324,445,353 243 243 8 LZ77 20
- arj 3.10 -m1 37,091,317 328,553,982 143,956 x 328,697,938 262 67 3 LZ77 26
- lzgt3a 37,444,440 334,405,713 4,387 xd 334,410,100 1581 2886 2 LZ77
- pucrunch -d -c0 39,199,165 350,265,471 34,359 s 350,299,830 2649 463 2 LZ77
- packARC v0.7RC11 -sfx -np 38,375,065 361,905,425 0 xd 361,905,425 1359 1486 23 CM
- urban 38,215,763 362,677,440 4,280 s 362,681,720 381 450 6 o2 48
- lzop v1.01 -9 41,217,688 366,349,786 54,438 x 366,404,224 289 12 1.8 LZ77
- lzw 0.2 41,960,994 367,633,910 671 s 367,634,581 3597 31 18 LZW
- MTCompressor v1.0 41,295,546 370,152,396 3,620 x 370,156,016 173 117 74 LZ77 26
- lz4x 1.02 c4 41,950,112 372,068,437 48,609 x 372,117,046 79 1.4 114 LZ77 68
- arbc2z 38,756,037 379,054,068 6,255 sd 379,060,323 2659 2674 68 PPM
- lz4 v1.2 -c2 42,870,164 379,999,522 49,128 x 380,048,650 91 6 20 LZ77 26
- lzss 0.02 cx 42,874,387 380,192,378 48,114 x 380,240,492 107 2.3 145 LZSS 63
- xdelta 3.0u -9 44,288,463 389,302,725 107,985 x 389,410,710 1021 30 47 LZ77
- brieflz 1.1.0 43,300,800 390,122,722 14,907 s 390,137,629 21 7.5 3 LZ77 48
- mtari 0.2 41,655,528 397,232,608 4,156 s 397,236,764 80 99 18 CM 26
- lzf 1.02 cx 45,198,298 406,805,983 48,359 x 406,854,342 68 2.2 151 LZ77 68
- srank 1.1 -C8 43,091,439 409,217,739 6,546 x 409,224,285 51 45 2 SR
- QuickLZ 1.30b (quick3) 46,378,438 410,633,262 44,202 x 410,677,464 48 12 3 LZ77
- stz 0.7.2 -c2 47,192,312 416,524,596 41,941 x 416,566,537 14 13 3 LZ77 26
- compress 4.3d 45,763,941 424,588,663 16,473 x 424,605,136 103 70 1.8 LZW
- lzrw3-a 48,009,194 438,253,704 4,750 x 438,258,454 38 17 2 LZ77
- fcm1 45,402,225 447,305,681 1,116 s 447,306,797 228 261 1 CM1
- runcoder1 46,883,939 458,125,932 5,488 s 458,131,420 140 156 4 o1 26
- data-shrinker 23Mar2012 51,658,517 459,825,318 3,706 s 459,829,024 14 4 2 LZ77 26
- lzwc_bitwise 0.7 46,639,414 463,884,550 4,183 x 463,888,733 123 134 71 LZW 26
- exdupe 0.3.3 53,717,422 478,788,378 1,092,986 x 479,881,364 27 5 1000 LZ77 48
- lzv 0.1.0 54,950,847 488,436,027 10,385 x 488,446,412 4 2.6 3 LZ77 48
- FastLZ Jun 12 2007 54,658,924 493,066,558 7,065 xd 493,073,623 18 13 1 LZ77
- sharc 0.9.11b -c2 53,175,042 494,421,068 81,001 s 494,502,069 15 14 6 LZP 26
- flzp v1 57,366,279 497,535,428 3,942 s 497,539,370 78 38 8 LZP
- alba 0.5.1 cd 52,728,620 515,760,096 4,870 s 515,764,966 239 10 4 BPE 48
- lzpgt6 56,113,248 522,877,083 27,136 x 522,904,219 6 5 6 LZP 95
- snappy 1.0.1 58,350,605 527,772,054 23,844 s 527,795,898 25 12 0.1 LZ77 26
- bpe 5000 4096 200 3 53,906,667 532,250,688 1,037 sd 532,251,725 639 28 0.5 Dict 26
- kwc 54,097,740 532,622,518 15,186 x 532,637,704 438 145 668 Dict 26
- bpe2 v3 55,289,197 542,748,980 2,979 s 542,751,959 518 132 0.5 Dict 26
- fpaq0f2 56,916,872 558,645,708 3,066 x 558,648,769 222 207 0.4 o0
- ghost 456 5 55,357,196 568,004,779 696 sd 568,005,475 172800 245 88000 Dict 100
- ppp 61,657,971 579,352,307 1,472 s 579,353,779 80 59 1 SR
- ksc 4 59,511,259 580,557,413 13,507 x 580,570,920 40050 7917 1700 SR 48
- lzbw1 0.8 67,620,436 590,235,688 21,751 x 590,257,439 15 12 55 LZP 26
- lzp2 0.7c 67,909,076 598,076,882 40,819 x 598,117,701 11 8 15 LZP 26
- NTFS LZNT1 76,955,648 636,870,656 0 636,870,656 10 9 0.1 LZ77 26
- shindlet_fs 62,890,267 637,390,277 1,275 xd 637,391,552 113 103 0.6 o0
- arb255 63,501,996 644,561,595 4,871 sd 644,566,466 2551 2574 1.6 o0
- compact 63,862,371 648,370,029 3,600 sd 648,373,629 216 164 0.2 o0
- TinyLZP 0.1 79,220,546 694,274,932 2,811 s 694,277,743 32 38 10 LZP 26
- smile 71,154,788 695,562,502 207 xd 695,562,709 10517 10414 0.6 MTF 26
- barf (2 passes) 76,074,327 758,482,743 983,782 s 759,466,525 756 53 4 LZ77
- arb2x v20060602 99,642,909 995,674,993 3,433 sd 995,678,426 2616 2464 1.6 o0b
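The ranking rule stated before the table (compressed enwik9 size plus the size of the zipped decompresser) can be sketched in a few lines of Python. This is only an illustration: the function names `total_size` and `rank` are made up for this example, and the four rows are copied from the table above.

```python
# Rows copied from the table above:
# (program, compressed enwik9 bytes, zipped decompresser bytes).
ROWS = [
    ("cmix v21", 107_963_380, 281_387),
    ("nncp v3.2", 106_632_363, 628_955),
    ("phda9 1.8", 116_544_849, 42_944),
    ("bzip2 1.0.2", 253_977_839, 30_036),
]


def total_size(enwik9_bytes, decompresser_zip_bytes):
    """Ranking key: compressed enwik9 plus the zipped decompresser."""
    return enwik9_bytes + decompresser_zip_bytes


def rank(rows):
    """Sort rows from smallest to largest total size (best first)."""
    return sorted(rows, key=lambda row: total_size(row[1], row[2]))


if __name__ == "__main__":
    for name, enwik9, prog in rank(ROWS):
        print(f"{name:12s} {total_size(enwik9, prog):>13,}")
```

Only the sum affects the ranking; the time and memory columns are not part of the key.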
                Compression                      Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options                       enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp   Mem Alg Note
-------           -------                     ----------  -----------  -----------  -----------  ----- -----   --- --- ----
hipp 5819 /o8 20,555,951 (fails) 36,724 x 5570 5670 719 CM
ppmz2 23,557,867 (fails) 29,362 s 92210 88070 1497 PPM 26
XMill 0.8 -w -P -9 -m800 26,579,004 (fails) 114,764 xd 616 530 800 PPM
lzp3o2 33,041,439 (fails) 23,427 xd 230 270 151 LZP
Programs that properly decompress enwik9 and don't use external dictionaries are still eligible for the Hutter Prize.
                Compression                      Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options                       enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp   Mem Alg Note
-------           -------                     ----------  -----------  -----------  -----------  ----- -----   --- --- ----
rdmc 0.06b 33,181,612 1394 1381 DMC 6
ESP v1.92 36,651,292 223 LZ77 16
Pareto frontier: compressed size vs. compression time as of Aug. 18, 2008 from the main table (options for maximum compression).
Pareto frontier: compressed size vs. memory as of Aug. 18, 2008
(options for maximum compression).
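A Pareto frontier of this kind keeps only the programs for which no other program is both smaller and cheaper (in time or memory). The following is a minimal sketch, not the script used to draw the plots: the function name `pareto_frontier` is an assumption, and the sample rows come from the current main table (total size and compression time), so the result need not match the 2008 graphs.

```python
def pareto_frontier(points):
    """Return the points not dominated on (size, cost).

    Each point is (name, size, cost); lower is better for both size and cost.
    """
    frontier = []
    best_size = float("inf")
    # Visit points from cheapest to most expensive; a point belongs on the
    # frontier only if it is strictly smaller than every cheaper point.
    for name, size, cost in sorted(points, key=lambda p: (p[2], p[1])):
        if size < best_size:
            frontier.append((name, size, cost))
            best_size = size
    return frontier


# (program, enwik9+prog bytes, compression time in ns/byte) from the main table.
SAMPLE = [
    ("nncp v3.2", 107_261_318, 241871),
    ("cmix v21", 108_244_767, 622949),
    ("phda9 1.8", 116_587_793, 86182),
    ("durilca'kingsize", 127_784_888, 1398),
    ("bsc 3.25", 163_958_759, 23),
    ("bzip2 1.0.2", 254_007_875, 379),
    ("gzip 1.3.5", 322_630_796, 101),
    ("lzv 0.1.0", 488_446_412, 4),
]

if __name__ == "__main__":
    for name, size, cost in pareto_frontier(SAMPLE):
        print(f"{name:18s} {size:>13,} {cost:>8}")
```

The same function works for the size vs. memory frontier if the memory column is passed as the cost.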
I only test the latest supported version of a program. I attempt to find the
options that select the best compression, but will not generally do an exhaustive
search. If an option advertises maximum compression or memory, I don't try the alternatives.
If you know of a better combination, please let me know.
I will select the maximum memory setting that does not cause disk thrashing, usually about 1800 MB.
If the compressor is not downloadable as a zip file then I will compress the source or
executable (whichever archive is smaller) plus any other needed files (dictionaries) into a single zip
archive using 7zip 4.32 -tzip -mx=9.
If no executable is available I will attempt to compile in C or C++
(MinGW 3.4.2, Borland 5.5 or Digital Mars), Java 1.5.0, MASM, NASM, or gas.
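As a rough illustration of how a decompresser size could be measured, the sketch below packs a set of files into one zip archive at the highest deflate level and reports the archive size. It uses Python's standard `zipfile` module as a stand-in for 7zip, so the byte counts will not match those in the table; the function name `zipped_size` and the sample file names are hypothetical.

```python
import io
import zipfile


def zipped_size(files):
    """Return the size in bytes of a zip archive holding the given files.

    `files` maps an archive member name to its contents as bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=9) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    # The archive is complete once the ZipFile is closed.
    return len(buffer.getvalue())


if __name__ == "__main__":
    # Hypothetical decompresser source and dictionary, for illustration only.
    source = b"int main(void) { return 0; }\n" * 200
    dictionary = b"the of and in to a is\n" * 500
    print(zipped_size({"decomp.c": source, "dict.txt": dictionary}))
```

Choosing between source and executable then amounts to calling the function on each candidate set of files and keeping the smaller result.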
1. Reported by Guillermo Gabrielli, May 16, 2006. Timed on a Celeron D325 2.53 GHz Windows XP SP2 256MB RAM. I have not verified results submitted by others. Timing information, when available,
may vary widely depending on the test machine used.
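The time columns are in nanoseconds per byte. Because enwik9 is 10^9 bytes, a time in ns/byte on enwik9 is numerically equal to the run time in seconds. A small sketch of the conversion follows; the function names are chosen here for illustration only.

```python
ENWIK9_BYTES = 10**9


def ns_per_byte(seconds, input_bytes=ENWIK9_BYTES):
    """Convert a measured run time in seconds to nanoseconds per input byte."""
    return seconds * 1e9 / input_bytes


def seconds_for(ns_byte, input_bytes=ENWIK9_BYTES):
    """Convert a table entry in ns/byte back to a run time in seconds."""
    return ns_byte * input_bytes / 1e9


if __name__ == "__main__":
    # nncp v3.2 compression time from the main table: 241871 ns/byte.
    run_seconds = seconds_for(241871)
    print(f"{run_seconds:.0f} s, about {run_seconds / 86400:.1f} days")
```

For enwik8 (10^8 bytes) the same table entry corresponds to one tenth of the run time, which is why the input size is a parameter.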
The numbers in the headings are the compression ratios on enwik9.
Version 2019-11-16 was released Nov. 16, 2019. It was run in 8 threads.
Version 2 was released Jan. 3, 2021. It uses a transformer architecture,
a recurrent neural network with an attention mechanism to allow parallelism.
The algorithm is described briefly here.
It uses the same dictionary preprocessing as earlier versions.
It was tested with an Intel Xeon E3-1230 v6 at 3.5 GHz and a
GeForce RTX 3090 GPU with 10,496 CUDA cores and 24 GB RAM.
nncp v2.1 was released Feb. 6, 2021. It is the same code as v2
except for a larger model and slightly different hyperparameters.
nncp v3 was released Apr. 24, 2021.
This new version is coded in C and supports recent NVIDIA GPUs. It is
much faster (3x) due to algorithmic improvements and requires less
memory. The Transformer model is similar (199M parameters) but the
hyperparameters have been tuned.
nncp v3.1 was released June 1, 2021.
nncp v3.2 was released Oct. 23, 2023.
cmix v2 was released May 29, 2014.
cmix v3 was released June 27, 2014.
cmix v4 was released July 22, 2014. It uses 28,976,428 KiB memory (29.7 GB).
cmix v5 was released Aug. 13, 2014. The decompressor size is a zip
archive containing the source code, makefile, and a dictionary
compressed with cmix from 465211 to 90065 bytes.
cmix v6 was released Sept. 3, 2014. The decompressor size includes
the dictionary compressed with cmix from 465211 to 90207 bytes.
cmix v7 was released Feb. 4, 2015.
cmix v8 was released Nov. 10, 2015.
cmix v9 was released Apr. 8, 2016.
cmix v10 was released June 17, 2016.
cmix v11 was released July 3, 2016. It incorporates a modification
originally developed by Eugene Shelwien in which PPMd is included as a model.
cmix v12 was released Nov. 7, 2016. It includes an LSTM model.
cmix v13 was released Apr. 24, 2017.
cmix v14 was released Nov. 22, 2017.
cmix v15 was released May 19, 2018.
cmix v16 was released Oct. 6, 2018.
cmix v17 was released Mar. 24, 2019.
cmix v18 was released Aug. 2, 2019.
cmix v19 was released Aug. 29, 2021. It has improvements based on the starlit
(article reordering) and cmix-hp Hutter prize entries. It has a separate decompressor.
cmix v20 was released Nov. 5, 2023.
cmix v21 was released Sept. 17, 2024.
Notes about compressors
2. Decompression size and time for pkzip 2.0.4. kzip only compresses.
3. Reported by Ilia Muraviev (author of PX, TC, pimple), June 10-July 18, 2006. Timed on a P4 3.0 GHz, 1GB RAM, WinXP SP2.
4. enwik9 reported by Johan de Bock, May 19, 2006. Timed on Intel Pentium-4 2.8 GHz 512KB L2-cache, 1024MB DDR-SDRAM.
5. Compressed with paq8h (VC++ compile) and decompressed with paq-8h (Intel compile of same source code).
Normally compression and decompression are the same speed.
6. ocamyd 1.65.final and LTCB 1.0 reported by Mauro Vezzosi, May 30-June 20, 2006.
Timed on a 1.91 GHz AMD Athlon XP 2600+, 512 MB, WinXP Pro 2002 SP2
using timer 3.01. ocamyd 1.66.final reported Feb. 3, 2007.
Times are process times.
7. Under development by Mauro Vezzosi, May 24, 2006.
8. Reported by Denis Kyznetsov (author of qazar), June 2, 2006.
9. Reported by sportman, May 24, 2006. Timed on an Intel Pentium D 830 dual core 3.0GHz,
2 x 512MB DDR2-SDRAM PC4300 533MHz memory timing 4-4-4-12 (833.000KB free),
Windows XP Home SP2. CPU was at 52% so apparently only one of 2 cores was used.
Decompression verified on enwik8 only (not timed, about 2.5 hours).
WinRK compression options: Model size 800MB,
Audio model order: 255,
Bit-stream model order: 27,
Use text dictionary: Enabled,
Fast analyses: Disabled,
Fast executable code compression: Disabled
10. Reported by Malcolm Taylor (author of WinRK), May 24, 2006.
Timed on an Athlon X2 4400+ with 2GB, running WinXP 64. Decompression not tested.
Decompresser size is based on SFX stub size reported by Artyom (A.A.Z.), Sept. 2, 2007,
although it was not tested this way.
11. Reported by sportman, May 25, 2006. CPU as in note 9.
12. Reported by sportman, May 30, 2006. CPU as in 9 (50% utilized).
13. xwrt 3.2 options are -2 -b255 -m250 -s -f64. ppmonstr J options are -o10 -m1650.
14. Reported by Michael A Maniscalco, June 15, 2006.
15. Reported by Jeremiah Gilbert on the Hutter group, Aug. 18, 2006. Tested under Linux on a dual Xeon
1.6 GHz(lv) (overclocked to 2.13 GHz) with 2 GB memory. Time is user+sys (real=196500 ns/byte).
16. Reported by Anthony Williams, Aug. 19-22, 2006. Timed on a 2.53 GHz Pentium 4 with 512 MB under WinXP Home SP2.
17. Tested Aug. 20, 2006 under Ubuntu Linux 2.6.15 on a 2.2 GHz Athlon-64 with 2 GB memory. Time is approximate
wall time due to disk thrashing. User+sys time is 153600 ns/byte compress, 148650 decompress.
18. Reported by Dmitry Shkarin (author of durilca4linux), Aug. 22-23, 2006 for durilca4linux_1;
and Oct. 16-18, 2006 for durilca4linux_2. 3 GB memory usage is RAM + swap.
Tested on AMD Athlon X2 4400+, 2.22 GHz, 2 GB memory under SuSE Linux AMD64 v10.0.
durilca4linux_3 reported Feb. 21, 2008 using 4 GB RAM + 1 GB swap. v2 reported Apr. 22, 2008.
v3 reported May 22, 2008.
19. enwik8 confirmed by sportman, Sept. 20, 2006. Compression time 61480 ns/byte timed on a
2 x dual core (only one core active) Intel Woodcrest 2GHz with 1333MHz fsb and 4GB 667MHz CL5 memory under
SiSoftware Sandra Lite 2007.SP1 (10.105). Dhrystone ALU 37,014 MIPS, Whetstone iSSE3 25,393 MFLOPS,
Integer x8 iSSE4 220,008 it/s, Floating-point x4 iSSE2 119,227 it/s.
20. Reported by Giorgio Tani (author of PeaZip) on Nov. 10, 2006. Tested on a MacBook Pro,
Intel T2500 Core Duo CPU (one core used),
with 512 MB memory under WinXP SP2. Time is combined compression and decompression.
21. enwik9 -8 reported by sportman, Dec. 12-13, 2006. Hardware as note 19. enwik9
decompression not verified. paq8hp7 -8 enwik8 compression was reported as 16,417,650
(4 bytes longer; the size depends on the length of the input filename, which was
enwik8.txt rather than enwik8).
I verified enwik8 -7 and -8 decompression.
22. paq8hp8 -8 enwik9 reported by sportman, Jan. 18, 2007.
paq8hp10 -8 enwik9 on Apr. 2, 2007. paq8hp11 -8 enwik9 on May 10, 2007.
paq8hp12 -8 enwik8/9 on May 20, 2007.
Hardware as in note 19. Decompression verified for enwik8 only.
23. 7zip 4.46a options were -m0=PPMd:mem=1630m:o=10 -sfx7xCon.sfx
24. paq8o8-intel (intel compile of paq8o8) -1, paq8o8z-jun7 (DOS port of paq8o8) -1
reported by Rugxulo on June 10, 2008.
Timed on an AMD64x2 TK-53 Tyler 1.7 GHz laptop with Vista Home Premium SP1.
25. paq8o8z -1 enwik8 (DJGPP compile) reported by Rugxulo on June 17, 2008.
Tested on a 2.52 GHz P4 Northwood, no HTT, WinXP Home SP2.
26. Tested on a Gateway M-7301U laptop with 2.0 GHz dual core Pentium T3200
(1MB L2 cache), 3 GB RAM, Vista SP1, 32 bit. Run times are similar to my
older computer.
27. enwik9 size reported by Eugene Shelwien, Mar. 5, 2009.
enwik8 size and all speeds are tested as in note 26.
28. Reported by Eugene Shelwien on a Q6600, 3.3 GHz, WinXP SP3, ramdrive:
bcm 0.06 on Mar. 15, 2009, bcm 0.08 on June 1, 2009.
29. Reported by kaitz (KZ): paq8p3 on Apr. 19, 2009, v2 on Apr. 21, 2009, paq8pxd on Jan. 21, 2012,
v2 on Feb. 11, 2012, v3 on Feb. 23, 2012, v4 on Apr. 23, 2012.
2012 tests on a Core2Duo T8300 2.4 GHz, 2 GB.
30. Reported by Sami Runsas (author of bwmonstr), July 14, 2009. Tested on an Athlon XP 2200 (Win32).
31. Reported by Dmitry Shkarin, July 21, 2009, Nov. 12, 2009. Tested on a 3.8 GHz Q9650 with 16 GB
memory under Windows XP 64bit Pro SP2. Requires msvcr90.dll.
32. Reported by Mike Russell, Sept. 11, 2009.
Tested on a 2.93 GHz Intel Q6800 with 3.5 GB memory.
33. Reported by Con Kolivas (author of lrzip) on Nov. 27, 2009 (lrzip 0.40),
Nov. 30, 2009 (lrzip 0.42), Mar. 17, 2012 (lrzip 0.612). Tested on a 3 GHz
quad core Q9650, 8 GB, 64 bit debian linux.
34. Reported by sportman, Nov. 29, 2009 (durilca'kingsize), Nov. 30, 2009 (durilca'kingsize4),
Apr. 8, 2010 (bsc 1.0.0). Test hardware:
2 x 2.4GHz (overclocked at 2.53 GHz) quad core Xeon Nehalem,
24GB DDR3 1066MHz, 8 x 2TB RAID5, Windows 2008 Server R2 64bit
35. Reported by zody on Dec. 12, 2009. Tested in Windows 7, x64, 3.6 GHz e8200, 4 GB 1066 MHz RAM.
36. Reported by Ilia Muraviev on Dec. 16, 2009. Tested on a 2.40 GHz Core 2 Duo,
DDR2-800 4GB RAM, Windows7 x64.
37. Reported by Sami Runsas, Mar. 3, 2010. Tested under Win64 on a Q6600 at 3.0 GHz.
38. Reported by Ilya Grebnov, Apr. 7, 2010. Tested on an Intel Core 2 Duo E8500, 8 GB memory,
Windows 7.
39. Reported by Ilya Grebnov, Apr. 8, 2010. Tested on an Intel Core 2 Quad Q9400, 8 GB memory,
Windows 7. bsc 2.00 on May 3, 2010. bsc 2.2.0 on June 15, 2010.
40. Reported by Sami Runsas, May 10, 2010. Tested on an overclocked Intel Core i7 860. nanozip 0.08a
tested June 6, 2010. nanozip 0.09a on Nov. 5, 2011.
41. lpaq9m reported by Alexander Rhatushnyak on June 9, 2010. Tested on an Intel Core i7 CPU 930
(8 core), 2.8 GHz, 2.99 GB RAM. paq8hp12any tested June 28, 2010.
42. Reported by Michal Hajicek, June 4, 2010 on an AMD Phenom II 965, 64 bit Windows.
WinRK, ppmonstr on June 14.
43. Reported by Ilia Muraviev, June 26, 2010. Tested on a Core 2 Quad Q9300, 2.50 GHz,
4 GB DDR2, Windows 7.
44. Timed on a Dell Latitude E6510 laptop Core I7 M620, 2.66 GHz, 4 GB, Windows 7 32-bit.
45. Reported by Richard Geldreich (lzham author) on Aug. 30, 2010. Tested on a
2.6 GHz Core i7 (quad core + HT), 6 GB, Win7 x64.
46. Reported by Stefan Gedo (ST author) on Oct. 14, 2010. Tested on Athlon II X4 635
2.9 GHz, 4 GB memory, Windows 7.
47. Reported by David A. Scott on Dec. 15, 2010. Tested on an I3-370 with 6 GB DDR3
1033 MHz memory.
48. Timed on a Dell Latitude E6510 laptop Core I7 M620, 2.66 GHz, 4 GB, Ubuntu Linux 64-bit.
49. Tested by the author on a Q9450, 3.52 GHz = 440x8, ramdrive.
50. Tested by the author on an Intel Core i7-2600, 3.4 GHz, Kingston
8 GB DDR3, WD VelociRaptor 10000 RPM 600 GB SATA3, Windows 7 Ultimate SP1.
51. Tested by Bulat Ziganshin on i7-2600, 4.6 GHz with 1600 MHz RAM (8-8-8-21-1T)
and NVIDIA GeForce 560Ti at 900/2000 MHz.
52. Tested by Michael Maniscalco on an 8 core Intel Xeon E5620, 2.40 GHz,
12 GB memory running Windows 7 Enterprise SP1, 64 bit.
53. Tested by the author on a Core i7-2600K @ 4.6GHz, 8GB DDR3 @ 1866MHz,
240GB Corsair Force GT SSD.
54. Tested by Piotr Tarsa on a Core 2 Duo E8400, 8 GiB RAM, Ubuntu 11.10 64-bit,
OpenJDK 7.
55. Tested by David Catt on a 64 bit Windows 7 laptop, 2.33 GHz, 4 GB, 4 cores.
56. Reported by the author on an Athlon II X4 635 2.9 GHz, 4GB, Windows 8 Enterprise.
57. Reported by the author on a x86_64 Athlon 64 X2 5200+ with 8 GiB of RAM running GNU/Linux 2.6.38.6-libre.
58. Reported by the author on a 4 GHz i7-930 from ramdrive.
59. Reported by the author on an I7-2600, 4.6 GHz, 16 GB RAM, Ubuntu 13.04.
60. Tested by Ilia Muravyov on an Intel Core i7-3770K, 4.8 GHz, 16 GB Corsair Vengeance LP 1800
MHz CL9, Corsair Force GS 240 GB SSD, Windows 7 SP1.
61. Tested by Matt Mahoney on a dual Xeon E-2620, 2.0 GHz, 12+12 hyperthreads,
64 GB RAM (20 GB usable), Fedora Linux.
62. Tested by Valéry Croizier on a 2.5 GHz Core i5-2520M, 4 GB memory, Windows 7 64 bit.
63. Tested by Ilia Muravyov on an Intel i7-3770, 4.7 GHz, Corsair Vengeance LP 1600 MHz CL9 16 GB RAM,
Samsung 840 Pro 512 GB SSD, Windows 7 SP1.
64. Tested by Kennon Conrad on a 3.2 GHz AMD A8-5500.
65. Tested by sportman on an Intel Core i7 4960X 3.6GHz OC at 4.5GHz - 6 core (12 threads) 22nm Ivy Bridge-E,
Kingston 8 x 4GB (32GB) DDR3 2400MHz 11-14-14 under clocked at 2000MHz 10-11-11.
Windows 8.1 Pro 64-bit, SoftPerfect RAM Disk 3.4.5 64-bit.
66. Tested by Byron Knoll on an Intel Core i7-3770, 31.4 GB memory, Linux Mint 14.
67. Tested by Kennon Conrad on a 4.0 GHz i4790K, 16 GB at 1866 MHz, 128 GB SSD Windows 8.1.
68. Tested by Ilia Muraviev on an Intel Core i7-3770K @ 4.8GHz, 8GB 2133 MHz CL11 DDR3,
512GB Samsung 840 Pro SSD, Windows 7 Ultimate SP1.
69. Tested by Nania Francesco Antonio on an Intel Core i7 920, 2.67 GHz, 6 GB RAM.
70. Tested by Richard Geldreich on a Core i7 Gulftown 3.3 GHz, Win64.
71. Tested by Christoph Diegelmann on a Core i7-4770K, 8 GB DDR3, Samsung 840Pro 128 GB, Fedora 21 64 bit, gcc 4.9.2.
72. Tested by Skymmer on an i7-2770K, WinXP x64 SP2.
73. Tested by Andreas M. Nilsson on a 1.7 GHz Intel Core i7, 8 GB 1600 MHz DDR3, Mac OS X 10.10.3 (14D136).
74. Tested by Michael Crogan on a Core i7-3930K, 3.20 GHz, 6+HT, 64 MB, Linux64.
75. Tested by Mauro Vezzosi on a Core i7-4710HQ 2.50-3.50 GHz, 8 GB DDR3, Windows 8.1 64 bit.
76. Tested by Yann Collet on Core i7-3930K, 4.5 GHz, Linux 64, gcc 5.2.0-5.3.1.
77. Tested by Darek on a Core i7 4900 MQ, 2.8 GHz overclocked to 3.7 GHz, 16 GB, Win7Pro 64.
78. Tested by mpais on a Core i7 5820K 4.4 GHz, Windows 10.
79. Tested by Sportman on 2 x Intel Xeon E5-2643 v3 6 cores (12 threads) 3.4GHz, 3.7GHz turbo, 20MB L3 cache,
8 x 32GB DDR4 2133MHz CAS 15, SoftPerfect RAM Disk 3.4.7, Windows Server 2012 R2 64-bit.
80. Tested by kaitz on an Intel Celeron G1820 DDR3 8GB PC3-12800 (800 MHz).
81. Tested by Darek on Core i7 4900MQ 2.8GHz overclocked to 3.8GHz, 32GB, Win7Pro 64.
82. Tested by Ilia Muraviev on an Intel Core i7-4790K @ 4.6GHz, 32GB @ 1866MHz DDR3 RAM, RAMDisk.
83. Tested by Byron Knoll on an Intel Core i7-7700K, 32 GB DDR4, Ubuntu 16.04-18.04.
84. Tested by Fabrice Bellard on 2 x Xeon E5-2640 v3 @ 2.6 GHz, 196 GB RAM, Linux.
85. Tested by Georgi Marinov on a Windows 10 Laptop: Lenovo Ideapad 310;
i5-7200u @2.5GHz; 8GB DDR4 @1066MHz (2133MHz) CL15 CR2T; L2 cache: 2x256KB; L3 cache: 3MB; SSD: Crucial MX500 500GB
86. Tested by Byron Knoll on an Intel Xeon 2.30 GHz, 13 GB, Tesla P100 GPU.
87. Tested by Byron Knoll on an Intel Xeon 2.00 GHz, 13 GB, Tesla V100 GPU.
88. Tested by Fabrice Bellard on an Intel Xeon E3-1230 v6, 3.5 GHz, RTX 3090 GPU.
89. Tested by Matt Mahoney on a Lenovo Intel i7-1165G7 (4 core, 8 thread) 2.80 GHz, 16 GB, Windows 10/Ubuntu 20.04.
90. Tested by Artemiy Margaritov on an Intel Xeon Silver 4114, 2.20 GHz, Ubuntu 18.
91. Tested by Zoltán Gotthardt on an Intel Core i7-8700K @ 3.70GHz, HyperX Fury 32GB 2666MHz DDR4 CL16 (2x16GB kit), Windows 10 Pro 64 bit. The system was not completely idle during the tests.
92. Tested by Darek on a DELL Precision 7730, Intel Core i9-8950HK, 32GB RAM (2400MHz), Windows 10 Pro for Workstations (21H2). The system was not completely idle during the tests.
93. Tested by Sportman on an Intel Core i9 12900KS 16 cores (8 efficient cores disabled, hyper-threading disabled) 3.4GHz, 5.5GHz turbo, 30MB L3 cache, 14MB L2 cache, 2 x 16GB DDR5 6400MHz (PC5-51200) timings 32-39-39-102, Windows 10 Pro 64-bit.
94. Tested by Byron Knoll on an Intel Xeon 2.2 GHz, 83 GB, A100 GPU.
95. Tested by Gerald R. Tamayo on a Dell Inspiron 3881 Intel Core i3-10100 16GB RAM @ 3.60GHz (Windows 10).
96. Tested by Ilya Grebnov on an Intel 9700K CPU (5GHz all cores) with 2x8 GB DDR4 RAM (4133 MHz with 17-17-17-37-400-2T timings) running Microsoft Windows 10 Pro (64 Bit).
97. Tested by Matt Mahoney on a Lenovo Core i7-1165G7 2.80 GHz 16 GB, SSD, Windows 11 or Ubuntu.
98. Tested by Ilia Muraviev on an Intel Core i7-12700K (stock), 32 GB DDR5 5200 MHz, 1 TB M.2 NVMe SSD.
99. Tested by James Bowery on an AMD Ryzen 7-3700x, 3.6 GHz, 8 cores, 16 threads, 64 GB.
100. Tested by Andrea Barbato on an AMD Ryzen 9 5950X 3.4 GHz 32core processor, Patriot Viper Steel RAM DDR4 3600 MHz 32GB (4x32GB).
About the Compressors
.1072 nncp
nncp is a free, experimental
file compressor by Fabrice Bellard, released May 8, 2019.
It uses a neural network model with dictionary preprocessing described in the paper
Lossless Data Compression
with Neural Networks. Compression of enwik9 uses the options:
./preprocess c out.words enwik9 out.pre 16384 512
./nncp -n_layer 7 -hidden_size 384 -n_embed_out 5 -n_symb 16388 -full_connect 1 -lr 6e-3 c out.pre out.bin
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Notes
------- ------- ---------- ----------- ----------- ----------- ------ ------ ----- --- -----
nncp 2019-05-08 16,791,077 125,623,896 161,133 xd 125,785,029 420168 602409 2040 LSTM 84
nncp 2019-11-16 16,292,774 119,167,224 238,452 xd 119,405,676 826048 1156467 5360 LSTM 84
nncp v2 15,600,675 114,317,255 99,671 xd 114,317,255 308645 313468 17000 Transformer 88
nncp v2.1 15,020,691 112,219,309 100,046 xd 112,319,355 508332 515401 23000 Transformer 88
nncp v3 15,206,966 110,034,293 197,491 xd 110,231,784 161812 158982 6000 Transformer 88
nncp v3.1 14,969,569 108,378,032 201,620 xd 108,579,652 212766 210970 6000 Transformer 88
nncp v3.2 14,915,298 106,632,363 628,955 xd 107,261,318 241871 238670 7600 Transformer 88
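The "Total size" column that decides the ranking is the compressed size of enwik9 plus the size of the zipped decompresser. A minimal Python sketch of that arithmetic for the nncp v3.2 row follows; the function and constant names are illustrative and are not part of any benchmark tooling.

```python
ENWIK9_BYTES = 10**9  # size of the uncompressed enwik9 test file


def total_size(compressed_enwik9: int, decompresser_zip: int) -> int:
    """Ranking figure: compressed enwik9 plus the zipped decompresser."""
    return compressed_enwik9 + decompresser_zip


def ratio(total: int) -> float:
    """Total size as a fraction of the uncompressed input."""
    return total / ENWIK9_BYTES


# Values taken from the nncp v3.2 row of the table above.
nncp_v32 = total_size(106_632_363, 628_955)
print(nncp_v32)          # 107261318
print(ratio(nncp_v32))
```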
.1082 cmix
cmix v1 is a free,
open source (GPL) file compressor by Byron Knoll, Apr. 16, 2014.
It is a context mixing compressor with dictionary preprocessing based
on code from paq8hp12any and paq8l but increasing the number of
context models and mixer layers. It takes no compression options.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Notes
------- ------- ---------- ----------- ----------- ----------- ------ ------ ----- -----
cmix v1 16,076,381 128,647,538 279,185 x 128,926,723 181924 179706 20785 66
cmix v2 15,863,623 126,323,656 310,068 x 126,633,724 580083 577626 28152 66
cmix v3 15,809,519 125,971,560 274,992 x 126,246,552 267978 266622 26681 66
cmix v4 15,784,946 125,621,620 278,375 x 125,899,995 284243 282390 28976 66
cmix v5 15,769,367 125,526,628 163,552 s 125,690,180 282056 282647 28865 66
cmix v6 15,738,922 124,172,611 161,908 s 124,334,519 280749 282137 30882 66
cmix v7 15,738,825 124,168,463 166,785 s 124,335,248 280416 280904 30600 66
cmix v8 15,709,216 123,930,173 164,882 s 124,095,055 344244 346641 30311 66
cmix v9 15,627,536 123,874,398 161,911 s 124,036,309 346436 345681 26929 66
cmix v10 15,587,868 123,257,156 164,263 s 123,421,419 355721 355850 29924 66
cmix v11 15,566,358 122,977,954 172,261 s 123,150,215 377529 374440 27745 66
cmix v12 15,440,186 121,718,424 175,953 s 121,894,377 571339 574522 27865 66
cmix v13 15,323,969 120,480,684 177,979 s 120,658,664 617346 615987 27803 66
cmix v14 15,210,458 119,017,492 203,717 s 119,221,209 631838 627802 28287 83
cmix v15 15,111,677 117,959,016 217,830 s 118,176,846 650055 651716 28365 83
cmix v16 14,955,482 116,912,035 226,121 s 117,138,156 613898 658679 27708 83
cmix v17 14,877,373 116,394,271 208,263 s 116,602,534 641189 645651 25258 83
cmix v18 14,838,332 115,714,367 208,961 s 115,923,328 602867 601569 25738 83
cmix v19 14,837,987 111,470,932 223,485 sd 111,694,417 605110 601825 25528 83
cmix v20 14,760,552 109,877,715 241,725 sd 110,119,440 621780 619024 31650 83
cmix v21 -t 14,623,723 107,963,380 281,387 sd 108,244,767 622949 638442 30950 83
.1121 fx-cmix
fx-cmix
(discussion)
(self extracting enwik9)
is an open source Hutter prize submission by Kaitz, Dec. 4, 2023. It is an optimization of the previous
submissions cmix-hp, fast-cmix, and starlit. I only tested the supplied Linux self extracting archive.
The extraction time of 60 hours was at 74% CPU (one thread) due to SSD disk thrashing on the second day.
User time was 152496 s and system time was 9425 s (45 hours total).
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Notes
------- ------- ---------- ----------- ----------- ----------- ------ ------ ----- --- -----
archive9 112,142,259 0 xd 112,142,259 216836 8869 CM 97
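Because enwik9 is 10^9 bytes, a time in ns/byte is numerically equal to the wall time in seconds, which makes it easy to check the figures quoted for fx-cmix. The sketch below assumes that the single time in the table (216836 ns/byte) is the extraction time described in the text; the helper names are made up for illustration.

```python
ENWIK9_BYTES = 10**9


def ns_per_byte_to_hours(ns_per_byte: float, n_bytes: int = ENWIK9_BYTES) -> float:
    """Convert a benchmark time in ns/byte to wall-clock hours."""
    seconds = ns_per_byte * n_bytes / 1e9
    return seconds / 3600


def cpu_utilization(user_s: float, sys_s: float, wall_s: float) -> float:
    """Fraction of wall time spent on the CPU (user + system)."""
    return (user_s + sys_s) / wall_s


wall_s = 216836                       # ns/byte on 10^9 bytes equals seconds
hours = ns_per_byte_to_hours(216836)  # about 60 hours of wall time
cpu_hours = (152496 + 9425) / 3600    # about 45 hours of CPU time
util = cpu_utilization(152496, 9425, wall_s)
print(hours, cpu_hours, util)
```

The utilization comes out just under 75%, in line with the 74% figure reported above.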
tensorflow-compress v1 is a free, open source experimental file compressor by Byron Knoll, July 20, 2020. It uses an LSTM neural network accelerated by a GPU if available. It uses a dictionary and preprocessor from NNCP by default, or from cmix. The test results for v1 use the default settings and were tested by the author on an Intel Xeon 2.30 GHz, 13 GB RAM with a Tesla P100 GPU. It uses 10138 MiB CPU RAM and 15525 MiB GPU RAM. It is run as a Colab notebook.
v2 was released Sept. 7, 2020. It runs on a V100 GPU using 2669 MB CPU RAM and 15621 MB GPU RAM. The decompressor contains a Colab notebook, NNCP preprocessor source code and makefile, and a dictionary created by the NNCP preprocessor.
v3 was released Nov. 29, 2020. It uses 3252 MiB of CPU RAM on a 2.00 GHz Xeon and 15621 MiB of GPU RAM on a Tesla V100.
v4 was released Aug. 10, 2022. It uses 5696 MiB of CPU RAM and 39664 MiB of GPU RAM on an Intel Xeon 2.2 GHz, 83 GB RAM, A100 GPU.
Program enwik8 enwik9 Prog Total Comp Deco Mem Note
--------- ---------- ----------- -------- --------- ---- ---- ---- ----
tensorflow-compress v1 20,119,747 159,716,240 88,870sd 159,805,110 72260 82259 25663 86
tensorflow-compress v2 16,828,585 127,146,379 175,047sd 127,321,426 157196 142820 18290 87
tensorflow-compress v3 16,128,954 118,938,744 54,597sd 118,993,341 300104 300408 18873 87
tensorflow-compress v4 15,905,037 113,542,413 55,283sd 113,597,696 291394 290803 45360 94
cmix-hp (mirror) is a Hutter prize submission by Byron Knoll, June 10, 2021. It is a simple modification to starlit (May 31, 2021 submission) to enlarge the PPMD model and map it to 21.4 GB virtual memory to meet the Hutter prize requirement of using at most 10 GB RAM and 100 GB disk. It uses 94% CPU on the SSD swapping 45 GB.
cmix-hp v2 was released Aug. 1, 2021.
cmix-hp v3 was released Aug. 9, 2021.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Notes
------- ------- ---------- ----------- ----------- ----------- ------ ------ ----- --- -----
cmix-hp v1 15,957,339 113,712,798 0 xd 113,712,798 189420 194280 6873 CM 89
cmix-hp v2 15,221,487 113,816,319 0 xd 113,816,319 198900 6720 CM 89
cmix-hp v3 113,788,598 0 xd 113,788,598 188460 188040 6693 CM 89
starlit is a Hutter prize submission by Artemiy Margaritov on May 10, 2021, updated May 31, 2021. It is a free, open source, Linux compressor that produces a self extracting archive for enwik9 as a special case. It satisfies the Hutter prize rules of using less than 10 GiB of memory (the figure shown is in 1000 KiB), and 20 GB of disk space and compressing and decompressing in less than 50,000/(geekbench 5 score) hours each. I tested on a Lenovo Intel Core i7-1165G7, 2.80 GHz, 16 GB (geekbench 5 = 1427 single thread, 4667 multithreaded) in an Ubuntu 20.04 shell window under Windows 10 with the screen/sleep saver and WiFi turned off for 2 days each to compress and decompress.
starlit compresses by first reordering the articles in enwik9 to maximize mutual information between consecutive articles, then uses the dictionary preprocessor from phda9 and compresses using a reduced version of cmix to decrease memory usage from 32 GB to 10 GB and increase speed. The compressor is built from the supplied bash scripts by compiling with clang++-12 in Linux with different parts optimized for size or speed. Then the dictionary and article order list (both text files) are compressed with the newly created cmix and appended to the executable. The size is 124,984 bytes before appending and 401,505 bytes afterward. (A precompiled cmix is supplied optimized for an AMD Zen 2 with size 114,012 bytes before appending, which I did not use). The new executable then compresses enwik9 by extracting the compressed article order and dictionary and an additional 17 GB of temporary files to produce an executable file named archive9. To decompress, archive9 is run, which extracts the dictionary, article order list, and 17 GB of temporary files, and 2 days later, the output as a file named enwik9_uncompressed. No other files are required to decompress. The original article order is restored by sorting the titles alphabetically.
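The article reordering round trip can be illustrated with a toy example. This is only a sketch under stated assumptions: the articles are reduced to (title, body) pairs, the order list is a plain list of indices, and "alphabetically" is taken to mean Python's default string ordering. The actual file formats and comparison rule used by starlit are not reproduced here.

```python
def reorder(articles, order):
    """Compression side: arrange the articles according to an order list of indices."""
    return [articles[i] for i in order]


def restore(articles):
    """Decompression side: recover the original order by sorting on the title."""
    return sorted(articles, key=lambda art: art[0])


# Toy input that is already in title order, as the restore step assumes.
original = [("Albedo", "a"), ("Anarchism", "b"), ("Autism", "c")]
order = [2, 0, 1]  # hypothetical order list chosen by the compressor
shuffled = reorder(original, order)
print(restore(shuffled) == original)
```

Restoring by sorting means the decompresser does not need the inverse permutation, only the titles themselves.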
To compress enwik8, the command is cmix -e .dict enwik8 enwik8.cmix. To decompress: cmix -d .dict enwik8.cmix enwik8_uncompressed, where .dict is the uncompressed dictionary file. Articles are not reordered. cmix is the reduced cmix with or without the appended compressed files.
Modifications to cmix_v18 (from README.md).
Changes to HP-2017 (phda9) enwik8-specific transforms
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
starlit 10 May 2021 115,093,300 0 xd 115,093,300 273600 9910 CM 90
starlit 31 May 2021 15,215,107 114,951,433 0 xd 114,951,433 173953 171682 10233 CM 89
The original prize winning version is a 64 bit Linux decompressor (no source) and compressed enwik8 as a RAR archive, awarded Nov. 4, 2017, posted Aug. 12, 2019. Archive plus decompressor size is 15,284,944 bytes. It uses 1 GB memory and a 176 MB scratch file. There is a version that uses only RAM.
phda9 1.2 (discussion) was released Mar. 13, 2018.
phda9 1.3 was released Apr. 21, 2018. The decompressor size for enwik8 is different (557050 bytes) because the dictionary is loosely compressed in the decompressor instead of in the compressed file.
phda9 1.4 was released May 20, 2018. This is mainly a bug fix version.
phda9 1.5 was released Aug. 1, 2018. enwik8 uses a separate decompressor with a size of 557415 bytes.
phda9 1.6 was released Oct. 20, 2018. enwik8 uses a separate decompressor with a size of 564616 bytes.
phda9 1.7 was released Feb. 18, 2019. enwik8 uses a separate decompressor with a size of 565,352 bytes.
phda9 1.8 was released July 4, 2019. enwik8 uses a separate decompressor with a size of 558,298 bytes.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Notes
------- ------- ---------- ----------- ----------- ----------- ------ ------ ----- -----
phda9 1.0 15,173,565 118,658,060 41,994 xd 118,700,054 56815 55201 5031 83
phda9 1.2 15,144,786 118,335,817 42,745 xd 118,378,562 60726 61586 4992 83
phda9 1.3 15,069,752 117,617,185 42,108 xd 117,659,293 86557 87375 4996 83
phda9 1.4 15,074,624 117,603,125 42,110 xd 117,645,235 87520 87909 4992 83
phda9 1.5 15,063,267 117,223,130 42,428 xd 117,265,558 85877 86365 4995 83
phda9 1.6 15,040,647 117,039,346 41,911 xd 117,081,257 84713 88401 4996 83
phda9 1.7 15,023,870 116,940,874 43,274 xd 116,984,148 83712 87596 4996 83
phda9 1.8 15,010,414 116,544,849 42,944 xd 116,587,793 86182 86305 6319 83
paq8px_v206fix1 is the latest version in the following PAQ series of open source (GPL) context mixing archivers.
p5, p6, and p12 (Matt Mahoney, May 13, 2000) use a neural network with 256K or 4M inputs, no hidden layer and a single output to predict the next bit of input, given hashes of various contexts to select active inputs. The output is arithmetic coded. p5 uses 1 MB memory and context orders 0 to 3. p6 uses 16 MB and orders 0-5. p12 uses 16 MB, orders 1-4 and word-level orders 0-1 as an optimization for text. The programs take no options. The algorithm is described in M. Mahoney, Fast Text Compression with Neural Networks, Proc. AAAI FLAIRS, Orlando, 2000 (C) 2000, AAAI.
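The idea of a network with no hidden layer, whose active inputs are selected by hashing contexts, can be sketched in a few lines of Python. This is an illustration of the general technique only, not the p5/p6/p12 code: the CRC32 hash, the table size, the learning rate and the floating point arithmetic are all assumptions, and the arithmetic coder is replaced by a sum of ideal code lengths.

```python
import math
import zlib


class HashedBitPredictor:
    """Single-layer network: hashed contexts select inputs, one sigmoid output."""

    def __init__(self, n_inputs=1 << 16, orders=(0, 1, 2, 3), lr=0.1):
        self.w = [0.0] * n_inputs  # one weight per hashed input
        self.n = n_inputs
        self.orders = orders
        self.lr = lr

    def active(self, history, partial):
        # One active input per order: hash of (order, last `order` bytes,
        # bits of the current byte seen so far).
        out = []
        for o in self.orders:
            ctx = history[-o:] if o else b""
            key = bytes([o]) + ctx + partial.to_bytes(2, "big")
            out.append(zlib.crc32(key) % self.n)
        return out

    def predict(self, active):
        s = sum(self.w[i] for i in active)
        return 1.0 / (1.0 + math.exp(-s))  # probability that the next bit is 1

    def update(self, active, bit, p):
        for i in active:
            self.w[i] += self.lr * (bit - p)  # gradient step on coding cost


def code_length_bits(data, model):
    """Ideal arithmetic-coded size of `data` in bits under the adaptive model."""
    total = 0.0
    history = b""
    for byte in data:
        partial = 1  # leading 1 marks how many bits of this byte are known
        for k in range(7, -1, -1):
            bit = (byte >> k) & 1
            act = model.active(history, partial)
            p = model.predict(act)
            total += -math.log2(p if bit else 1.0 - p)
            model.update(act, bit, p)
            partial = (partial << 1) | bit
        history = (history + bytes([byte]))[-8:]
    return total


data = b"ab" * 200
bits = code_length_bits(data, HashedBitPredictor())
print(bits, 8 * len(data))
```

On repetitive input like this the estimated size falls well below 8 bits per byte, because the higher order contexts quickly learn which bit follows.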
paq1 (Matt Mahoney, Jan. 6, 2001) replaces the neural network in p5, p6, p12 with a fixed weighted averaging of model outputs. Described in an unpublished report, M. Mahoney, The PAQ1 Data Compression Program, 2002.
paq6 (Matt Mahoney and Serge Osnach, Dec. 30, 2003) evolved as a series of improvements to paq1. It is described in M. Mahoney, Adaptive Weighing of Context Models for Lossless Data Compression, Florida Tech. Technical Report CS-2005-16, 2005. The most significant improvements are replacing the fixed model weights with adaptive linear mixing (Matt Mahoney), and SSE (secondary symbol estimation) postprocessing on the output probability, and modeling of sparse contexts (Serge Osnach). Other models were added for x86 executable code, and automatic detection of fixed length records in binary data. Intermediate versions can be found here.
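The difference between fixed weighted averaging and adaptive linear mixing can be shown with a small sketch. It is a simplification under stated assumptions: the models are represented only by their output probabilities, and the weight update is a plain error-driven step with a floor that I chose for illustration. It is not the update rule from the PAQ6 report.

```python
def mix(probs, weights):
    """Weighted average of the models' probabilities that the next bit is 1."""
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)


def adapt(probs, weights, bit, lr=0.02, floor=1e-3):
    """One adaptive step: raise the weights of models that leaned toward the coded bit."""
    p = mix(probs, weights)
    err = bit - p
    return [max(floor, w + lr * err * (pi - p)) for w, pi in zip(weights, probs)]


probs = [0.9, 0.1]      # model 0 is usually right, model 1 usually wrong
fixed = [1.0, 1.0]      # fixed averaging never changes these weights
adaptive = list(fixed)
for _ in range(200):    # the coded bit is always 1 in this toy run
    adaptive = adapt(probs, adaptive, bit=1)

print(mix(probs, fixed), mix(probs, adaptive))
```

With fixed weights the mixed prediction stays at 0.5, while the adaptive weights shift toward the better model and the prediction moves toward 0.9.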
paqar 4.5 (Alexander Rhatushnyak, Feb. 13, 2006) is the last of a long series of improvements to paq6 by Alexander Rhatushnyak (paqar: multimixer model, .exe preprocessor, other model improvements), Przemyslaw Skibinski (WRT text preprocessing), Berto Destasio (model tuning), Fabio Buffoni (speed optimizations), David A. Scott (arithmetic coder optimizations), Jason Schmidt (model improvements), and Johan de Bock (compiler optimizations). For text, the biggest improvement came from WRT (Word Reducing Transform), which replaces words with shorter codes from an external English dictionary and was added in PAsQDa 1.0 on Jan. 18, 2005. WRT is described in P. Skibiński, Sz. Grabowski, and S. Deorowicz, Revisiting dictionary-based compression, Software - Practice & Experience, 35 (15), pp. 1455-1476, December 2005. There were a great number of versions by many contributors, mostly in 2004 when the PAQ series moved to the top of most compression benchmarks and attracted interest. Prior to PAQ, the top ranked programs were generally closed source.
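A word replacing transform of this kind can be sketched as follows. This is a toy illustration, not the WRT format: the escape character, the code alphabet and the three word dictionary are assumptions, and the sketch relies on the escape character never appearing in the input.

```python
import re

ESCAPE = "\x01"  # assumed never to occur in the text being transformed


def build_codes(dictionary):
    """Map each dictionary word to a short code: escape plus one index character."""
    return {word: ESCAPE + chr(0x80 + i) for i, word in enumerate(dictionary)}


def encode(text, codes):
    """Replace every dictionary word with its code; other words pass through."""
    return re.sub(r"[A-Za-z]+", lambda m: codes.get(m.group(0), m.group(0)), text)


def decode(text, codes):
    """Invert encode() by expanding each escape sequence back into its word."""
    inverse = {code: word for word, code in codes.items()}
    return re.sub(ESCAPE + ".", lambda m: inverse[m.group(0)], text, flags=re.S)


codes = build_codes(["the", "compression", "of"])
text = "the compression of the text"
encoded = encode(text, codes)
print(len(text), len(encoded), decode(encoded, codes) == text)
```

The transform is lossless, and the shorter, more regular token stream is what the context models then compress.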
paq8f (Matt Mahoney, Feb. 28, 2006) evolved from paq7 (Dec. 24, 2005) as a complete rewrite of paq6/paqar. The important improvements were replacing the adaptive linear mixing of models with a neural network (coded in MMX assembler), a more memory-efficient mapping of contexts to bit histories using a cache-aligned hash table, adaptive mapping of bit histories to probabilities, and models for bmp, tiff, and jpeg images. It models text using whole-word contexts and case folding, like all versions back to p12, but lacks WRT text preprocessing. It served as a baseline for the Hutter prize. Details are in the source code comments.
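The neural network mixer can be sketched as logistic mixing: model probabilities are stretched into the logistic domain, combined by a weighted sum, and squashed back into a probability, with the weights trained online on the coding error. The version below is a floating point illustration with a single weight set and an assumed learning rate. The paq8f source uses integer arithmetic in MMX assembler, and its details differ.

```python
import math


def stretch(p):
    """Logit: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def squash(x):
    """Logistic function, the inverse of stretch()."""
    return 1.0 / (1.0 + math.exp(-x))


class LogisticMixer:
    def __init__(self, n_models, lr=0.02):
        self.w = [0.0] * n_models
        self.lr = lr
        self.x = [0.0] * n_models
        self.p = 0.5

    def mix(self, probs):
        """Combine the model predictions into one probability that the bit is 1."""
        self.x = [stretch(p) for p in probs]
        self.p = squash(sum(w * x for w, x in zip(self.w, self.x)))
        return self.p

    def update(self, bit):
        """Move each weight along its input in proportion to the prediction error."""
        err = bit - self.p
        self.w = [w + self.lr * err * x for w, x in zip(self.w, self.x)]


mixer = LogisticMixer(2)
for _ in range(500):           # model 0 is confident and right, model 1 is uninformative
    mixer.mix([0.9, 0.5])
    mixer.update(1)
final_p = mixer.mix([0.9, 0.5])
print(final_p, mixer.w)
```

Unlike linear averaging, the mixed prediction can become more confident than any single input when a model is consistently right.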
paq8g (Przemyslaw Skibinski, Mar. 3, 2006) adds back WRT text preprocessing.
paq8h (Alexander Rhatushnyak, Mar. 24, 2006) added additional contexts to the neural network mixer. It was top ranked on enwik9 (but not enwik8) when the Hutter prize was launched on Aug. 6, 2006. This is the 78th version since p5.
raq8g by Rudi Cilibrasi, released 0721Z Aug. 16, 2006, is a modification of paq8f. It adds a NestModel to model nesting of parentheses and brackets. The test below for -7 is based on a Windows compile, raq8g.exe. The test for -8 was under Linux. The unzipped Linux executable is 27,660 bytes.
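One way a model could expose bracket nesting as a context is to track a stack of open brackets and report the depth and the innermost opener before each byte. The sketch below is my own illustration of that idea and makes no claim about how raq8g's NestModel is actually implemented.

```python
# Openers mapped to their closers; only parentheses and square brackets here.
OPEN = {ord("("): ord(")"), ord("["): ord("]")}
CLOSE = set(OPEN.values())


def nesting_contexts(data: bytes):
    """For each byte, return (depth, innermost open bracket or 0) as seen before it."""
    stack = []
    out = []
    for b in data:
        out.append((len(stack), stack[-1] if stack else 0))
        if b in OPEN:
            stack.append(b)
        elif b in CLOSE and stack and OPEN[stack[-1]] == b:
            stack.pop()  # only a matching closer reduces the depth
    return out


contexts = nesting_contexts(b"a(b[c])d")
print(contexts)
```

A predictor could hash each pair together with recent bytes so that, for example, a closing bracket becomes more likely while the depth is nonzero.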
paq8j by Bill Pettis, Nov. 13, 2006, is based on paq8f (no dictionary) with model improvements taken from paq8hp5. It is a general purpose compressor like paq8f, not specialized for text.
paq8ja.zip by Serge Osnach, Nov. 16, 2006, is an improvement of paq8j, using additional contexts based on character classifications.
paq8jb.zip by Serge Osnach, Nov. 22, 2006, adds contexts using the distance to an anchor byte (x00, space, newline, xff) combined with previous characters. The -8 test caused some minor disk thrashing at 2 GB memory under WinXP Home (82% CPU usage). Time reported is wall time.
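A context built from the distance to an anchor byte combined with previous characters might look like the following. The anchor set is taken from the description above; the distance cap, the CRC32 hash and the number of previous bytes are assumptions for illustration and are not taken from the paq8jb source.

```python
import zlib

ANCHORS = {0x00, 0x20, 0x0A, 0xFF}  # x00, space, newline, xff


def anchor_distance(history: bytes) -> int:
    """Number of bytes since the most recent anchor byte (whole length if none)."""
    dist = 0
    for b in reversed(history):
        if b in ANCHORS:
            break
        dist += 1
    return dist


def anchor_context(history: bytes, n_prev: int = 1) -> int:
    """Hash of the capped anchor distance together with the last n_prev bytes."""
    dist = min(anchor_distance(history), 255)
    return zlib.crc32(bytes([dist]) + history[-n_prev:])


print(anchor_distance(b"the cat"))
print(anchor_context(b"the cat") == anchor_context(b"one cat"))
```

Two histories that sit at the same position within a word and end in the same character share a context, whatever came before the anchor.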
paq8jc.zip by Serge Osnach, Nov. 28, 2006, improves the record model for better compression of some binary files, although it is slightly worse for text. Time for -8 is wall time at 72% CPU usage.
paq8jd by Bill Pettis, Dec. 30, 2006, improves on paq8j with additional SSE (APM) stages. enwik8 -8 caused some disk thrashing at 2 GB memory.
paq8k is by Bill Pettis, Feb. 13, 2007.
paq8l by Matt Mahoney, Mar. 8, 2007, is based on paq8jd. It adds a DMC model and minor improvements.
paq8fthis2 by Jan Ondrus, Aug. 12, 2007, is paq8f with an improved model for compressing JPEG images. It is otherwise archive compatible with paq8f for data without JPEG images (such as enwik8 and enwik9).
paq8n by Matt Mahoney, Aug. 18, 2007, combines paq8l with the JPEG model from paq8fthis2.
paq8o and paq8osse by Andreas Morphis, Aug 22 2007, is paq8n with an improved model for .bmp images. There are two executables that produce identical archives. paq8o.exe is for Pentium MMX or higher. paq8osse.exe is for newer processors that support SSE2 instructions like the Pentium 4. It is about 8% faster, but uses more memory. Both use the same C++ source but use different (but equivalent) assembler code to implement the neural network mixer. paq8osse.exe was compiled with Intel C++, which produces slightly faster executables than g++ used in earlier versions. The current version is paq8o ver. 2 (Aug. 24, 2007), which fixes the file name extension (was .paq8n) but does not change compression. The benchmark is based on the first version.
paq8o3 by KZ, Sept. 11, 2007, combines paq8o with an improved JPEG model from paq8fthis3 (Jan Ondrus, Sept. 8, 2007) and an improved model for grayscale PGM images from paq8i (Pavel Holoborodko, Aug. 18, 2006). Text compression is unchanged from paq8l, paq8m, paq8o, or paq8o2.
paq8o4 v1 by KZ, Sept. 15, 2007, includes a grayscale .bmp model (based on the grayscale PGM model). Text compression is unaffected. It was compiled with Intel C++. paq8o4 v2 by Matt Mahoney, Sept. 17, 2007, is a port to g++ which allows wildcards, directory traversal, and directory creation, but is 8% slower. It is archive compatible with v1.
paq8o6 by KZ, Sept. 28, 2007, is based on paq8o5 by KZ, Sept. 21, 2007 with the improved JPEG model from paq8fthis4 by Jan Ondrus, Sept. 27, 2007. paq8o5 is paq8o4 with an improved StateMap from lpaq1. The improved compression of enwik8 comes from this StateMap. Compression of enwik8 is unchanged from paq8o5 to paq8o6.
paq8o7 by KZ, Oct. 16, 2007, improves paq8o6 with improved JPEG compression and support for 4 and 8 bit BMP images. Text is not affected.
paq8o8 by KZ, Oct. 23, 2007, further improves the JPEG compression of paq8o7.
paq8o8-jun7 is a DOS port of paq8o8 by Rugxulo, June 7, 2008.
paq8o10t is by KZ, June 11, 2008. Discussion.
paq8p3 is by KZ, Apr. 19, 2009.
paq8p3 v2 is by KZ, Apr. 21, 2009.
paq8px_v60_turbo (source code and discussion) was by Jan Ondrus (with contributions from many others), June 20, 2009, and speed optimized by LovePimple on July 11, 2009. By default the turbo version runs in high priority under Windows, but was tested at normal priority. The v60 version was released after a long period of development beginning with v1 on Apr. 25, 2009. Development was aimed mostly at improving x86, image and wav compression. Decompression was not verified.
paq8px_v69 was released Apr. 26, 2010.
paq8pxd by kaitz, Jan. 21, 2012, modifies paq8px_v69 by adding dynamic dictionary preprocessing (based on XWRT), UTF-8 detection, and an alternating byte sparse model.
paq8pxd_v2 by kaitz (KZo) was released Feb. 11, 2012.
paq8pxd_v3 by kaitz (KZo) was released Feb. 23, 2012. Modified im8model, base64 in email model, and fixes false image detection in enwik9.
paq8pxd_v4 by kaitz was released Apr. 19, 2012. Adds 4 bit bmp model, base64 fixes, combines WRT source code and has other fixes.
paq8pxd_v5 by kaitz was released Apr. 18, 2013.
paq8pxd_v7 by kaitz was released Aug. 14, 2013.
paq8pxd_v8 by kaitz was a temporary release on June 16, 2014. It was still under development to fix bugs causing it to fail on JPEG and WAV input, but there were no errors for enwik8 or enwik9. To test, it was compiled from source under 64 bit Ubuntu using g++ 4.8.1 -O3.
paq8pxd_v10fix by kaitz was released June 21, 2014. It was compiled from source under 64 bit Ubuntu, g++ 4.8.1 -O3.
paq8pxd_v12 by kaitz was released July 28, 2014. It was compiled from source under 64 bit Ubuntu, g++ 4.8.1 -O3.
paq8pxd_v12-skbuild, Aug. 9, 2014, is a 64 bit port of paq8pxd_v12 by Skymmer with work by AlexDoro adding options -9 and -10, each of which doubles memory usage from the previous level.
paq8pxd_v13_x64 is the 64 bit compile by Skymmer of paq8pxd_v13fix3 by kaitz on Aug. 26, 2014. It supports levels up to 15 using 25955 MB memory.
paq8pxd_v15 was released Sept. 17, 2014. It has options -s1...-s15 and -f1...-f15 which mean slow or fast respectively. Higher levels use more memory. Faster methods use fewer models. Levels 9 and higher require a 64 bit compile. To test, the program was compiled with g++ 4.8.2 for 64 bit Ubuntu with option -O3.
paq8pxd_v12_biondivers1_x64 is a 64 bit build of v12 by Luca Biondi, Oct. 27, 2014.
paq8pxd_v18 by kaitz was released July 18, 2016. Options -{qfs} select quick, fast, slow, followed by a number selecting memory usage.
paq8px_v77 was released July 10, 2017.
paq8pxd_v32 and paq8px_v96 with DRT and split preprocessing of enwik9 were released Aug. 29, 2017.
paq8pxd_v47 was released Mar. 18, 2018.
paq8pxd_v48_bwt1 was released Aug. 9, 2018.
paq8pxd_v61 was released Feb. 23, 2019. Resplit package.
paq8px_v206fix1 was released on June 6, 2022 by Zoltán Gotthardt. Like most paq8px versions, it doesn't specifically target enwik9: it has no enwik-specific models, doesn't preprocess enwik8/enwik9, and doesn't reorder articles. Memory options range from -1 (146 MB) to -12 (28 GB). Optional parameters may be added to use an LSTM model (L), to use adaptive learning rate (A, which hurts enwik results), to load dictionary files to pre-train the Normal, Word and Text models before compression (T), to load a pre-trained LSTM repository to pre-train the LSTM model before compression (R), or to use the paq8px executable itself to pre-train the Normal model (E). All these optional switches (and some of their combinations) were tested at memory level 12. About the results:
s = zipped source
sd = zipped source + paq8px-compressed dictionary and repository files
Zipped source files (original, files not needed to compile such as CHANGELOG and README are excluded): 402,949 = 7z.exe a -mm=Deflate -mfb=258 -mpass=15 -mx9 paq8px_v206fix1_src.zip
Paq8px-compressed dictionary files used for text pre-training with the command line switch "T": 109,478 = paq8px -12 @_list.txt (english.dic + english.exp + english.emb in multiple file mode)
Paq8px-compressed LSTM repository, used with the command line switch "R": 221,311 = paq8px -12 english.rnn
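The decompresser sizes in the paq8px_v206fix1 rows follow from these three figures. The sketch below adds them up according to which switches appear in the option string. The function is purely illustrative, and the assumption that only T and R add files to the decompresser is my reading of the description above.

```python
SOURCE_ZIP = 402_949        # zipped source
DICTIONARIES = 109_478      # paq8px-compressed dictionary files (switch T)
LSTM_REPOSITORY = 221_311   # paq8px-compressed LSTM repository (switch R)


def decompresser_size(options: str) -> int:
    """Decompresser size for an option string such as '12LT' (illustrative helper)."""
    size = SOURCE_ZIP
    if "T" in options:
        size += DICTIONARIES
    if "R" in options:
        size += LSTM_REPOSITORY
    return size


for opts in ("12", "12T", "12LT", "12LRET"):
    print(opts, decompresser_size(opts))
```

The results are 402,949 bytes for s and 512,427 or 733,738 bytes for sd, matching the table.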
Options select memory usage as shown in the table. Early versions took no options. Most versions were not tested on enwik9 due to their slow speed.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ----
p5 31,255,092 9,298 s 3421 1 6
p6 25,377,998 9,421 s 4190 16 6
p12 24,714,219 9,598 s 4160 16 6
paq1 22,156,982 16,436 s 7800 7790 50
paq6 v2 -8 19,589,267 26,548 s 47624 808
paqar 4.5 -7 18,388,609 414,164 s 118690 119010 470
paq8f -7 18,289,559 34,371 x 68960 854
  -8 18,075,265 34,371 x 69170 1693
paq8g -7 17,817,246 804,867 s 44130 854
paq8h -7 17,674,700 147,195,723 801,612 s 147,997,335 56511 57278 854 5
raq8g -7 18,132,399 33,483 x 84555 84793 1089
  -8 17,923,022 27,660 x 337430~330000 2095 17
  -8 17,923,022 27,660 x 196540~196000 2095 15
paq8j -7 18,208,284 39,366 s 138030 138260 959
  -8 17,991,628 39,366 s 138990 136500 1896
paq8ja -7 18,184,224 39,781 s 148560 143200 993
  -8 17,968,233 39,781 s 154700 153990 1965
paq8jb -7 18,180,081 39,982 s 148570 148200 1009
  -8 17,964,363 39,982 s 188590 190190 1999
paq8jc -7 18,185,705 40,064 s 150910 152080 1017
  -8 17,970,943 40,064 s 224410 234900 2015
paq8jd -7 18,158,159 40,460 s 157340 156350 1030
  -8 17,943,042 40,460 s 406730 2028
paq8k -8 18,239,915 41,881 s 457150 1463
paq8l -6 18,518,485 35,955 x 133910 435
  -7 18,168,563 35,955 x 134770 837
  -8 17,916,450 35,955 x 136000 136390 1643
paq8fthis2 -8 18,075,265 34,846 x 69100 69310 1693
paq8n -8 17,916,420 37,402 x 134880 135480 1643
paq8o -8 17,916,451 42,389 s 135850 135260 1643
paq8osse -8 17,916,451 42,290 s 125260 124570 1778
paq8o3 -8 17,916,450 43,745 s 134580 134530 1636
paq8o4 v1 -8 17,916,450 43,876 s 126780 126560 1636
paq8o6 -8 17,904,721 44,883 s 139530 139520 1712
paq8o7 -8 17,904,756 45,979 s 139140 138530 1574
paq8o8 -8 17,904,756 46,381 s 139370 139150 1574
paq8o8-intel -1 22,260,679 46,381 s 24687 37 24
paq8o8z-jun7 -1 22,260,679 49,085 s 25919 37 24
  -1 22,260,680 29639 37 25
paq8o10t -8 17,772,821 50,865 s 144250 143720 1591
paq8p3 -7 18,044,229 150,709,834 57,288 s 150,767,122 72412 803 29
paq8p3 v2 -7 17,990,788 86891 803 29
  -8 17,759,875 87305 1574 29
paq8px_v60_turbo -8 17,733,057 146,272,609 53,846 s 146,326,455 143846 1643 26
paq8px_v69 -7 17,939,225 20170 878 26
paq8pxd_v1 -7 17,596,170 144,773,408 83,547 s 144,856,955 63302 811 29
paq8pxd_v2 -7 17,045,653 94280 853 29
  -8 16,848,214 95350 1658 29
paq8pxd_v3 -7 17,045,354 140,110,094 72,976 s 140,183,094 80069 853 29
  -8 16,847,903 136,777,893 72,976 s 136,850,869 82822 1658 29
paq8pxd_v4 -8 16,642,941 135,027,170 67,766 s 135,094,936 88409 1633 29
paq8pxd_v5 -8 16,699,597 67,745 s 114960 116450 1633 26
paq8pxd_v7 -8 16,606,773 134,791,909 70,210 s 134,862,119 93751 1633 29
paq8pxd_v8 -8 16,607,759 134,781,085 72,059 s 134,853,144 59387 54611 1521 48
paq8pxd_v10fix -8 16,607,760 134,780,308 72,382 s 134,852,690 37177 54433 1633 48
paq8pxd_v12 -8 16,577,460 134,452,453 81,196 s 134,533,649 54812 54506 1586 48
paq8pxd_v12-skbuild -10 16,372,331 129,827,930 422,400 s 130,250,330 28313 6500 65
paq8pxd_v13_x64 -15 16,595,606 131,598,576 83,499 s 131,682,075 29924 25955 65
paq8pxd_v15 -s9 16,437,892 131,992,226 88,538 s 132,080,764 54993 55067 3243 48
  -f9 17,838,013 11980 11760 1555 48
paq8pxd_v12_bio -11 16,361,221 129,435,477 82,111 s 129,517,588 30537 13000 65
paq8pxd_v18 -q8 27,789,833 237,862,503 100,521 s 237,963,024 738 144 80
  -q9 27,674,156 235,259,956 100,521 s 235,360,477 794 288 80
  -f8 17,896,675 146,238,833 100,521 s 146,339,354 8725 762 80
  -f9 17,814,539 6696 1482 80
  -f10 17,790,248 7401 2666 80
  -f11 18,081,957 7082 5034 80
  -f12 18,078,461 8755 5674 80
  -s8 16,516,558 134,561,662 100,521 s 134,662,183 75267 2298 80
  -s9 16,370,991 65814 4552 80
  -s10 16,308,754 65233 7448 80
  -s15 16,345,626 129,125,083 100,521 s 129,225,607 46698 46608 37878 79
paq8px_v77 -8 17,629,076 145,454,919 62,154 s 145,517,073 86266 86192 1625 48
drt|paq8px_v96 -8 16,704,802 137,170,609 167,886 s 137,338,495 63618 64113 1700 81
paq8pxd_v32 -s15 16,254,271 128,209,407 144,756 s 128,354,163 41418 43518 27278 81
paq8pxd_v47 -s15 16,080,717 127,404,715 139,841 s 127,544,556 75022 75611 27500 81
paq8pxd_v48_bwt1 -s14 16,004,759 126,183,029 153,295 s 126,336,324 579894 51865 81
paq8pxd_v61 -15 15,968,477 126,587,796 194,704 s 126,782,500 98571 98751 41200 81
paq8px_v206fix1 -12 16,046,995 126,486,867 402,949 s 126,889,816 151474 ------ 28138 91
paq8px_v206fix1 -12A 16,068,251 ---,---,--- 402,949 s ---,---,--- ------ ------ 28151 91
paq8px_v206fix1 -12LRET 15,820,862 ---,---,--- 733,738 sd ---,---,--- ------ ------ 92
paq8px_v206fix1 -12T 15,995,416 126,407,894 512,427 sd 126,920,321 161336 ------ 28138 92
paq8px_v206fix1 -12LT 15,799,749 124,619,348 512,427 sd 125,131,775 507007 514197 28151 92
paq8px_v206fix1 -12L 15,849,084 124,696,410 402,949 s 125,099,359 291916 294847 28151 93
durilca and durilca'light 0.5 by Dmitry Shkarin (Apr. 1, 2006) are closed source, experimental command line file compressors based on ppmd/ppmonstr with filters for text, exe, and data with fixed length records (wav, bmp, etc). durilca'light is a faster version with less compression. Unfortunately both crash on enwik9. Decompression is verified on enwik8.
The -m700 option selects 700 MB of memory. (It appears to use substantially more for enwik9 according to Windows task manager). -o12 selects PPM order 12 (optimal for enwik9 -t0). -t0 (default) turns off text modeling, which hurts compression but is necessary to compress enwik9 (although decompression still crashes). -t2(3) turns on text preprocessing (dictionary; thus the increased decompresser size). -t2 also supports 3 additive flags (4, 8, 16) which have no effect on this data, thus -t2(31) or -t2 (default is 31) give the same compression as -t2(3).
durilca 0.5(Hutter) was released 1457Z Aug. 16, 2006. It does not use external dictionaries. When run with 1 GB memory (-m700), -o13 is optimal. With 2 GB (-m1650), -o21 is optimal. The unzipped .exe file is 86,016 bytes.
durilca4linux_1 (0825Z Aug 23 2006) is a Linux version of durilca 0.5(Hutter) which successfully compresses enwik9 and decompresses with UnDur (23,375 bytes zipped, 42,065 bytes uncompressed). All versions of durilca require memory specified by -m plus memory to read the input file into memory. In Windows, this exceeds the 2 GB process limit regardless of available RAM and swap. Thus, enwik9 compresses only under Linux with 2 GB real memory and 1 GB additional swap. The -o12 option is optimal for enwik9 (tested under 64 bit SuSE 10.0 by the author), -o24 for enwik8 (verified by me under 64 bit Ubuntu 2.6.15).
durilca4linux_2 (Oct. 16, 2006) is a closed source Linux version specialized for this benchmark. It includes a warning that use on other files may cause data loss. It requires AMD64 Linux and 3 GB of memory (2 GB for enwik8). The decompresser files (EnWiki.dur and UnDur) are contained within a 241,322 byte zip file in the rar distribution. To compress:
    ./DURILCA d EnWiki.dur
    ./DURILCA e -m1800 -o10 -t2 enwik9

To decompress:

    ./UnDur EnWiki.dur
    ./UnDur enwik9.dur

The first step extracts a compressed dictionary. It is organized in a similar manner to paq8hp2-paq8hp5 in that syntactically related words and words with the same suffix are grouped together. Results are reported by the author under Suse Linux 10.0. I verified enwik8 only (6480 ns/b to compress on a 2.2 GHz Athlon 64 with 2 GB memory under Ubuntu Linux). enwik9 caused disk thrashing.
durilca4linux_3 (dictionary version v1) was released Feb. 21, 2008. Like version 2, it requires extraction of EnWiki.dur before compressing or decompressing, and may not work with files other than enwik8 and enwik9. As tested, requires 64-bit Linux, 4 GB RAM, and 5 GB RAM+swap.
undur3 v2 contains an improved dictionary (version v2), released Apr. 22, 2008, for DURILCA4Linux_3. The compression and decompression programs are the same. The decompression program UnDur (Linux executable) is included. To compress, download durilca4linux_3 and replace the dictionary (EnWiki.dur) with this one. The options are -m3600 (3600 MB memory), -o14 (order 14 PPM), -t2 (text model 2).
undur3 v3, released May 22, 2008, uses an improved dictionary but the same compressor and decompresser as v1 and v2. The dictionary contains 123,995 lowercase words separated by NUL bytes. Of these, 5579 words occur more than once (wasted space?). I tested option -m1500 under Ubuntu Linux with 2 GB memory. At -m1500 top reports 2157 MB virtual memory and 1894 MB real memory. -m1600 caused disk thrashing.
durilca kingsize (July 21, 2009) runs under 64 bit Windows and requires 13 GB memory. It is designed to work only on this benchmark and not in general. The dictionary file EnWiki.fsd must be extracted first from EnWiki.dur before compression or decompression. Requires msvcr90.dll. enwik8 can be compressed with -m1200 (1.2 GB).
durilca4_decoder is a new dictionary for durilca'kingsize (above), Nov. 12, 2009. It is reported as "durilca'kingsize_4" below. Decompression time is reported to be 1411.88 sec with "durilca d" and 1796.98 sec with "UnDur". enwik8 compresses with 1200 MB (-m1200) in 157.38 sec.
                                           Compressed size          Decompresser  Total size   Time (ns/byte)
Program              Options               enwik8      enwik9       size (zip)    enwik9+prog  Comp   Decomp  Notes
-------              -------               ----------  -----------  -----------   -----------  -----  -----   -----
durilca'light 0.5    -m650 -o12            21,089,993  178,562,475  1,495,422 x   180,057,897  1227   (fails)
durilca 0.5          -m700 -o12 -t0        19,227,202  162,117,578  74,292 x      162,191,870  4140   (fails)
                     -m800 -o128           19,321,003  164,298,178  74,292 x      165,372,470  7718   (fails)
                     -m700 -o12 -t2(3)     18,520,589  (fails)      1,507,312 x                3330   3940
durilca 0.5(Hutter)  -m700 -o13 -t2        18,128,339  (fails)      77,295 x                   5905
                     -m1650 -o21 -t2       17,958,687  (fails)      77,295 x                   6140   6140
durilca4linux_1      -m700 -o13 -t2        18,128,334               23,375 xd                  5950   5880
                     -m1750 -o12 -t2       18,027,888  146,521,559  23,375 xd     146,544,934  5500   7301    18
                     -m1750 -o24 -t2       17,949,422               23,375 xd                  6190   6780
durilca4linux_2      -m1800 -o10 '-t2(11)' 17,002,831  136,536,189  241,322 xd    136,777,511  4249   4827    18
                     -m1800 -o10 -t2       16,998,300  136,596,818  241,322 xd    136,838,140  4405   4894    18
durilca4linux_3 v1   -m3600 -o14 -t2       16,356,063  129,933,145  345,957 xd    130,279,102  3649   3715    18
                     -m1200 -o32 -t2       16,348,796                                          4170   4178    18
durilca4linux_3 v2   -m3600 -o14 -t2       16,323,581  129,670,441  344,525 xd    130,014,966  3628   3639    18
                     -m1200 -o32 -t2       16,316,255                                          4148   4157    18
durilca4linux_3 v3   -m3600 -o14 -t2       16,292,414  129,469,384  339,990 xd    129,809,374  3624   3627    18
                     -m1200 -o32 -t2       16,285,285                                          4135   4138    18
                     -m1500 -o6 -t2        16,517,051  133,674,565                             3852
                     -m1500 -o7 -t2        16,418,799  132,239,495                             4006
                     -m1500 -o8 -t2        16,368,632  131,722,213                             4149
                     -m1500 -o9 -t2        16,335,259  131,549,901  339,990 xd    131,889,891  4261   4344
                     -m1500 -o10 -t2       16,316,775  131,574,739                             4405
                     -m1500 -o11 -t2       16,306,086  131,707,901                             4544
                     -m1500 -o12 -t2       16,299,411  131,807,298                             4554
                     -m1500 -o14 -t2       16,292,414  132,238,662                             4763
                     -m1500 -o16 -t2       16,289,512  132,516,825                             4879
                     -m1500 -o32 -t2       16,285,285  134,238,759                             5440
durilca'kingsize     -m13000 -o40 -t2      16,258,380  127,695,666  333,790 xd    128,029,456  1413   1805    31
                     -m22500 -o40 -t2                  127,695,666                             1806   1814    34
durilca'kingsize_4   -m13000 -o40 -t2      16,209,167  127,377,411  407,477 xd    127,784,888  1398   1797    31
                                           16,209,167  127,377,411                             1788   1802    34
cmv 00.01.00 is a free, closed source, experimental file compressor for 32 bit Windows by Mauro Vezzosi, Sept. 6, 2015. It uses context mixing. Option "2,3,+" selects max compression (2), max memory (3), and a large set of models (+). A hex bitmap for this argument turns individual models on or off. Note 48 timings are for enwik8 only.
cmv 00.01.01 was released Jan. 10, 2016. It is compatible with 00.01.00 and does not change the compression ratio.
cmve 0.2.0 was released Nov. 28, 2017.
Program       Options           enwik8      enwik9       zip size   Total        Comp     Deco    Cmem   Dmem  Alg  Note
--------      -----------       ----------  -----------  ---------  -----------  ----     ----    ----   ----  ---- ----
cmv 00.01.00  -m2,3,+           18,218,283  150,226,739  77,404 x   150,304,143  285750   293090  2817   2817  CM   48,75
                                            150,226,739  77,404 x   150,304,143  216000           2801         CM   75
              -m2,3,0x03ededff  18,153,319                                       720000           ~3900        CM   75
cmv 00.01.01  -m2,3,0x03ed7dfb  18,122,372  149,357,765  77,404 x   149,435,169  426162   394855  3335   3335  CM   75
cmve 0.2.0    -m2,3,0x7fed7dfd  16,424,248  129,876,858  307,787 x  130,184,645  1140801          19963        CM   81
paq8hp12any was developed as a fork of the PAQ series of open source context mixing compressors by Alexander Rhatushnyak. It was forked from the paq8 series developed largely by Matt Mahoney, and uses a dictionary preprocessor (xml-wrt) originally developed by Przemyslaw Skibinski as a separate program and later integrated. All versions are optimized for the Hutter prize. Thus, they are tuned for enwik8. The 12 versions are described below in chronological order. They originally were located here (link broken) and can now be found here (as a zpaq archive) (as of Sept. 16, 2009). All programs are free, GPL open source, command line archivers. Most take a single option controlling memory usage.
Note: these programs are compressed with upack, which compresses better than upx. Some virus detectors give false alarms on all upack-compressed executables. The programs are not infected.
paq8hp1 by Alexander Rhatushnyak, 1945Z Aug. 21, 2006. It is a modification of paq8h using a custom dictionary tuned to enwik8 for the Hutter prize. Because the Hutter prize requires no external dictionaries, the dictionary is spliced into the .exe file during the build process. When run, it creates the dictionary as a temporary file. The program must be run in the current directory (not in your PATH or with an explicit path), or else it can't find this file. The unzipped paq8hp1.exe is 206,764 bytes. Decompression was verified for enwik8 (60730 ns/b for -8, 60660 ns/b for -7). enwik9 is pending.
paq8hp2 (source code) by Alexander Rhatushnyak, 0233Z Aug. 28, 2006 is an improved version of paq8hp1 submitted for the Hutter prize. paq8hp2.exe size is 205,276 bytes. It differs from paq8hp1 mainly in that the 43K word dictionary for 2-3 byte codes is sorted alphabetically. The 80 most frequent words, coded as 1 byte before compression, are grouped by syntactic type (pronoun, preposition, etc).
paq8hp3 (source code) by Alexander Rhatushnyak, released Aug. 29, 2006 is an improved version of paq8hp2 submitted for the Hutter prize on Sept. 3, 2006. The 80 dictionary words coded with 1 byte and 2560 words coded with 2 bytes are organized into semantically related groups or by common suffixes. The 40,960 words with 3 byte codes are sorted from the last character in reverse alphabetical order. paq8hp3.exe is 178,468 bytes unzipped. enwik9 decompression is not yet verified. For enwik8, decompression is verified with time 60300 ns/b compression, 60220 ns/b decompression.
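As a quick consistency check on the dictionary layout described above, the word counts per code length can be added up. This is only a sketch of the arithmetic; the names `WORDS_BY_CODE_LENGTH` and `total_words` are illustrative and do not come from the paq8hp sources.

```python
# Words per code length, as stated for the paq8hp3 dictionary.
WORDS_BY_CODE_LENGTH = {1: 80, 2: 2560, 3: 40960}

def total_words(words_by_len):
    """Total number of dictionary words across all code lengths."""
    return sum(words_by_len.values())

TOTAL = total_words(WORDS_BY_CODE_LENGTH)   # 43,600 words
# paq8hp5 is reported to add 1280 new words to this dictionary.
TOTAL_HP5 = TOTAL + 1280                    # 44,880 words
```

The 43,600 total agrees with the "43K word dictionary" figure quoted for paq8hp2, and adding the 1280 new words gives the 44,880 total reported for paq8hp5.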
paq8hp4 (source code) by Alexander Rhatushnyak, released and submitted for the Hutter prize on Sept. 10, 2006, is an improved version of paq8hp3. The dictionary is further organized into semantically related groups among 3-byte codes. The unzipped size of paq8hp4.exe is 206,336 bytes.
paq8hp5 (source code) by Alexander Rhatushnyak, released Sept. 20, 2006, is an improved version of paq8hp4, submitted for the Hutter prize on Sept. 25, 2006. The unzipped size of paq8hp5.exe is 174,616 bytes (in spite of a slightly larger dictionary). The dictionary size is optimized for enwik8; a larger dictionary would improve compression of enwik9. Decompression is verified for enwik8 only (-8 at 74640 ns/b). A Linux port of paq8hp5 is by Лъчезар Илиев Георгиев (Luchezar Georgiev), Oct 26, 2006 (mirror).
paq8hp6 (source code) by Alexander Rhatushnyak, released Oct. 29, 2006, is an improved version of paq8hp5. It was submitted as a Hutter prize candidate on Nov. 6, 2006. Unzipped paq8hp6.exe size is 170,400 bytes. The -8 option was not tested on enwik9 due to disk thrashing on my 2 GB PC. Compression was about 25% finished after 9 hours.
paq8hp7a by Alexander Rhatushnyak, Dec. 7, 2006, was intended to supersede paq8hp6 as a Hutter prize entry, then was withdrawn on Dec. 10, 2006 with the release of paq8hp7. Unzipped executable size is 151,664 bytes. -8 for enwik9 (but not enwik8) caused disk thrashing on my computer (2 GB, WinXP).
paq8hp7 (source code) by Alexander Rhatushnyak, Dec. 10, 2006, as a Hutter prize entry. Unzipped paq8hp7.exe size is 152,556 bytes.
paq8hp8 (source code) by Alexander Rhatushnyak, Jan. 18, 2007, as a Hutter prize entry (replacing an incorrect version posted 2 days earlier). Unzipped size is 152,692 bytes. The dictionary is identical to paq8hp7.
paq8hp9 (mirror) (source code) by Alexander Rhatushnyak, Feb. 20, 2007, is a Hutter prize entry. Only the -7 option works. The unzipped size of paq8hp9.exe is 112,628 bytes.
paq8hp9any (Feb. 23, 2007) by Alexander Rhatushnyak is a paq8hp9 -7 compatible version with external dictionary where all options work. However the zipped program is larger and -8 was not tested due to disk thrashing, so results are unchanged.
paq8hp10 (Mar. 26, 2007) by Alexander Rhatushnyak was derived from paq8hp9 as a Hutter prize entry. The unzipped size is 103,224 bytes. Only the -7 option works.
paq8hp10any (source code), Mar. 31, 2007, by Alexander Rhatushnyak is archive compatible with paq8hp10 -7 but works with other memory options. When run, paq8hp10.exe and both dictionary files should be in the current directory. This program is not a Hutter prize entry.
paq8hp11 (mirror) by Alexander Rhatushnyak, Apr. 30, 2007, is a Hutter prize entry. paq8hp11.exe is 99,816 bytes. Like paq8hp10, it works only with the -7 option.
To compress: paq8hp11 -7 enwik8.paq8hp11 enwik8
To decompress: paq8hp11 enwik8.paq8hp11
paq8hp11any (source code) by Alexander Rhatushnyak, May 2, 2007, is a paq8hp11 variant that accepts any memory option. It was optimized for speed rather than size. It includes two dictionary files which must be present in the current directory when run, unlike paq8hp11 where the dictionary is self extracted. -8 selects 1850 MB memory. -7 produces the same archive as paq8hp11. Run speeds for -8 enwik8 are 76770+76820 ns/B.
paq8hp12 (mirror) by Alexander Rhatushnyak, May 14, 2007, is a Hutter prize entry. paq8hp12.exe size is 99,696 bytes. It works only with the -7 option like paq8hp11.
paq8hp12any (source code) by Alexander Rhatushnyak, May 20, 2007, is a paq8hp12 variant that accepts any memory option (like paq8hp11any). The -7 option produces an archive identical to that of paq8hp12.
paq8hp12any was updated on Jan. 9, 2009 to fix a compiler issue and add a 64 bit Linux version. Compressed file format was not changed. It was not retested.
Options select memory usage as shown in the table.
                 Compressed size          Decompresser  Total size   Time (ns/byte)
Program      Opt enwik8      enwik9       size (zip)    enwik9+prog  Comp    Decomp  Mem   Note
-------      --- ----------  -----------  -----------   -----------  -----   -----   ---   ----
paq8hp1      -7  17,566,769               205,783 x                  60170   60660   748
             -8  17,397,023  142,477,977  205,783 x     142,683,760  63317           1595
paq8hp2      -7  17,390,490               204,557 x                  62000   62330   747
             -8  17,223,661  141,145,684  204,557 x     141,350,241  65323           1584
paq8hp3      -7  17,241,280               177,477 x                  61360   59690   742
             -8  17,085,021  139,905,045  177,477 x     140,082,522  63420           1586
paq8hp4      -7  17,039,173               198,525 x                  ~65000  65110   755
             -8  16,889,237  138,188,695  198,525 x     138,387,220  67956   68120   1598
paq8hp5      -7  16,898,402               161,887 x                  76300   77710   900   19
             -8  16,761,044  137,017,311  161,887 x     137,179,198  ~85153  75162   1787
paq8hp6      -7  16,731,800  138,828,889  166,715 x     138,995,604  74953   73707   941
             -8  16,568,451  135,281,289  166,715 x     135,448,004  60865           1807  21
paq8hp7a     -7  16,592,672  137,441,743  150,678 x     137,592,421  79795           940
             -8  16,431,239               150,678 x                  76940   77600   1790
paq8hp7      -7  16,579,500               151,633 x                  79620   79660   940
             -8  16,417,646  133,835,408  151,633 x     133,987,041  66074           1850  21
paq8hp8      -7  16,528,353               151,711 x                  79580   79970   940
             -8  16,372,960  133,271,398  151,711 x     133,423,109  64639           1849  22
paq8hp9      -7  16,516,789  136,676,674  111,653 x     136,788,327  84529   85957   940
paq8hp10     -7  16,490,947               102,256 x                  86720   88890   940
paq8hp10any  -8  16,335,197  132,979,531  333,925 x     133,313,456  55639           1849  22
paq8hp11     -7  16,459,515               98,851 x                   129540  128530  947
paq8hp11any  -8  16,304,862  132,757,799  327,608 s     133,085,407  57503           1850  22
paq8hp12     -7  16,381,959               98,745 x                   130820  131480  936
paq8hp12any  -7  16,381,959               330,700 x                  78860   76190   941
             -8  16,230,028  132,045,026  330,700 x     132,375,726  56993           1850  22
             -8  16,230,028  132,045,026  330,700 x     132,375,726  37660   37584   1850  41
paq8hp1 through paq8hp12 can be used as a preprocessor to other compressors by compressing with option -0. In the following tests on ppmonstr, options were tuned for the best possible compression of enwik8 with 2 GB memory (1.65 GB available under WinXP). The xml-wrt 2.0 options are -l0 -w -s -c -b255 -m100 -e2300 (level 0, turn off word containers, turn off space modeling, turn off containers, 255 MB buffer for dictionary, 100 MB buffer, 2300 word dictionary). The xml-wrt 3.0 options are -l0 -b255 -m255 -3 -s -e7000 (-3 = optimize for PPM).
xml-wrt prepends the dictionary to its output. To make the comparison fair, the compressed size of the dictionary must be added. This is done in two ways, first by compressing the preprocessed text and dictionary and adding the compressed sizes, and second by prepending the dictionary to the preprocessed text before compression. The first method compresses about 1-2 KB smaller.
The uncompressed size of each dictionary for paq8hp1 through paq8hp4 is 398,210 bytes. They contain identical words, but in different order. The first two dictionaries are identical. They compress smaller because they are sorted alphabetically. The dictionary for paq8hp5 is 411,681 bytes. It contains all of the words in the first 4 dictionaries plus 1280 new words (44,880 total).
Preprocessor    Compressor                enwik8      dict     total       dict+enwik8
------------    ----------                ----------  -------  ----------  -----------
paq8hp1 -0    | ppmonstr J -m1650 -o64    18,322,077   81,190  18,403,267  18,403,991
paq8hp2 -0    | ppmonstr J -m1650 -o64    18,266,424   81,190  18,347,614  18,349,587
paq8hp3 -0    | ppmonstr J -m1650 -o64    18,197,797  107,583  18,305,380  18,306,690
paq8hp4 -0    | ppmonstr J -m1650 -o64    18,170,944  107,590  18,278,534  18,280,098
paq8hp5 -0    | ppmonstr J -m1650 -o64    18,154,921  111,935  18,266,856  18,267,556
xml-wrt 2.0   | ppmonstr J -m1650 -o64    18,625,624
xml-wrt 3.0   | ppmonstr J -m1650 -o64    18,494,374
(none)          ppmonstr J -m1650 -o16    19,062,555
                ppmonstr J -m1650 -o32    19,084,964
                ppmonstr J -m1650 -o64    19,098,634
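The two accounting methods can be compared directly from the figures in the table above. The following is a small sketch of that arithmetic, not part of the original test procedure; the names `ROWS`, `separate_total`, and `prepend_penalty` are illustrative.

```python
# (preprocessor, compressed enwik8, compressed dictionary, dictionary prepended then compressed)
ROWS = [
    ("paq8hp1", 18_322_077, 81_190, 18_403_991),
    ("paq8hp2", 18_266_424, 81_190, 18_349_587),
    ("paq8hp3", 18_197_797, 107_583, 18_306_690),
    ("paq8hp4", 18_170_944, 107_590, 18_280_098),
    ("paq8hp5", 18_154_921, 111_935, 18_267_556),
]

def separate_total(enwik8, dictionary):
    """Method 1: compress text and dictionary separately and add the sizes."""
    return enwik8 + dictionary

def prepend_penalty(row):
    """Bytes lost by method 2 (prepend dictionary) relative to method 1."""
    _, enwik8, dictionary, combined = row
    return combined - separate_total(enwik8, dictionary)

PENALTIES = {row[0]: prepend_penalty(row) for row in ROWS}
```

For these five rows the separate-compression totals reproduce the "total" column, and the differences range from 700 to 1973 bytes, which is in line with the statement that the first method compresses about 1-2 KB smaller.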
The transform done by paq8hp1 through paq8hp5 is based on WRT by Przemyslaw Skibinski, which first appeared in PAsQDa and paqar, and later in paq8g and xml-wrt. The steps are as follows:
emma v0.1.3 is a free, closed source file compressor for 32 bit Windows by mpais, Mar. 8, 2016. It uses context mixing. It has a GUI-only interface to select compression options. For testing, all settings were for maximum compression as follows: Memory usage 512 Mb, maximum order 9, ring buffer 32 Mb, probability refinement level 3, mixing complexity insane, adaptive learning rate on, fast mode on long matches off, ludicrous complexity mode on, match model on, 32 Mb, high complexity; text model on, 128 Mb, high; sparse model on, 16 Mb, high; sparse model on, 16 Mb, high; indirect model on, 16 Mb, high; x86/64 model on, 64 Mb, insane; image models on, 80 Mb, high; audio models on, 32 Mb, high; record model on, 16 Mb, high; distance model on, 8 Mb; JPEG model on, 40 Mb, high; GIF model on, 32 Mb, high; executable code (x86/64) transform on; process conditional jumps on; colorspace (RGB) on; delta coding on; dictionaries: English on, Spanish off, Italian off, French off, Portuguese off.
emma v0.1.4 was released Mar. 13, 2016. For testing, the text model was increased to 256 MB. A DMC model (8 MB) was added. The non-text related models were turned off: x86, image, audio, JPEG, GIF. All transforms (x86, RGB, delta) were turned off.
emma 0.1.6 (discussion) was released Mar. 27, 2016. It was tested by splitting enwik9 into parts using hsplit to move the highly compressible middle part to the end. The reordered file was then processed using drt dictionary processing (see lpaq9m) instead of emma's built in dictionary and compressed with emma with maximum compression and memory options (like below) except that dictionary processing was turned off. The decompressor size includes drt.exe, lpqdict0.dic, hsplit.exe and a BAT file to restore the original order, all compressed with emma, then those files plus emma.exe (without dictionaries) compressed into a zip archive. Specifically, enwik9 was prepared:
    fsplit32 enwik9 en1 586000000
    fsplit32 en1.1 en2 480000000
    fsplit32 en2.1 en3 424000000
    copy /b en3.1+en1.2+en3.2+en2.2 enwik9o
    del en1.1
    del en1.2
    del en2.1
    del en2.2
    del en3.1
    del en3.2
    drt enwik9o enwik9o.drt
    del enwik9o

before compression with emma, then restored after decompression:

    drt enwik9o.drt enwik9o d
    fsplit32 enwik9o en1o 894000000
    del enwik9o
    fsplit32 en1o.1 en2o 838000000
    fsplit32 en2o.1 en3o 424000000
    copy /b en3o.1+en2o.2+en1o.2+en3o.2 enwik9
    del en1o.1
    del en1o.2
    del en2o.1
    del en2o.2
    del en3o.1
    del en3o.2

The command hsplit input output N means produce output.1, output.2, etc. each of size N bytes.
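The net effect of the split and copy commands above can be traced with a short in-memory sketch. It assumes each split yields exactly two parts (the first N bytes and the remainder), which holds for the sizes used here; the function names are illustrative, and the offsets are scaled down by 10^6 so the example runs on a 1000 byte buffer instead of enwik9.

```python
def split(data, n):
    """Two-part split: the first n bytes (output.1) and the rest (output.2)."""
    return data[:n], data[n:]

def reorder(data, a=586, b=480, c=424):
    """Mirror of the preparation commands: moves the span [c, a) to the end."""
    en1_1, en1_2 = split(data, a)
    en2_1, en2_2 = split(en1_1, b)
    en3_1, en3_2 = split(en2_1, c)
    return en3_1 + en1_2 + en3_2 + en2_2

def restore(data, a=894, b=838, c=424):
    """Mirror of the restore commands: puts the moved span back in place."""
    en1o_1, en1o_2 = split(data, a)
    en2o_1, en2o_2 = split(en1o_1, b)
    en3o_1, en3o_2 = split(en2o_1, c)
    return en3o_1 + en2o_2 + en1o_2 + en3o_2

SAMPLE = bytes(i % 251 for i in range(1000))   # stand-in for enwik9
REORDERED = reorder(SAMPLE)
```

Under these assumptions the reordered file is the original bytes 0 to 424 MB, then 586 MB to the end, then the middle span from 424 MB to 586 MB, and `restore` returns the original order.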
emma 0.1.12 was released July 10, 2016. There are 32 and 64 bit versions. The 64 bit version can use more memory. Settings were as follows:
                              x64            x86
Memory                        2048 MB        512 MB
Max order                     10             9
Ring buffer size              128 MB         32 MB
Probability refinement        level 3        level 3
Mixing complexity             insane         insane
Adaptive learning rate        on             off
Fast mode long matches        off            off
Ludicrous complexity          on             on
Match model                   128 MB, high   32 MB, high
Text model                    1024 MB, high  256 MB, high
Sparse model                  64 MB, high    16 MB, high
Indirect model                64 MB, high    16 MB, high
x86/x64 model                 off            off
Image models                  off            off
Audio models                  off            off
Record model                  64 MB, high    16 MB, high
Distance model                32 MB          8 MB
DMC model                     32 MB          8 MB
JPEG model                    off            off
GIF model                     off            off
XML model                     16 MB          4 MB
RAW models                    off            off
Transforms exec code          off            off
Colorspace RGB                off            off
Delta coding                  off            off
Dictionaries                  English        English
emma 0.1.22 was released Feb. 12, 2017. Settings: all settings = MAX, except: image and audio models = off, use fast mode on long matches = off, xml=on, x86model=off, x86 exe code = off, delta coding = off, dictionary = off, ppmd memory = 1024, ppmd order = 14
emma 1.23 was released Aug. 29, 2017. It uses ppmd_mod v3a by Shelwien and is preprocessed with DRT. EMMA 1.23 settings: all settings = MAX, except: image and audio models = off, use fast mode on long matches = off, xml=on, x86model=off, x86 exe code = off, delta coding = off, dictionary = off, ppmd memory = 1024, ppmd order = 14
Program              enwik8      enwik9       program size  total        Comp    Decomp  Mem   Alg  Note
-------              ----------  ----------   ------------  -----------  -----   ------  ----  ---  ----
emma 0.1.3           17,971,713  149,864,553  1,844,505 x   151,709,068  110458  113839  1336  CM   77
emma 0.1.4           17,865,328  148,887,824  1,848,033 x   150,735,857  58141           980   CM   78
drt|emma 0.1.16 x64  16,855,079  136,393,547  1,257,839 x   137,651,386  64341   62102   3800  CM   77
emma 0.1.12 x86      17,824,974  148,403,034  1,878,971 x   150,282,005  62639           986   CM   78
emma 0.1.12 x64      17,468,937  142,416,812  2,105,286 x   144,522,098  95997           3688  CM   78
emma 0.1.22          16,679,420  135,169,967  1,302,363 xd  136,472,330  86187           3824  CM   81
drt|emma 1.23        16,523,517  134,164,521  1,358,251 xd  135,522,772  73006   67097   3800  CM   81
A ZPAQ archive is organized into independently compressed blocks. Each block is divided into one or more segments which must be decompressed in sequence. Each segment represents a file or a part of a file. The standard supports both archivers and single file compressors. In the case of a compressor, no filenames are stored in the segment headers, and all the blocks and segments are concatenated to a single output file specified by the user.
ZPAQ uses a streaming format that can be read or written in a single pass. The arithmetic coded data is designed so that the end of a segment can be found by scanning quickly without decoding. There is no central directory information to update when blocks are added, removed, or reordered.
The ZPAQ standard requires that the decompression algorithm be described in the block headers. The header describes a collection of bitwise predictive models based loosely on PAQ components, a program to compute the bytewise contexts for each model, and a second program to perform arbitrary postprocessing on the output data. The two programs are written in an interpreted bytecode language called ZPAQL.
A ZPAQ model specifies a list of 1 to 255 components. Each component outputs a prediction or probability that the next bit will be a 1. Each component may receive as input a computed 32-bit context and the output predictions of earlier components on the list. The last component's prediction is fed to an arithmetic coder to encode or decode the next bit. The components are as follows:
There are two ZPAQL virtual machines, one (HCOMP) to compute contexts, and one (PCOMP) to postprocess the decoded data. Each program is called once per decoded byte with that byte as input. A ZPAQL machine has the following state:
zpaq 1.03 takes as input a configuration file which describes the arrangement of components, their parameters, and the ZPAQL program HCOMP written one token per byte in a C-like syntax (e.g. "A=B" to assign B to A). PCOMP is not specified because in general the preprocessing step by the compressor is different (and usually more complex) than the postprocessing step. Instead, zpaq 1.03 provides the option of two built-in preprocessors, LZP and E8E9. If selected, the preprocessing is done in C++ by the compressor, and the compressor generates ZPAQL code to perform the inverse transform and inserts it into the archive block header. (PCOMP is actually prepended to the input data and compressed with it. HCOMP is not compressed).
E8E9 is used to improve compression of 32 bit x86 executable files. It replaces the 32 bit relative address after a CALL or JMP (0xE8 or 0xE9) x86 instruction by adding the offset from the beginning of the file. This improves compression because often there are several calls to the same target. PCOMP performs the inverse transform in ZPAQL by subtracting the offset.
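The idea can be illustrated with a simplified Python sketch. It is not the zpaq implementation: it assumes a little-endian 32-bit operand, a plain forward scan, and that the 4 operand bytes after a transformed opcode are skipped. The function name `e8e9` and the sample bytes are made up for the illustration.

```python
import struct

def e8e9(data, inverse=False):
    """Add (or, for the inverse, subtract) the file offset to each CALL/JMP operand."""
    buf = bytearray(data)
    i = 0
    while i + 4 < len(buf):                      # need 4 operand bytes after the opcode
        if buf[i] in (0xE8, 0xE9):
            (addr,) = struct.unpack_from("<I", buf, i + 1)
            addr = (addr - i if inverse else addr + i) & 0xFFFFFFFF
            struct.pack_into("<I", buf, i + 1, addr)
            i += 5                               # skip the operand just rewritten
        else:
            i += 1
    return bytes(buf)

# Two CALLs at offsets 0 and 10 to the same target (0x100), padded with NOPs.
SAMPLE = (b"\xE8" + struct.pack("<I", 0x100 - 5) + b"\x90" * 5 +
          b"\xE8" + struct.pack("<I", 0x100 - 15) + b"\x90" * 3)
ENCODED = e8e9(SAMPLE)
```

In the sample, the two relative operands differ (0xFB and 0xF1) but become identical after the transform, which is what makes repeated calls to one target easier to model. Because opcode bytes are never modified and operands are skipped, the inverse scan visits the same positions and restores the input.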
LZP encodes long string matches as an escape byte and length byte. The decompresser maintains a rolling context hash which indexes a pointer table (the H array) into the output buffer (the M array) pointing to the previous context match. If an escape is present, then the indicated number of bytes are copied from the previous context match. In zpaq 1.03, the user can specify the sizes of M and H, the hash multiplier (effectively choosing the context length), the value to use as the escape byte (preferably occurring rarely in the input), and minimum match length. Escape bytes in the input are encoded as an escaped 0 length.
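A minimal LZP sketch in Python follows, to make the escape and length coding concrete. It is a simplification under stated assumptions, not the zpaq 1.03 code: a Python dict keyed by the previous ORDER bytes stands in for the hashed H array, the whole output is kept instead of a fixed size M buffer, and the values of `ESC`, `ORDER`, and `MIN_LEN` are arbitrary choices for the example.

```python
ESC = 0xFF      # escape byte (assumed; ideally rare in the input)
ORDER = 4       # context length in bytes (assumed)
MIN_LEN = 3     # minimum match length worth coding (assumed)

def _remember(table, buf, pos):
    """Record that the ORDER bytes before pos were last followed by position pos."""
    if pos >= ORDER:
        table[bytes(buf[pos - ORDER:pos])] = pos

def _predict(table, buf, pos):
    """Return the previous position that followed the current context, if any."""
    return table.get(bytes(buf[pos - ORDER:pos])) if pos >= ORDER else None

def lzp_encode(data):
    table, out, i = {}, bytearray(), 0
    while i < len(data):
        p = _predict(table, data, i)
        length = 0
        if p is not None:
            while (length < 255 and i + length < len(data)
                   and data[p + length] == data[i + length]):
                length += 1
        if length >= MIN_LEN:
            out += bytes([ESC, length])          # match: escape byte + length byte
            for j in range(i, i + length):
                _remember(table, data, j)
            i += length
        else:
            out.append(data[i])                  # literal
            if data[i] == ESC:
                out.append(0)                    # literal escape: escaped 0 length
            _remember(table, data, i)
            i += 1
    return bytes(out)

def lzp_decode(enc):
    table, out, k = {}, bytearray(), 0
    while k < len(enc):
        b = enc[k]; k += 1
        if b != ESC:
            _remember(table, out, len(out))
            out.append(b)
            continue
        length = enc[k]; k += 1
        if length == 0:
            _remember(table, out, len(out))
            out.append(ESC)
        else:
            p = _predict(table, out, len(out))   # same pointer the encoder used
            for _ in range(length):
                _remember(table, out, len(out))
                out.append(out[p])
                p += 1
    return bytes(out)
```

Both sides update the table once per output byte, so the decoder always sees the same predicted pointer as the encoder. For example, `lzp_decode(lzp_encode(data)) == data` holds for any byte string, including inputs that contain the escape byte.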
zpaq 1.03 is distributed with three configuration files, min.cfg (for speed), mid.cfg (the default), and max.cfg (for good compression). However, the user can also write their own config files.
o0.cfg, o1.cfg, and o2.cfg are order 0, 1, and 2 models with a single CM and direct context lookup with no hashing. o0 is equivalent to fpaq0. In each of the models the asymptotic learning rate was tuned for maximum compression. Other values are given as comments in the sources. The CM uses 2KB, 512KB and 128MB respectively.
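To show what an order 0 model with direct context lookup does, the sketch below estimates the ideal coded size of a byte string under an adaptive bitwise model. It is an illustration only: the 12 bit probabilities and the shift-based update with a fixed `rate` are assumptions, and neither the ZPAQ CM component nor fpaq0 is reproduced exactly. The context is just the bits of the current byte seen so far, so no hashing is needed.

```python
import math

def order0_cost_bits(data, rate=4):
    """Ideal arithmetic-coded size in bits under an adaptive order-0 bitwise model."""
    p1 = [2048] * 256          # P(bit = 1) scaled by 4096, one slot per partial-byte context
    bits = 0.0
    for byte in data:
        ctx = 1                # leading 1 marks how many bits of this byte are known
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            p = p1[ctx] / 4096
            bits -= math.log2(p if bit else 1.0 - p)
            # Move the estimate toward the observed bit; it never reaches 0 or 4096.
            if bit:
                p1[ctx] += (4096 - p1[ctx]) >> rate
            else:
                p1[ctx] -= p1[ctx] >> rate
            ctx = (ctx << 1) | bit
    return bits
```

As a usage example, `order0_cost_bits(b"a" * 1000)` is far below the 8000 bits of the raw input, while input that cycles through all byte values costs close to 8 bits per byte. The `rate` parameter plays the role of the learning rate that the text says was tuned per model.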
min.cfg uses LZP preprocessing with a minimum match length of 3 and an order 4 context hash, followed by compression by single CM with an order 3 context and 512K entries. The LZP has a 1 MB output buffer and 256K index. It uses 4 MB memory.
mid.cfg (the default) does no preprocessing. It has an order 0 ICM, a chain of ISSE with context orders 1 through 5, each taking the previous ISSE as input, a MATCH with an order 7 context, and a final MIX with an order 1 context taking input from all other models. It uses 111 MB memory.
max.cfg does no preprocessing. It has 21 components: an order 0 ICM, a chain of order 1, 2, 3, 4, 5, 7 ISSE, an order 8 MATCH, a wordwise order 0-1 ICM-ISSE chain (for text), sparse order 1 ICM with gaps of 1, 2, and 3, a partially masked order 2 ICM with a gap of 216 for CCITT images (calgary/pic), order 0 and 1 mixers taking a CONST and all previous components as input and averaged together with a context free MIX2, followed by a chain of order 0 and 1 SSE each partially bypassed by a context free and order 0 MIX2, and a final context free MIX of all other components. The two wordwise contexts depend on the current and previous case insensitive sequences of letters in the range a-z. It uses 278 MB memory.
max3.cfg is a variation of max.cfg by Jan Ondrus (Sept. 10, 2009) using 550 MB memory and without a CCITT model.
max4.cfg is a variation of max3.cfg (Sept. 15, 2009) using 1465 MB memory.
drt is the dictionary preprocessor from lpaq9m by Alexander Rhatushnyak. The results include the dictionary file lpqdict0.dic compressed from 465,210 to 88,759 bytes in 8 seconds as a separate archive with max4.cfg and decompressed in 7 seconds, and drt.exe with a size of 15,548 bytes (whether uncompressed or as a zip file) with 38 seconds to encode enwik9 and 38 seconds to decode.
max_enwik9.cfg is a variation of max.cfg by Mike Russell, Sept. 11, 2009. It adds 5 more models for higher order contexts using an ISSE chain after the first order 5 mixer.
max_enwik9drt.cfg is a variation of max_enwik9.cfg, Sept. 18, 2009, modified to define word contexts for ASCII range 65-255 instead of A-Z,a-z because DRT encodes words using bytes in the range 128-255. The compressed size of lpqdict0.dic is 86810 bytes, 12+9 sec, compressed separately and added to the compressed sizes.
zpipe 1.00 is a ZPAQ compatible streaming file compressor that compresses or decompresses from standard input to standard output. It takes no options. It compresses equivalently to mid.cfg without storing a filename or comment. The decompresser outputs the contents of archives to a single file by concatenation.
bwt_j2.cfg implements an inverse BWT transform. It was written by Jan Ondrus, Oct. 6, 2009. The forward transform is implemented by an external preprocessor, bwtpre (included above) by Matt Mahoney, Oct. 6, 2009. bwtpre is based on BBB fast mode compression but does not itself compress. The argument ",18" tells bwt_j2.cfg to use a block size of 2^(10+18) - 256 bytes. Memory usage is 5x blocksize for both the preprocessor and postprocessor, plus 100 MB for the model. The ability of config files to call external preprocessors was added to zpaq v1.05 on Sept. 28, 2009. The ability to pass arguments was added to zpaq v1.07 on Oct. 2, 2009.
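The memory figure can be worked out from the argument. The sketch below assumes the block size for argument N is 2^(10+N) - 256 bytes and that 1 MB means 10^6 bytes; the function names are illustrative.

```python
def bwt_block_size(n):
    """Block size in bytes for config argument n, assuming 2^(10+n) - 256."""
    return 2 ** (10 + n) - 256

def bwt_memory_mb(n, model_mb=100):
    """5x block size for the transform plus the model, in MB of 10^6 bytes."""
    return 5 * bwt_block_size(n) / 1e6 + model_mb

BLOCK_18 = bwt_block_size(18)     # 268,435,200 bytes
MEMORY_18 = bwt_memory_mb(18)     # about 1442 MB
```

For argument 18 this gives a block of 268,435,200 bytes and roughly 1442 MB in total, which is close to the 1443 MB decompression memory reported for zpaq v1.09 with the same argument.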
zpaq v1.08 (Oct. 14, 2009) adds the capability to compile ZPAQL configuration files and corresponding archive headers to C++ and link to a copy of itself to speed up compression and decompression. The program first looks for an optimized version of the program, writes and compiles it if needed, then runs it to compress or decompress. Some tests are shown for speed comparison. max.cfg was modified to use less memory. The arguments to min.cfg, mid.cfg, and max.cfg have the effect of improving compression at the cost of doubling memory for each increment.
bwt_slowmode1_1GB_block.cfg implements slow mode BWT transform using 1.25x blocksize memory based on BBB. The inverse transform was re-implemented in ZPAQL by Jan Ondrus, Oct. 15, 2009.
zpaq v1.09 is mainly a Linux port of v1.08 with some cosmetic improvements. Times for obwt_j2.cfg,18 are shown for comparison to v1.07 without optimization. Memory usage is 1838 MB for compression (includes preprocessor) and 1443 MB for decompression.
The c command followed by the name of a configuration file creates a new archive using that file. By default the archive header includes the file name (6 bytes), size (10 bytes), and SHA1 checksum (20 bytes). There are options to omit these and save 36 bytes. The "oc" command in zpaq v1.08 optimizes for speed.
zp 1.00 is a ZPAQ compatible archiver by Matt Mahoney, May 7, 2010. It is designed to have fewer options so it is easier to use. It has 3 compression levels: 1=fast, 2=mid, 3=max. It uses compiled ZPAQL code (like zpaq oc/ox) but without requiring an external C++ compiler to be installed. It automatically detects when an archive is compressed with one of these three models and decompresses with compiled code. Otherwise, it will decompress all other ZPAQ compatible archives with slower, interpreted code. Levels 2 and 3 are the same as zpaq mid.cfg and max.cfg. Only level 1 (fast) was tested because it uses a new model, fast.cfg, an ICM chain of length 2 with order 2 and 4 contexts. It is equivalent to compressing with zpaq ocfast.cfg.
                                              Compressed size          Decompresser  Total size   Time (ns/byte)
Program        Options                        enwik8      enwik9       size (zip)    enwik9+prog  Comp   Decomp  Mem   Alg  Note
-------        -------                        ----------  -----------  -----------   -----------  -----  -----   ---   ---  ----
zpaq 1.03      co0.cfg                        61,217,687  620,040,242  14,317 xd     620,054,559  441    453     0.4   o0   26
               co1.cfg                        46,083,596  454,040,416  14,317 xd     454,054,733  459    480     0.6   o1   26
               co2.cfg                        36,694,483  346,551,263  14,317 xd     346,565,580  557    560     134   o2   26
               cmin.cfg                       33,460,947  294,281,789  14,317 xd     294,296,106  438    513     4     LZP  26
               cmid.cfg                       20,941,558  180,279,221  14,317 xd     180,293,538  3521   3652    111   CM   26
               cmax.cfg                       19,412,353  165,191,085  14,317 xd     165,205,402  12211  12204   278   CM   26
               cmax3.cfg                      19,179,311  161,604,379  14,317 xd     161,618,696  14108  13609   550   CM   26
               cmax4.cfg                      18,986,507  157,246,349  14,317 xd     157,260,666  14061  13077   1465  CM   26
               cmax_enwik9.cfg                18,238,435  149,376,058  14,317 xd     149,390,375  11961          2002  CM   32
drt|zpaq 1.03  cmax4.cfg                      18,400,773  149,761,125  29,865 xd     149,790,990  8663   8547    1465  CM   26
               cmax_enwik9drt.cfg             18,022,167  146,078,502  29,865 xd     146,108,367  11494  11614   1952  CM   26
zpipe 1.00                                    20,941,543  180,279,205  13,421 x      180,292,626  3540   3480    111   CM   26
zpaq 1.07      cbwt_j2.cfg,18                 20,756,888  174,171,969  13,421 x      174,185,390  5593   4347    1838  BWT  26
zpaq 1.08      ocbwt_slowmodel_1GB_block.cfg  20,756,996  163,565,006  29,153 x      163,594,159  7957   3875    1443  BWT  26
               oco0.cfg                       61,217,687                                          335    407     0.4   o0   26
               ocmin.cfg                      33,460,960                                          414    383     4     LZP  26
               ocmid.cfg                      20,941,558                                          2392   2456    111   CM   26
               ocmax.cfg                      19,448,650                                          6569   6641    246   CM   26
               ocmax.cfg,3                    18,977,961                                          6667   6640    1861  CM   26
zpaq 1.09      ocbwt_j2.cfg,18                20,756,883  174,171,965  31,744 x      171,203,709  4529   1847    1838  BWT  26
zp 1.00        c1                             24,837,469  222,310,430  26,815 s      222,337,245  688    776     37    CM   26
                                                                                                  587    688                44
pzpaq 0.01 (a predecessor to zp 1.02) is a free, open source file compressor and archiver by Matt Mahoney, Jan. 21, 2011. It uses a ZPAQ compatible format with speed optimizations for the 3 default compression levels supported by libzpaq, zpaq, and zpipe. It supports parallel compression and decompression by dividing the input into blocks which are compressed or decompressed at the same time in separate threads, writing the results to temporary files, and then concatenating them when done. For compression with N threads, the input is divided into N blocks of equal size by default, although a different block size can be specified. Larger blocks make compression better but reduce the number of threads that can run at the same time. Using more threads also increases the memory required. pzpaq can also compress or decompress multiple files at once to separate archives or pack them into a solid archive or an archive with the packed files split across blocks within the archive.
The version 0.01 distribution includes a 32 bit Windows executable and source code to compile for Windows or Linux. For Windows, the code must be linked with Pthreads-Win32 and pthreadGC2.dll is required at run time. The program size was calculated from the source code (including libzpaq) required for Linux. Linux has pthreads installed by default, so pthreads is not included in the size.
The test results shown below are for 2 machines, a 2.67 GHz Intel Core i7 M620 with 2 cores and 2 hyperthreads per core, running 64 bit Linux (note 48), and a 2.0 GHz Intel T3200 with 2 cores without hyperthreading running 32 bit Windows (note 26). The Linux version was compiled with g++ 4.4.4 -O3 -s -march=native -DNDEBUG. The Windows version used the distributed pzpaq.exe and pthreadGC2.dll. It was compiled with g++ 4.5.0 -O2 -s -march=pentiumpro -fomit-frame-pointer. Times shown are wall (real) times, not process times, in nanoseconds per byte.
We observe the normal 3 way tradeoff between speed, memory, and compression. Compression levels -1, -2, and -3 require 38 MB, 112 MB, and 247 MB per thread respectively. The default is -2. -t selects the number of threads. The default is -t2. -b selects the block size. The default is the input size divided by the number of threads. The -m option limits memory usage in MB by reducing -t. The default is -m500. Selecting larger -m than required has no effect on compression, speed, or actual memory used. -m is only required with -3 -t3 or higher.
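As a rough illustration of how these options interact, the following Python sketch computes the default block size and the number of threads that fit under a memory limit. It is not pzpaq's code: the function names are hypothetical, and it assumes that -m lowers the thread count to the largest value whose combined per-thread memory fits within the limit, with a minimum of one thread. The per-thread figures are the ones quoted above.

```python
# Per-thread memory in MB for levels -1, -2 and -3, as quoted in the text.
MB_PER_THREAD = {1: 38, 2: 112, 3: 247}


def effective_threads(level, threads=2, mem_limit_mb=500):
    """Hypothetical helper: number of threads left after applying -m.

    Assumption (not confirmed by the source): the thread count is cut to the
    largest value that fits in the limit, and at least one thread always runs.
    """
    per_thread = MB_PER_THREAD[level]
    return max(1, min(threads, mem_limit_mb // per_thread))


def default_block_size(input_size, threads):
    """Default -b: the input size divided by the thread count, rounded up."""
    return -(-input_size // threads)
```

Under these assumptions, -3 -t3 would need 741 MB, so the default -m500 reduces it to 2 threads, which agrees with the remark that -m only matters for -3 -t3 or higher. Rounding the block size up reproduces the -b33333334 used for enwik8 with 3 threads.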
                                           C/D time     C/D time
Lev Thr Block Memory            enwik8      Note 48      Note 26
-------------------------  -----------  -----------  -----------
-1 -t2 -b1000000 -m76       28,176,221                 471
-1 -t2 -b2500000 -m76       26,915,416                 443
-1 -t2 -b5000000 -m76       26,236,689                 436
-1 -t2 -b10000000 -m76      25,728,498                 429
-1 -t4 -b25000000 -m152     25,253,629    210   220
-1 -t3 -b33333334 -m114     25,144,587    220   240
-1 -t2 -b50000000 -m76      25,009,236    240   290    410   430
-1 -t1 -b100000000 -m38     24,837,482    420   470    750   800
-2 -t2 -b1000000 -m224      24,582,373                1440
-2 -t2 -b2500000 -m224      23,374,191                1396
-2 -t2 -b5000000 -m224      22,644,738                1417
-2 -t2 -b10000000 -m224     22,044,838                1430
-2 -t2 -b25000000 -m224     21,438,679                1382
-2 -t4 -b25000000 -m448     21,438,679    720   730
-2 -t3 -b33333334 -m336     21,303,705    790   820
-2 -t2 -b50000000 -m224     21,138,877    950   980   1300  1310
-2 -t1 -b100000000 -m112    20,941,571   1510  1560   2350  2330
-3 -t2 -b1000000 -m494      23,281,943                4142
-3 -t2 -b2500000 -m494      22,105,128                3896
-3 -t2 -b5000000 -m494      21,371,902                3866
-3 -t2 -b10000000 -m494     20,745,064                3854
-3 -t2 -b25000000 -m494     20,073,978                3816
-3 -t4 -b25000000 -m988     20,073,978   1900  1950
-3 -t3 -b33333334 -m741     19,914,412   2070  2120
-3 -t2 -b50000000 -m494     19,710,450   2180  2250   3670  3990
-3 -t1 -b100000000 -m247    19,448,663   3780  3910   6080  6200

                                           C/D time     C/D time
Lev Thr Block Memory            enwik9      Note 48      Note 26
-------------------------  -----------  -----------  -----------
-1 -t2 -b1000000 -m76      254,931,717                 582
-1 -t2 -b10000000 -m76     232,278,737                 425
-1 -t2 -b100000000 -m76    224,233,690                 392
-1 -t2 -b250000000 -m76    223,043,964                 393
-1 -t4 -b250000000 -m152   223,043,964    198   223
-1 -t3 -b333333334 -m114   222,789,971    224   254
-1 -t2 -b500000000 -m76    222,544,698    236   276    408   556
-1 -t1 -b1000000000 -m38   222,310,443    410   470    758   800
-2 -t2 -b1000000 -m224     216,322,292                1377
-2 -t2 -b10000000 -m224    192,436,071                1286
-2 -t2 -b100000000 -m224   182,293,069                1275
-2 -t2 -b250000000 -m224   180,995,559                1278
-2 -t4 -b250000000 -m448   180,995,559    710   742
-2 -t3 -b333333334 -m336   180,716,954    768   811
-2 -t2 -b500000000 -m224   180,516,414    854   881   1275
-2 -t1 -b1000000000 -m112  180,279,234   1487  1532   2231
-3 -t2 -b1000000 -m494     203,976,295                3824
-3 -t2 -b10000000 -m494    180,499,077                3657
-3 -t2 -b100000000 -m494   168,839,648                3611
-3 -t2 -b250000000 -m494   167,036,071                3635
-3 -t4 -b250000000 -m988   167,036,071   1881  1926
-3 -t3 -b333333334 -m741   166,567,322   2025  2158
-3 -t2 -b500000000 -m494   166,324,415   2172  2236   3599
-3 -t1 -b1000000000 -m247  165,887,518   3708  3846   5989

zp 1.02 is a successor to pzpaq, which was considered experimental. It adds two new BWT compression modes which replace the "fast" (-1) model. Option -m1 selects the faster BWT mode (bwtrle1), which consists of right-context sorting (using libdivsufsort by Yuta Mori), RLE encoding, and a single order 0 ICM with the RLE state (literal or count) as context. The BWT output is run length encoded by replacing runs of 2 to 257 identical bytes with 2 bytes and a count. The ICM maps the context to a bit history and then to a bit prediction, which is adjusted after coding to reduce the prediction error.
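The run length step can be sketched in Python as follows. This is an illustration of the rule "runs of 2 to 257 identical bytes become 2 bytes and a count", not the bwtrle1 source: it assumes the count is a single byte holding the run length minus 2, and that longer runs are simply split into several runs. The function names are made up for the example.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode runs of 2..257 equal bytes as (byte, byte, run length - 2)."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        run = 1
        # Extend the run, but never beyond 257 bytes (count byte 0..255).
        while i + run < len(data) and data[i + run] == b and run < 257:
            run += 1
        if run == 1:
            out.append(b)                    # isolated byte: stored as is
        else:
            out += bytes([b, b, run - 2])    # doubled byte signals a count
        i += run
    return bytes(out)


def rle_decode(code: bytes) -> bytes:
    """Invert rle_encode: two equal bytes are always followed by a count."""
    out = bytearray()
    i = 0
    while i < len(code):
        b = code[i]
        if i + 1 < len(code) and code[i + 1] == b:
            out += bytes([b]) * (code[i + 2] + 2)
            i += 3
        else:
            out.append(b)
            i += 1
    return bytes(out)
```

For example, rle_encode(b"aaaab") yields the four bytes 'a', 'a', 2, 'b'. The decoder can tell a count from a literal because a count only ever follows a doubled byte, which is presumably why the ICM uses the literal-or-count state as its context.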
Option -m2 selects the better BWT mode (bwt2), which drops the RLE step and uses an order 0-1 ISSE chain. The order-1 ISSE adjusts the order-0 ICM prediction by mixing it in the logistic domain with a constant, such that the pair of weights is selected by an 8-bit bit history, which is selected by an order 1 context of the BWT output. After coding, the mixing weights are adjusted to reduce the prediction error.
Options -m3 and -m4 select the "mid" and "max" modes, the same as -4 and -5 respectively in pzpaq. The option -bN selects a block size of N*2^20 - 256 bytes. Memory usage per thread for the two BWT modes is 5 times the block size after rounding up to a power of 2. The default is -b32 which uses 160 MB per thread for -m1 and -m2. Memory usage for -m3 and -m4 is not affected by block size. Usage is 111 MB and 246 MB per thread for -m3 and -m4 respectively.
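The block size and BWT memory rules above amount to a small calculation, sketched here in Python. The function names are hypothetical; the arithmetic follows the text (block of N*2^20 - 256 bytes, memory of 5 times the block size rounded up to a power of 2).

```python
def block_size_bytes(n):
    """Block size selected by -bN."""
    return n * 2**20 - 256


def bwt_memory_mb(n):
    """Memory per thread in MB for -m1 and -m2 with -bN."""
    size = block_size_bytes(n)
    power = 1
    while power < size:          # round the block size up to a power of 2
        power *= 2
    return 5 * power // 2**20
```

With the default -b32 this gives 160 MB per thread, and -b128 and -b256 give 640 MB and 1280 MB.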
Other changes: there is no longer an option to limit memory. The default number of threads (-t option) is the number of cores. There is no solid mode compression because BWT requires that each block contain only one whole or part of a file. There is a separate decompresser, unzp, which is optimized for fast, mid, max, bwtrle1, and bwt2 modes, and can be configured to optimize for other models by generating, compiling, linking, and running C++ code for an optimized version of itself. Compressed sizes are based on the unzp source code (37,967 bytes).
zpaq 4.00 was released Nov. 13, 2011. It uses libzpaq v4.00, which internally translates ZPAQL into x86-32 or x86-64 machine code with a just-in-time (JIT) compiler. This runs about as fast as the previous version, which translated ZPAQL to C++ and compiled it. Unlike the earlier version, it correctly handles all legal ZPAQL, including jumps into the middle of a 2 byte instruction, as occurs in max_enwik9.cfg. Like zp 1.02, it uses multi-threading and the same built-in compression levels -m1 through -m4.
Results are shown below for a 4 GB 2.66 GHz Core i7 M620 (note 40), which has 2 cores with 2 hyperthreads each. The tests were run under Ubuntu 64 bit Linux. Compression and decompression times (wall times, ns/byte) are shown for 1 through 4 threads (-t1 through -t4) as the compression method (-m) and block size (-b) are varied. max_enwik9 runs in one thread in a single block.
Compressor Options        enwik8       enwik9     -t1        -t2        -t3        -t4      MB/thread
---------- --------   ----------  -----------  ---------  ---------  ---------  ---------  ---------
zp 1.02    -m1 -b32   24,091,153  210,224,876   264  313   144  184   131  170   120  165        160
           -m1 -b128  22,823,452  197,571,474   264  335   163  208   137  187   136  179        640
           -m1 -b256  22,823,452  191,741,553              167  218                             1280
           -m2 -b32   22,440,353  195,887,789   446  514   259  304   237  274   231  267        160
           -m2 -b128  21,246,043  184,023,690   467  543   291  343   250  295   248  294        640
           -m2 -b256  21,246,043  178,551,919              304  351                             1280
           -m3 -b32   21,301,940  185,584,854  1420 1478   805  856   760  790   713  745        111
           -m3 -b128  20,941,571  181,908,375  1430 1491   851  897   772  823   723  758        111
           -m3 -b1024 20,941,571  180,279,234  1446 1503                                         111
           -m4 -b32   19,912,920  172,989,918  3567 3695  2075 2145  1966 2011  1868 1906        246
           -m4 -b128  19,448,663  168,312,889  3578 3706  2156 2234  1984 2043  1875 1925        246
           -m4 -b1024 19,448,663  165,887,518  3597 3732                                         246

            Compression                Compressed size     Decompresser Total size  Time (ns/byte)
Program     Options                    enwik8       enwik9  size (zip) enwik9+prog  Comp Decomp    Mem Alg Note
-------     ------------          -----------  -----------  ---------- -----------  ----- -----  ----- ---- ----
zpaq 4.00   -mmax_enwik9           18,238,435  149,376,058   66,958 s  149,440,016   6327  6528   2002 CM   48
zpaq v6.12, Oct. 19, 2012, is a journaling, deduplicating, incremental archiver. These features were added in zpaq v6.00 on Sept. 26, 2012. It implements the level 2 ZPAQ standard introduced with libzpaq v5.00 on Feb. 1, 2012. The level 2 standard allows for uncompressed (but possibly pre/post-processed) data. The format is described in the ZPAQ specification v2.01.
zpaq v6.12 is designed for large backups. It will compress 100 GB to an external drive in a few hours, then perform daily incremental backups of files whose dates have changed in a few minutes. It recursively traverses directories, storing last-modified dates and attributes of added files.
A journaling archive is append-only. When a journaling archive is updated, it keeps both the old and new versions of each file or directory. The old version can be extracted by specifying a dated version, and any later updates are ignored.
Input is deduplicated before compression by dividing input files into fragments averaging 64 KB on content-dependent boundaries that move when data is inserted or removed. The archive stores fragment SHA-1 hashes and stores any fragment with a matching hash as a pointer to an existing fragment. Any remaining fragments are packed into 16 MB blocks in memory and compressed by multiple threads in parallel to memory buffers before being appended to the archive. After compression is completed, the fragment sizes and hashes are appended, and then a list of index updates in separately compressed blocks. Each update is either a deletion (filename only) or an update (filename, date, attributes, and list of fragment pointers).
An update is performed as a transaction by first appending a temporary header, then the compressed data and index, and then finally going back and updating the header to store the compressed data size so that it can be skipped over when listing the archive contents or preparing a list of files to add or extract. If compression is interrupted or an error occurs, then the temporary header is not updated. If zpaq encounters a temporary header then it assumes that any data following it is corrupted and ignores it during extraction or listing, and overwrites it during the next update.
zpaq also has features to summarize the contents of archives containing millions of files, show update history and version dates, and compare and extract individual files and directories and rename them. Archives can be encrypted.
The deduplication algorithm uses a rolling hash of the input that depends on the last 32 bytes that are not predicted in an order-1 context. Missed predictions (from a 256 byte table) are counted as a heuristic to guess whether a block can be compressed. If not, then it is stored without compression as a speed optimization. There are 4 compression levels (-method 1 through 4). The threshold for compressing a block is 1/16, 1/32, 1/64, and 1/128 of bytes predicted by the order 1 model, respectively. Like earlier versions of zpaq, it also accepts configuration files and external preprocessors. These are always compressed.
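A content-dependent fragmenter of this kind can be sketched in Python as below. The text fixes only the general scheme (a 256 byte order-1 prediction table, a hash that depends on the last 32 mispredicted bytes, and fragments averaging 64 KB). Everything else here is an assumption made for the sketch: the two multipliers, the minimum and maximum fragment sizes, the boundary test, and the function name.

```python
def fragment_boundaries(data, avg_bits=16, min_size=4096, max_size=520192):
    """Return the end offsets of content-defined fragments of data.

    Assumed parameters: a boundary is declared when the 32 bit hash falls
    below 2^(32 - avg_bits), which happens about once per 2^avg_bits bytes
    (64 KB by default), subject to hypothetical min/max fragment sizes.
    """
    boundaries = []
    o1 = [0] * 256        # o1[c1] = byte that last followed context byte c1
    h = 0                 # rolling hash
    c1 = 0                # previous byte (the order-1 context)
    start = 0
    for i, c in enumerate(data):
        if c == o1[c1]:
            # Predicted byte: odd multiplier, so older history is preserved.
            h = ((h + c + 1) * 314159265) & 0xFFFFFFFF
        else:
            # Mispredicted byte: the even multiplier moves older history up
            # one bit, so it drops out of the hash after 32 such bytes.
            h = ((h + c + 1) * 271828182) & 0xFFFFFFFF
        o1[c1] = c
        c1 = c
        size = i + 1 - start
        if size >= max_size or (size >= min_size and h < (1 << (32 - avg_bits))):
            boundaries.append(i + 1)
            start = i + 1
            o1 = [0] * 256        # table is cleared at each fragment boundary
            h = 0
            c1 = 0
    if start < len(data):
        boundaries.append(len(data))   # final partial fragment
    return boundaries
```

Because the hash forgets everything older than the last 32 mispredicted bytes, a boundary depends only on nearby content. Inserting or deleting data early in a file therefore tends to move later boundaries along with the data, so unchanged fragments keep their SHA-1 hashes and can be deduplicated.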
The journaling format is not compatible with zpaq versions prior to 6.00. Older versions would decompress a journaling archive to a set of jDC* files that could in theory reconstruct the data. To support older versions, there are three additional modes: streaming, solid, and tiny. In streaming mode, each file is compressed in parallel in a separate block, and large files are split into 16 MB blocks. In solid mode, all files are compressed to a single block in a single thread. Tiny mode is like solid mode except that comments (uncompressed sizes), checksums, and header locator tags (for error recovery) are not stored, saving a few bytes each. None of these modes support journaling, incremental backup, or deduplication, and do not save file attributes or empty directories. An update appends to an archive without checking whether the files have been added before.
There are 4 built in methods. Method 1 is equivalent to "lazy" level 3. It is LZ77 using variable length codes to represent the lengths of literal byte strings or the length and offset of matches to earlier occurrences of the same string in a 16 MB output block. Matches are found by indexing a hash of the next 4 bytes in the input buffer into a table of size 4M which is grouped into 512K buckets of 8 pointers each. The longest match is coded, provided the length is at least 4, or 5 if the offset is greater than 64K and the last output was a literal. Ties are broken by favoring the smaller offset. Bucket elements are selected for replacement using the low 3 bits of the output count.
Literal lengths are coded using "marked binary" Elias gamma codes, where the leading 1 bit of the number is dropped, a 1 bit is inserted in front of each of the remaining bits, and a 0 marks the end. For example, 1100 is coded as 1,1,1,0,1,0,0. Matches are coded as a length and an offset. The length is at least 4. All but the last 2 bits are coded as a marked binary. The number of match bits is given in the first 5 bits of the code. If the code starts with 00, then a literal length and a string of literals follow. Otherwise the 5 bits code a number from 0 to 23, and that number of bits, with an implied leading 1, gives the offset.
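The marked binary code is easy to express in Python. This sketch covers only the code for a single positive integer as described above, not the full literal and match format. The function names are invented for the example.

```python
def marked_binary(n):
    """Code n >= 1: drop the leading 1, prefix each remaining bit with 1, end with 0."""
    if n < 1:
        raise ValueError("n must be at least 1")
    out = []
    for bit in bin(n)[3:]:          # binary digits after the leading 1
        out += [1, int(bit)]        # marker 1, then the data bit
    out.append(0)                   # marker 0 ends the number
    return out


def decode_marked_binary(bits):
    """Read one marked binary number from an iterable of bits."""
    n = 1                           # the implied leading 1
    it = iter(bits)
    for marker in it:
        if marker == 0:
            return n
        n = (n << 1) | next(it)
    raise ValueError("missing terminating 0 marker")
```

marked_binary(0b1100) returns [1, 1, 1, 0, 1, 0, 0], matching the example in the text, and the number 1 is coded as the single bit 0.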
The codes are not compressed further. They are stored in the ZPAQ level 2 format, consisting of a sequence of sub-blocks each preceded by a 4 byte header giving the sub-block size.
Method 2 is also LZ77, but the codes are byte aligned and context modeled rather than coded directly. It also searches 4 order-7 context hashes and 4 order-4 hashes, rather than 8 order-4 hashes like method 1. Method 2 first codes as follows, according to the high 2 bits of the first byte:
00 = literal of length 1..64, followed by uncompressed bytes.
01 = match of length 4..11 and offset 1..2048.
10 = match of length 1..64 and offset of 1..65536.
11 = match of length 1..64 and offset of 1..16777216.

These codes are arithmetic coded using an indirect context model. The context depends on the parse state and in the case of literals, on the previous byte. An indirect context model maps a context into a bit history (represented as an 8 bit state) and then to a bit prediction. The model is updated by adjusting the prediction to reduce the error by 0.1%. A bit history represents a bounded pair of bit counts (n0,n1) and the value of the most recent bit. The bounds for (n0,n1) and (n1,n0) are (20,0), (48,1), (15,2), (8,3), (6,4), (5,5).
Method 3 uses a Burrows-Wheeler transform (BWT) using libdivsufsort-lite v2.0. This is equivalent to -m2 in older zpaq versions. The input bytes are sorted by their right contexts and compressed using an order 0-1 ICM-ISSE chain. The order 0 ICM (indirect context model) works as in method 2, taking only the previous bits of the current byte (MSB first) as context. The prediction is adjusted by an order-1 indirect secondary symbol estimator (ISSE). An ISSE maps its context (the previous byte and the leading bits of the current byte) to a bit history, and the history selects a pair of mixing weights to compute the weighted average of the constant 1 and the ICM output in the logistic domain, log(p/(1-p)). The output is converted back to linear, and the two weights are updated to reduce the prediction error in favor of the better model. In other words, the output is:
p' := 1/(1 + exp(-w1*1 - w2*log(p/(1-p))))

and after the bit is arithmetic coded, the weights w1 and w2 are updated:

w1 := w1 + 1 * 0.001 * (bit - p')
w2 := w2 + log(p/(1-p)) * 0.001 * (bit - p')
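The two formulas translate directly into floating point Python. This is a sketch of the arithmetic only: a real ISSE keeps one weight pair per bit history and uses fixed point, and the initial weights chosen here (w1 = 0, w2 = 1, which passes the input prediction through unchanged) are an assumption, as is the class name.

```python
import math


def stretch(p):
    """Map a probability to the logistic domain: log(p/(1-p))."""
    return math.log(p / (1 - p))


def squash(x):
    """Inverse of stretch."""
    return 1 / (1 + math.exp(-x))


class IsseWeights:
    """One weight pair, as selected by a single bit history (illustrative)."""

    def __init__(self, w1=0.0, w2=1.0, rate=0.001):
        self.w1, self.w2, self.rate = w1, w2, rate
        self.st = 0.0
        self.p_out = 0.5

    def predict(self, p):
        """p' = squash(w1*1 + w2*stretch(p))."""
        self.st = stretch(p)
        self.p_out = squash(self.w1 * 1 + self.w2 * self.st)
        return self.p_out

    def update(self, bit):
        """Move both weights to reduce the error bit - p'."""
        err = bit - self.p_out
        self.w1 += 1 * self.rate * err
        self.w2 += self.st * self.rate * err
```

Typical use is one predict call followed by one update call per coded bit. After an update with bit = 1, the same input p produces a slightly higher output, which is the error-reducing behavior described above.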
Method 4 is equivalent to mid.cfg or -m3 in older zpaq versions. It directly models the data using an order 0-5 ICM-ISSE chain, an order 7 match model, and an order 1 mixer which produces the bit prediction by mixing the predictions of all other components. The 6 components in the chain each mix the next lower order prediction using a hash of the next higher order context to select a bit history for that context, which selects the mixing weights. A match model has a 16 MB history buffer and a 4M hash table of the previous occurrence of the current context. If a match is found, it predicts the bit that followed the match with probability 1 - 1/(length in bits). The outputs of all 7 models are then mixed as with an ISSE except with a vector of 7 weights selected by an order 1 (16 bit) context, and with a faster weight update rate of about 0.01.
With method 4 you can give an argument like "-method 4 1" to double the memory allocated to the components to improve compression. The same extra memory is needed to decompress. The default is 111 MB per thread. An argument n multiplies memory usage by 2^n. n can be negative.
Methods 1, 2, and 3 only work in journaling and streaming mode, since they have a 16 MB block size limit. Method 4 and configuration files work in all modes.
The following tests are on a 2.0 GHz T3200 with 2 cores. zpaq will automatically detect the number of cores and use the same number of compression or decompression threads, although this can be overridden.
            Compression                Compressed size     Decompresser Total size  Time (ns/byte)
Program     Options                    enwik8       enwik9  size (zip) enwik9+prog  Comp Decomp    Mem Alg Note
-------     ------------          -----------  -----------  ---------- -----------  ----- -----  ----- ---- ----
zpaq 6.12   -method 1              37,397,857  328,974,375  104,067 s  329,078,442     93    53    152 LZ77 26
            -method 1 -streaming   37,359,931  328,618,875  104,067 s  328,722,942     85    28    151 LZ77 26
            -method 2              31,765,035  281,184,939  104,067 s  281,289,006    196   108    153 LZ77 26
            -method 2 -streaming   31,730,884                                         218   126    151 LZ77 26
            -method 3              23,341,562  203,365,453  104,067 s  203,469,520    429   369    238 BWT  26
            -method 3 -streaming   23,328,888                                         425   375    238 BWT  26
            -method 4              21,768,810                                        1403  1371    299 CM   26
            -method 4 -streaming   21,744,770                                        1403  1356    299 CM   26
            -method 4 -solid       20,941,591                                        2036  2056    109 CM   26
            -method 4 1 -solid     20,740,920                                        2338  2197    216 CM   26
            -method 4 4 -solid     20,581,270                                        2356  2289   1482 CM   26
            -method 4 4 -tiny      20,581,208  173,028,477  104,067 s  173,132,544   2107  2230   1654 CM   26
zpaq v6.19, Jan. 23, 2013, moves the -solid and -tiny modes into a separate program, zpaqd, and eliminates -streaming. It adds 5 more compression levels, so that levels now range from 0 through 9. -method 5 is max.cfg, a 22 component CM with some of the component sizes reduced to use about 225 MB per thread. -methods 6 through 9 each double the memory size (450 MB to 1.8 GB) and block size (32 MB to 256 MB). All levels except 0 (store uncompressed) have an E8E9 pre/post-processor. -methods 0 through 4 are unchanged.
            Compression                Compressed size     Decompresser Total size  Time (ns/byte)
Program     Options                    enwik8       enwik9  size (zip) enwik9+prog  Comp Decomp    Mem Alg Note
-------     ------------          -----------  -----------  ---------- -----------  ----- -----  ----- ---- ----
zpaq 6.19   -method 0 -threads 2  100,050,464                                          37    42    169 copy 26
            -method 1 -threads 2   37,398,697                                         143    61    225 LZ77 26
            -method 2 -threads 2   31,766,023                                         294   185    225 LZ77 26
            -method 3 -threads 2   23,342,327                                         635   548    322 BWT  26
            -method 4 -threads 2   21,770,084                                        1319  1331    378 CM   26
            -method 5 -threads 2   20,491,832                                        3778  3773    563 CM   26
            -method 6 -threads 2   19,901,321                                        4446  4615    991 CM   26
            -method 7 -threads 2   19,497,869                                        4625  4711   1845 CM   26
            -method 8 -threads 1   19,038,853  164,475,887   95,914 s  164,571,801   6153  6296   1911 CM   26
            -method 8 -threads 2   19,038,853                                        3553  3551   3800 CM   48
            -method 9 -threads 1   19,004,217  161,001,056   95,914 s  161,096,970   3468  3521   3800 CM   48
zpaq v6.34 has 7 compression methods as follows:
Methods 0 and 1 use 16 MB blocks by default. Methods 2..6 use 64 MB blocks. The size can be specified by a second digit N which specifies 2^N MB blocks. Thus, the defaults are 04, 14, 26, 36, 46, 56, 66. Larger blocks compress better but require more memory per thread.
Methods 1..6 use heuristics to detect already compressed data and either store it or compress it with a fast method like 1 depending on the degree of compressibility. The heuristic depends on the 256 byte order-1 prediction table that is used to compute the rolling hash used in the fragmentation algorithm. The table is initialized to all zeros at each fragment boundary, and contains the last byte seen in each of 256 possible 1 byte contexts. If the data is random, then at each fragment boundary (average size 64K), the following properties are expected:
In addition, the order 1 tables are used to detect text and x86 (.exe) data types. Text is detected if at least 5 letter, digit, period, or comma contexts predict a space, minus any predicted characters in the range 1..8, 11, 12, 14..31, which normally do not appear in text files. If at least 1/4 of the fragments are detected as text, then methods 5 and 6 add extra models for it. x86 is detected if at least 5 contexts predict a 139 (an x86 MOV reg, r/m instruction). If at least 1/8 of the fragments are detected as x86, then an E8E9 pre/post processor is used in methods 1..6.
LZ77 and BWT removed the 16 MB block size limitation of the previous version. Variable length LZ77 adds an extra field of rb = 1..8 bits to represent the low bits of an offset up to 32 bits, where rb increases by 1 for each doubling of the block size over 16 MB. 2^rb - 1 is added to the offset, so that it requires a rb..rb+23 bit code.
Byte aligned LZ77 removed the limitation by eliminating the short code (3 bit length and 11 bit offset) and adding a code with 4 offset bytes. Lengths range from m..m+63 where m is the minimum match length, normally 8 when used with an order-1 context model.
BWT removes the block size limitation by removing the IBWT optimization of packing pointers and the byte pointed to into a single 32 bit linked list element when the block size is over 16 MB. No changes were required for higher compression levels.
zpaq versions since v6.22 support custom context models through the command line. When compressing enwik8 and enwik9 the following models are automatically generated:
Option  Equivalent
------  ----------
-m 0    -m x4,0
-m 1    -m x4,1,4,0,3,24,16,18
-m 18   -m x8,1,4,0,3,27,16,18
-m 2    -m x6,1,4,8,4,26,16,18
-m 28   -m x8,1,4,8,4,27,16,18
-m 3    -m x6,2,8,0,4,26,16,24c0,0,511
-m 38   -m x8,2,8,0,4,26,16,24c0,0,511
-m 4    -m x6,3ci1
-m 48   -m x8,3ci1
-m 5    -m x6,0ci1,1,1,1,2awm
-m 58   -m x8,0ci1,1,1,1,2awm
-m 6    -m x6,0w2c0,1010,255i1c256ci1,1,1,1,1,1,2ac0,2,0,255i1c0,3,0,0,255i1c0,4,0,0,0,255i1mm16ts19t0
-m 68   -m x8,0w2c0,1010,255i1c256ci1,1,1,1,1,1,2ac0,2,0,255i1c0,3,0,0,255i1c0,4,0,0,0,255i1mm16ts19t0
The meaning is as follows.
x (experimental) rather than a digit selects a specific method which is the same for every block. It can also be s to add in streaming mode with each file in a separate block and large files split into blocks with no deduplication.
The first digit N1 after x selects a maximum block size of 2^(N1+20) - 4096 bytes. This is selected by the second digit of the method, if present, or else it defaults to 6 for methods 2..6 or 4 otherwise.
The second digit N2 selects the pre/post processing step. 0 means none. 1 means LZ77 with variable length codes. 2 means LZ77 with byte aligned codes. 3 means BWT. 4..7 means 0..3 with E8E9 filtering.
N3..N8 apply to the LZ77 modes only. N3 (4 or 8) is the minimum match length. N4 (8 or 0) if not 0 specifies a context order to search first. N5 (3 or 4) says to search 2^N5 contexts of each order to look for matches. N6 (24..27) specifies 2^N6 elements in the hash table for lookups. Each entry requires 4 bytes of memory. It defaults to the block size up to N1=26, then N1-1. N7 and N8 specify that the minimum match (N3) should be increased by 1 after a literal or match, respectively, when the match offset is greater than 2^N7 or 2^N8 respectively.
The sequence of strings starting with letters followed by a comma-separated list of numbers specifies various context models used by methods 3 and higher. c0 specifies an ICM (indirect context model: context to bit history to prediction). c1...c256 (used in -m 6) specifies a CM (context to prediction) with an update rate of 1/count and maximum count of N1*4-4, e.g. c256 specifies 1020. The remaining arguments to c default to 0. N2 describes any special contexts. N2 in 1..255 (e.g. c0,2) means offset mod N2. N2 in 1000..1255 means the distance to the last occurrence of N2-1000 (e.g. c0,1010 means how far from the last linefeed). N3 and up specifies byte masks starting with the most recent context byte (e.g. c0,2,0,255 means offset mod 2 combined with the second context byte (sparse model)). A value of 256..511 includes the byte aligned LZ77 parse state if applicable (e.g. c0,0,511 means the order 1 context plus parse state hashed together).
i followed by a list specifies a chain of ISSE components with each context order increasing by the specified amount by hashing it with the previous component, (e.g. ci1,1,1,1,2 specifies an order 0 ICM chained with order 1, 2, 3, 4, 6 ISSE). Each ISSE (indirect secondary symbol estimator) adjusts the prediction of the previous component in the bit history of the current context (hashed together with the previous component's context).
a specifies a match model, which predicts the bit which followed the most recent occurrence of the current (normally high order) context. It can take parameters specifying buffer size, hash table index size and context order.
wN1 specifies a word model, an ICM-ISSE chain of increasing order from 0 to N1-1 in words rather than bytes. A word is defined as a sequence of letters converted to upper case, ignoring all other characters (e.g. w2 specifies an order 0 ICM and order 1 ISSE). It can take additional parameters specifying an alphabet range and a mask to convert case.
m specifies a mixer, which adaptively averages the predictions of all prior components. It can take a parameter (default 8) which is the number of bits of context to select the mixing weights (e.g. m16 is a byte-wise order 1 context). It takes additional parameters specifying update rate.
t is a MIX2 2-input mixer which averages just the last 2 components.
s is an SSE which adjusts the previous prediction like an ISSE but using a direct context instead of a bit history. It takes parameters specifying the number of context bits (e.g. s19 selects the current and previous bytes and the 3 high bits of the second byte), and additional parameters specifying initial and final update rates.
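A few of the rules above can be checked with a short Python sketch. It is not zpaq's parser. It only extracts the leading numeric arguments N1, N2, ... of a comma-separated method string as shown in the table, and applies the stated rules for block size, pre/post processing, and ISSE chain orders. All function names are hypothetical.

```python
import re
from itertools import accumulate

PREPROCESSING = [
    "none",
    "LZ77 with variable length codes",
    "LZ77 with byte aligned codes",
    "BWT",
]


def leading_args(method):
    """Return [N1, N2, ...]: the numbers after x or s, up to the first letter."""
    m = re.match(r"[xs](\d+(?:,\d+)*)", method)
    if m is None:
        raise ValueError("expected x or s followed by numbers: " + method)
    return [int(v) for v in m.group(1).split(",")]


def max_block_size(n1):
    """Maximum block size in bytes selected by N1."""
    return 2 ** (n1 + 20) - 4096


def describe_preprocessing(n2):
    """N2 = 0..3 selects the step; 4..7 means 0..3 with E8E9 filtering."""
    if not 0 <= n2 <= 7:
        raise ValueError("N2 must be in 0..7")
    name = PREPROCESSING[n2 % 4]
    return name + " with E8E9 filtering" if n2 >= 4 else name


def isse_chain_orders(increments):
    """Orders of an order 0 ICM followed by ISSEs, each raised by an increment."""
    return [0] + list(accumulate(increments))
```

For -m 1, leading_args("x4,1,4,0,3,24,16,18") gives N1 = 4 and N2 = 1, that is, a block of 2^24 - 4096 bytes with variable length LZ77. isse_chain_orders([1, 1, 1, 1, 2]) gives [0, 1, 2, 3, 4, 6], the orders listed for ci1,1,1,1,2.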
-m is short for -method. -th 1 (-threads 1) selects 1 thread. The default on the test machine is 4 (2 cores + 2 hyperthreads). It is also used in decompression to reduce memory.
            Compression                Compressed size     Decompresser Total size  Time (ns/byte)
Program     Options                    enwik8       enwik9  size (zip) enwik9+prog  Comp Decomp    Mem Alg Note
-------     ------------          -----------  -----------  ---------- -----------  ----- -----  ----- ---- ----
zpaq 6.34   -m 1                   36,720,879  322,717,507                             38    15    456 LZ77 48
            -m 18 -th 1            36,174,283  316,439,766                             85    25   1200 LZ77 48
            -m 2                   32,785,291  287,047,166                             76    17   1500 LZ77 48
            -m 28 -th 1            32,123,217  279,231,899                            159    25   1200 LZ77 48
            -m 3                   30,759,444  270,317,562                             89    56   1500 LZ77 48
            -m 38 -th 1            30,216,795  264,333,006                            198   106   1200 LZ77 48
            -m 4                   21,982,505  189,860,169                            285   224   1800 BWT  48
            -m 48 -th 1            21,293,686  179,016,475                            596   512   1400 BWT  48
            -m 5                   20,742,462  179,365,293                            937   658   2100 CM   48
            -m 58 -th 1            20,214,879  172,645,399                           1931  1430   2400 CM   48
            -m 6                   19,627,225  168,583,236                           2348  2356   3300 CM   48
            -m 68 -th 1            18,998,601  160,541,121  118,086 s  160,659,207   4300  4408   3200 CM   48
The following table shows compression with the config file max5.cfg (Oct. 14, 2013). This is the same model as max_enwik9.cfg except that it was modified to take an argument to double memory usage for most of the components for each increment. With argument 0, it is the same as max_enwik9. Compression was with zpaqd 6.33 (June 20, 2013), which is the development tool that accompanies zpaq and produces streaming mode archives from a config file. Thus, the command "zpaqd c max5 3 archive enwik9" compresses to archive.zpaq with 3 passed to $1 in max5.cfg. This has the effect of using almost 8 times as much memory for both compression and decompression as max_enwik9. The archive was decompressed with both zpaq 6.42 (Sept. 26, 2013) and with tiny_unzpaq (Mar. 21, 2012, public domain) compiled with g++ 4.1.2 -O3 under Linux on the test machine, which has 20 GB of available memory. zpaq 6.42 is an archiver like zpaq 6.33 with a number of added features and bug fixes unrelated to compression. tiny_unzpaq is a stand-alone program that extracts only streaming mode archives and is designed so that the source code is as small as possible. It does not support JIT compilation of the ZPAQL code or multithreading, and has no error checking or help message. It takes an archive as an argument with no options and extracts to the saved names.
max6.cfg (Oct. 15, 2013) modifies max5 by rewriting the word model and adding models that count brackets ("[" minus "]" in range 0..2) and a column model (counts bytes after the last linefeed in range 0..64). It also changes the memory parameter from $1 to $3 so it can be passed to zpaq like "-m s10.0.5fmax6". This means to choose streaming mode (s), a block size of 2^10 MB (10), no preprocessing (0), pass 5 as $3 selecting 14 GB (or 1 selecting 1.4 GB) using max6.cfg. For this test, tiny_unzpaq is used to extract when the decompresser is given as "sd" although either program could be used.
            Compression                Compressed size     Decompresser Total size  Time (ns/byte)
Program     Options                    enwik8       enwik9  size (zip) enwik9+prog  Comp Decomp    Mem Alg Note
-------     ------------          -----------  -----------  ---------- -----------  ----- -----  ----- ---- ----
zpaqd 6.33  max5 0                 18,238,448                                        5960         2000 CM   61
            max5 1                 18,135,013  146,750,019                           6309         3400 CM   61
            max5 2                 18,095,676  144,918,290                           6521         6600 CM   61
            max5 3                 18,084,027  143,757,714    4,760 sd 143,762,474   5894 13173  13100 CM   61
zpaq 6.42                                      143,757,714  125,670 s  143,883,384         5985  13500 CM   61
zpaq 6.42   -m s10.0.1fmax6        18,167,158  150,622,666  125,670 s  150,748,336   6368  6475   1400 CM   61
            -m s10.0.5fmax6        17,855,729  142,252,605    4,760 sd 142,257,365   6699 14739  14000 CM   61
zpaq 6.50, Mar. 21, 2014, uses 5 compression levels instead of 6. LZ77 when used in methods 2 and higher uses a suffix array to find matches. There are also other improvements in sorting files, grouping into blocks, detecting file type, detecting random data, and selecting compression algorithm based on type. Tests below used 4 threads.
            Compression                Compressed size     Decompresser Total size  Time (ns/byte)
Program     Options                    enwik8       enwik9  size (zip) enwik9+prog  Comp Decomp    Mem Alg Note
-------     ------------          -----------  -----------  ---------- -----------  ----- -----  ----- ---- ----
zpaq 6.50   -method 1              35,691,734  314,117,968  137,993 s  314,255,964     35    23    512 LZ77 48
            -method 2              31,184,422  271,626,606  137,993 s  271,764,602    150    24   1800 LZ77 48
            -method 3              21,980,366  189,875,990  137,993 s  190,013,986    222   220   1600 BWT  48
            -method 4              20,740,505  179,455,249  137,993 s  179,593,245    665   670   2200 CM   48
            -method 5              19,625,015  168,590,741  137,993 s  168,728,730   2410  2419   3400 CM   48
lpaq versions 1 through 8 may be downloaded here. lpaq9* can be downloaded here or as a zpaq archive. The decompr8 series of Hutter prize entries (decompresser and enwik8 archive) are also listed here because they followed a period of development of the lpaq series.
Note: some of these programs are compressed with upack, which compresses better than upx. Some virus detectors give false alarms on all upack-compressed executables. The programs are not infected.
lpaq1 is a free, open source (GPL) file compressor by Matt Mahoney, July 24, 2007. It uses context mixing. It is a "lite" version of paq8l, about 35 times faster at the cost of about 10% in compression. The "9" option selects maximum memory. The options range from 0 (6 MB) to 9 (1.5 GB). Memory usage is 3 + 3*2^N MB, N = 0..9.
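The memory figures can be checked with a short calculation. This sketch assumes the formula reads 3 + 3*2^N MB, which agrees with the 6 MB quoted for option 0 and the 1539 MB measured for option 9. The function name is made up for the example.

```python
def lpaq1_memory_mb(n: int) -> int:
    """Approximate lpaq1 memory use in MB for option N, per 3 + 3*2^N."""
    if not 0 <= n <= 9:
        raise ValueError("lpaq1 options range from 0 to 9")
    return 3 + 3 * 2 ** n


memory_by_option = {n: lpaq1_memory_mb(n) for n in range(10)}
```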
The compressor mixes 7 contexts: orders 1, 2, 3, 4, 6, a unigram word context (consecutive letters, case insensitive), and a matched bit context. The contexts (except the matched bit) are mapped to nonstationary bit histories using nibble-aligned hash tables, then mapped to bit prediction probabilities using stationary adaptive tables with bit counts to control adaptation rate. The matched bit context maps the predicted bit (based on a context match), match length and order-1 context (or order 0 if no match) to a bit prediction. The probabilities are combined in the logistic domain (log(p/(1-p))) using a single layer neural network selected by a small context (3 high bits of last byte + context order), then passed through 2 SSE stages (orders 0 and 1) and arithmetic coded. Except for one model for ASCII text, there are no specialized models for binary data, .exe, .bmp, .jpeg, etc.
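The mixing step can be sketched in a few lines of Python. This is only an illustration of combining bit predictions in the logistic domain with a weight set chosen by a small context. It is not lpaq1's code: the class name, the initial weights and the learning rate are assumptions, and the hash tables, SSE stages and arithmetic coder are left out.

```python
import math


def stretch(p: float) -> float:
    """Map a probability to the logistic domain: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def squash(x: float) -> float:
    """Inverse of stretch: map a logistic-domain value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


class Mixer:
    """Single layer network with one weight vector per small context."""

    def __init__(self, n_inputs: int, n_contexts: int, rate: float = 0.02):
        # Initial weight 0.3 and rate 0.02 are illustrative guesses.
        self.weights = [[0.3] * n_inputs for _ in range(n_contexts)]
        self.rate = rate

    def predict(self, probs, ctx: int) -> float:
        """Return the mixed probability that the next bit is 1."""
        self.inputs = [stretch(p) for p in probs]
        self.active = self.weights[ctx]
        self.p = squash(sum(w * x for w, x in zip(self.active, self.inputs)))
        return self.p

    def update(self, bit: int) -> None:
        """Adjust the selected weights to reduce the coding cost of `bit`."""
        err = bit - self.p
        for i, x in enumerate(self.inputs):
            self.active[i] += self.rate * err * x


mixer = Mixer(n_inputs=2, n_contexts=8)
before = mixer.predict([0.6, 0.7], ctx=3)
mixer.update(1)  # the coded bit turned out to be 1
after = mixer.predict([0.6, 0.7], ctx=3)
```

After the update the same inputs give a higher probability of a 1 bit, and only the weight vector for context 3 has changed.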
lpaq2 by Alexander Rhatushnyak, Sept. 20, 2007, contains some speed optimizations.
lprepaq 1.2 by Christian Schneider, Sept. 29, 2007, is lpaq1 combined with precomp as a preprocessor. precomp compresses JPEG files and also expands data segments compressed with zlib, often making them more compressible. This preprocessing has no effect on text files.
lpaq3 and elpaq3 by Alexander Rhatushnyak, Sept. 29, 2007, are two versions built from the same source code. When compiled with -DWIKI, the result is elpaq3, which is tuned for large text files. The normal compile produces lpaq3.
lpaq3a by Alexander Rhatushnyak, Sept. 30, 2007, improves compression on some files over lpaq3 (but not enwik8/9). The archive also contains lpaq3e.exe, which is an archive-compatible Intel compile of elpaq3.exe.
lpaq4 and lpaq4e (mirror) are by Alexander Rhatushnyak, Oct. 1, 2007. lpaq4e is tuned for large text files.
lpaq5 and lpaq5e are by Alexander Rhatushnyak, Oct. 16, 2007. Option 9 selects 1542 MB memory. lpaq5e is tuned for large text files. It includes separate programs for compression only (lpaq5e-c.exe) and decompression only (lpaq5e-d.exe). Tests were done with these programs, rather than the version that does both (lpaq5e.exe).
lpaq6 and lpaq6e are by Alexander Rhatushnyak, Oct. 22, 2007. Option 9 selects 1542 MB memory. lpaq6e is tuned for large text files. lpaq6 includes an E8E9 transform for compressing x86 executables.
lpaq7 and lpaq7e (mirror) are by Alexander Rhatushnyak, Oct. 31, 2007.
lpaq8 and lpaq8e are by Alexander Rhatushnyak, Dec. 10, 2007. The executables are packed with upack. zip -9 would make them larger.
lpaq1a by Matt Mahoney, Dec. 21, 2007, uses the same model as lpaq1 but replaces the arithmetic coder with the asymmetric binary coder from fpaqb.
lpq1 by Matt Mahoney, Dec. 23, 2007, is an archiver (not a file compressor) based on lpaq1 option 7.
drt|lpaq9e is by Alexander Rhatushnyak, Feb. 20, 2008. It is specialized for English text. It includes a separate program drt.exe (without source code) which performs a dictionary transform prior to compression with lpaq9e. The option 9 is for lpaq9e which selects maximum memory. The program size is computed by adding lpaq9e.exe, drt.exe, and the compressed dictionary, which must be uncompressed with lpaq9e before running. The size is smaller without a zip archive. Decompression consists of uncompressing the dictionary with lpaq9e, uncompressing the transformed file with lpaq9e, and reversing the transform with drt. Run times are for the sum of all three operations (1+62+2943, 1+2929+45 sec).
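Because enwik9 is 10^9 bytes, a run time in seconds is numerically equal to the time in ns/byte. The sketch below adds the three stages quoted above and converts the sums. The function and variable names are made up for the example.

```python
ENWIK9_BYTES = 10**9


def ns_per_byte(stage_seconds, size_bytes=ENWIK9_BYTES):
    """Convert the summed run time of several stages to nanoseconds per byte."""
    return sum(stage_seconds) * 1e9 / size_bytes


comp = ns_per_byte([1, 62, 2943])    # dictionary + drt + lpaq9e
decomp = ns_per_byte([1, 2929, 45])  # dictionary + lpaq9e + drt
```

The results, 3006 and 2975 ns/byte, are the Comp and Deco values listed for drt|lpaq9e in the table.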
lpaq9f by Alexander Rhatushnyak, Apr. 27, 2008, works like lpaq9e. Run times are (2+55+2801, 2+2819+38 sec). drt uses 8 MB for compression and 4 MB for decompression.
lpaq9g by Alexander Rhatushnyak, May 23, 2008, works like lpaq9e. Run times are (2+51+2691, 2+2682+38 sec).
lpaq9h by Alexander Rhatushnyak, June 3, 2008, works like lpaq9e. Run times are (2+53+2530, 2+2529+44 sec).
lpaq9i by Alexander Rhatushnyak, June 13, 2008, works like lpaq9e. Run times are (2+59+2425, 2+2453+46 sec). drt.exe and the dictionary file (tmpdict0.dic) are unchanged in all versions starting with lpaq9f.
lpaq9j by Alexander Rhatushnyak, Aug. 17, 2008, has a new version of drt.exe and dictionary. Run times are (2+58+2365, 2+2358+48 sec).
lpaq9k is by Alexander Rhatushnyak, Sept. 30, 2008. Run times are (2+59+2336, 2+2346+47 sec). The decompresser size is for 3 files (not zipped).
lpaq9l is by Alexander Rhatushnyak, Dec. 2, 2008. Run times are (2+41+2132, 2+2179+40 sec) on the computer described in note 26, and (2+58+2338, 2+2422+50 sec) on the computer used to test all the earlier versions. The decompresser size is for 3 files (not zipped).
lpaq9m (zpaq archive) is by Alexander Rhatushnyak, Feb. 20, 2009. Run times are (2+38+2067, 2+2111+38 sec). The decompresser size is for 3 files (not zipped).
decomp8 is a Hutter Prize entry by Alexander Rhatushnyak, Mar. 23, 2009. It consists of a decompresser (Windows executable only) and an archive (archive8.bin) which decompresses to enwik8. There is no compressor. During decompression, the program creates a temporary file containing a dictionary similar to the one used in paq8hp12 and by drt. The command to decompress is "decomp8 archive8.bin enwik8". The total size (not zipped) is 15,986,677 bytes.
decomp8b is an update to the Hutter prize entry decomp8 by Alexander Rhatushnyak, Apr. 22, 2009. Total size (not zipped) is 15,958,674 bytes.
decmprs8 is an update to the Hutter Prize entry decomp8b by Alexander Rhatushnyak, May 23, 2009. Total size (not zipped) is 15,949,688 bytes. To decompress: decmprs8.exe archive8.dat enwik8
Prog        Opt     enwik8       enwik9     prog           Total   Comp   Deco   Mem  Alg Note
----        --- ----------  -----------  ----------  -----------  -----  -----  ----  --- ----
lpaq1       9   19,755,948  164,508,919    6,676 x   164,515,595   3646   3594  1539  CM
lpaq2       9   19,755,471  164,496,295    6,888 x   164,503,183   3260   3354  1539  CM
lprepaq 1.2 9   19,755,989  164,509,300  189,891 x   164,699,191   8696   7888  1582  CM
lpaq3       9   19,580,276  165,600,121    7,514 x   165,607,635   3695   3735  1542  CM
elpaq3      9   19,392,604  160,081,507    7,377 x   160,088,884   3411   3454  1542  CM
lpaq3a      9   19,585,951  165,661,890   12,004 s   165,673,894   4177   4163  1542  CM
lpaq3e      9   19,392,604  160,081,507   12,004 s   160,093,511   3967   3932  1542  CM
lpaq4       9   19,583,905  165,603,612    7,117 x   165,610,729   3693   3697  1542  CM
lpaq4e      9   19,358,662  159,675,213    6,990 x   159,682,203   3383   3422  1542  CM
lpaq5       9   19,455,395  161,410,276    8,382 x   161,418,658   3614   3630  1542  CM
lpaq5e      9   19,078,767  156,194,860    7,841 xd  156,202,701   3428   3605  1542  CM
lpaq6       9   19,562,861  165,224,012    8,848 x   165,232,860   3586   3624  1542  CM
lpaq6e      9   19,054,076  155,943,020    8,866 x   155,951,886   3420   3478  1542  CM
lpaq7       9   19,557,894  162,359,435    9,078 x   163,368,513   3922   3850  1542  CM
lpaq7e      9   19,039,516  155,840,757    8,570 x   155,849,327   3477   3490  1542  CM
lpaq8       9   19,523,803  161,987,713    9,676 x   161,997,389   3682   3718  1542  CM
lpaq8e      9   18,982,007  155,232,477    8,888 x   155,241,365   3424   3475  1542  CM
lpaq1a      9   19,759,778  164,547,926    8,558 x   164,556,484   3462   3423  1540  CM
lpq1            19,888,399  168,467,267    9,151 x   168,476,408   3389   3402   387  CM
drt|lpaq9e  9   18,151,024  145,628,635  110,844 x   145,739,479   3006   2975  1542  CM
drt|lpaq9f  9   18,079,247  144,877,844  110,864 x   144,988,708   2858   2859  1542  CM
drt|lpaq9g  9   18,069,107  144,838,636  110,318 x   144,948,954   2744   2722  1542  CM
drt|lpaq9h  9   18,067,711  144,763,248  110,376 x   144,873,624   2585   2575  1542  CM
drt|lpaq9i  9   18,065,347  144,752,858  110,149 x   144,863,007   2486   2501  1542  CM
drt|lpaq9j  9   18,056,997  144,687,646  110,135 x   144,797,781   2425   2408  1542  CM
drt|lpaq9k  9   18,007,677  144,277,379  110,785 x   144,388,164   2397   2395  1542  CM
drt|lpaq9l  9   17,979,724  144,082,479  110,479 x   144,192,958   2398   2474  1542  CM
drt|lpaq9l  9   17,979,724  144,082,479  110,479 x   144,192,958   2175   2221  1542  CM  26
drt|lpaq9m  9   17,964,751  143,943,759  110,579 x   144,054,338   2107   2151  1542  CM  26
drt|lpaq9m  9   17,964,751  143,943,759  110,579 x   144,054,338    868    896  1542  CM  41
decomp8         15,970,425                16,252 xd                      78180   936  CM  26
decomp8b        15,942,290                16,384 xd                      74790   934  CM  26
decmprs8        15,932,968                16,720 xd                      76080   936  CM  26
drt may be combined with other compressors to improve compression. The following were obtained using drt and tmpdict0.dic (from lpaq9i) with ppmonstr J (PPM). Option -m1650 selects 1650 MB memory. -r1 partially rebuilds the model when memory is exhausted. -o selects the PPM model order. Compression time is for ppmonstr only. Mem8 is actual memory used to compress enwik8.drt. enwik9.drt always uses 1650 MB. As a separate compressor, the compressor size would be 147,915 for a zip file containing drt.exe, ppmonstr.exe, and tmpdict0.pmm (tmpdict0.dic compressed with ppmonstr -m1650 -r1 -o64). Total size would be 148,047,289.
For drt 9j, the decompresser size is 149,468 and total size is 147,196,757.
Compressors options enwik8 enwik9 Comp Mem8 ------------------- ---------------- ---------- ----------- ---- ---- drt 9i | ppmonstr J -m1650 -r1 -o10 18,185,633 147,936,682 2509 825 -m1650 -r1 -o11 18,166,961 147,899,374 2634 895 -m1650 -r1 -o12 18,152,982 147,907,628 2661 953 -m1650 -r1 -o16 18,142,625 148,306,179 2888 1109 -m1650 -r1 -o32 18,124,722 149,857,650 3361 1371 -m1650 -r1 -o64 18,122,785 151,343,426 3870 1554 -m1650 -r1 -o128 18,130,333 1650 drt 9j | ppmonstr J -m1650 -r1 -o11 18,165,440 147,859,151 2636 -m1650 -r1 -o64 18,120,770 2603
The following shows the effects of drt from lpaq9m on enwik8. The first numeric column is the compressed size of enwik8. The second is the compressed size of the uncompressed dictionary (lpqdict0.dic, 465,210 bytes) concatenated with enwik8.drt (61,289,634 bytes) using compressor versions that were current as of June 26, 2010 unless indicated. The ratio shows the improvement due to preprocessing. The dictionary contains 44880 lowercase words. DRT replaces word occurrences with codes of 1 to 3 bytes and uses codes to indicate capitalized words or letters.
Compressor     enwik8    dic+drt   ratio  Options (version)
----------  ---------  ---------  ------  -----------------
paq8px_v67   18293940   17342041  0.9480  -6
paq8l        18518485   17560378  0.9483  -6
nanozip      18826931   18633832  0.9897  -cc (v0.08a)
lpaq9m       19072743   18077356  0.9478  8
zpaq         19448650   18928856  0.9733  ocmax.cfg
pmm          19701161   18650601  0.9467  (J)
lpaq1        19796957   18905483  0.9550
paq9a        20129573   19374291  0.9625
paq6         20303336   19439547  0.9575  -6
cmm4         20548514   19133313  0.9311  (v0.1e)
zpaq         20941558   19447733  0.9287  ocmid.cfg
nz           20948832   20588807  0.9828  (v0.08a)
bwt.fpaq0f2  21798843   21406906  0.9820
paq1         22156982   21437426  0.9675
bwt.fpaq0p   23809591   22855730  0.9599
grzip        23846878   22379326  0.9385  (0.2.4)
bbb          24576921   22701384  0.9237
zpaq         24837469   21559014  0.8680  ocfast.cfg
tarsalzp     25134862   22773386  0.9060
lzpxj        25251404   21877402  0.8664  8 (1.2h)
p6           25377998   23078246  0.9094
ctw          25453025   24454785  0.9608
7z           25895909   23487746  0.9070  (9.12b)
szip         26120472   24045552  0.9206  -b41 -o16
ppmd         26275353   23448205  0.8924  (J)
ppms         26310248   23824677  0.9055  (J)
dmc          28402672   25532850  0.8990  100000000
cabarc       28465607   25963613  0.9121  -m lzx:21
bzip2        29008758   25612712  0.8829  -9
sr2          30432506   26328768  0.8652
RAR          35107917   30132497  0.8583  -m5 (v2.50)
HA           36379137   30633820  0.8421  (0.98)
gzip         36445248   30902821  0.8479  -9 (1.3.5)
zip          36445470   30903043  0.8479  -9 (2.32)
lzop         41217688   33358696  0.8093  -9 (1.01)
srank        43091439   38492535  0.8933  -C8
fcm1         45402225   29581661  0.6515
compress     45763941   37478724  0.8190
lzrw3-a      48009194   38635335  0.8047
bpe          53906667   41403271  0.7681  5000 4096 200 3
fastlz       54658924   42337322  0.7746
lzrw2        55360907   41854974  0.7560
fpaq0f2      56916872   40415334  0.7101
flzp         57366279   43944882  0.7660
lzrw5        59375192   46019812  0.7751
lzrw1-a      59471657   43184084  0.7261
fpaq0p       61457810   44979267  0.7319
ppp          61657971   44103741  0.7153
fpaq0        63391013   47589951  0.7507
            100000000   61289634  0.6129  (uncompressed)
bwt         100000004   61289638  0.6129  (msufsort 3.1b)
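The ratio column is the dic+drt size divided by the plain enwik8 size, rounded to four places. A minimal check in Python, using two rows from the table (the function name is made up for the example):

```python
def drt_ratio(plain_size: int, dic_plus_drt_size: int) -> float:
    """Compressed size with drt preprocessing relative to the size without it."""
    return round(dic_plus_drt_size / plain_size, 4)


paq8px_ratio = drt_ratio(18293940, 17342041)  # paq8px_v67 -6
gzip_ratio = drt_ratio(36445248, 30902821)    # gzip -9
```

A ratio below 1 means the preprocessing helps. The weaker compressors gain the most, since the dictionary supplies word knowledge that the stronger models largely learn on their own.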
mcm v0.0 is a free, experimental, closed source file compressor by Mathieu Chartier, June 4, 2013. It uses CM. Options -1 ... -9 select 8 MB to about 1500 MB memory.
mcm v0.2, June 11, 2013, has automatic detection of text and binary files with UTF modeling in text mode and sparse models in binary mode, an improved match model, and cache optimizations.
mcm v0.3 was released June 17, 2013.
mcm 0.4 was released as open source on July 17, 2013. To test, it was compiled with g++ 4.8.0 using the supplied make.bat file.
mcm 0.8 (discussion) was released Feb. 5, 2015. It uses LZP preprocessing with fast and high modes. The high mode (default, as tested) uses 8 context models and the fast mode uses 6. It was compiled in Linux/g++ 4.8.2 using the supplied make.bat file. Option -10 uses 2.9 GB memory. Option -11 (5.5 GB) was not tested.
mcm 0.82 was released Feb. 16, 2015. -max selects best compression (default is -high).
mcm 0.83 was released Apr. 5, 2015. -x10 and -x11 select the memory used for max compression. To test -x10, I compiled from source using the supplied make.sh in Ubuntu, g++ 4.8.2. -x11 was tested using optimized source with comments removed.
           Compression   Compressed size     Decompresser   Total size  Time (ns/byte)
Program    Options       enwik8       enwik9   size (zip)  enwik9+prog   Comp  Decomp    Mem  Alg   Note
-------    -------   ----------  -----------  -----------  -----------  -----  ------  -----  ----  ----
mcm v0.0   -9        19,842,740  166,276,589    116,198 x  166,392,787   1425    1449   1447  CM      26
mcm v0.2   -9        19,768,502  165,480,329    137,308 x  165,617,637   1453    1468   1451  CM      26
mcm v0.3   -9        19,707,487  164,464,527    122,205 x  164,586,732   1387    1435   1452  CM      26
mcm v0.4             19,858,418                                          1718    1623    735  CM      26
           -9        19,762,418  165,009,983     43,479 s  165,053,462   1552    1494   1457  CM      26
mcm v0.8   -10       19,434,824  158,421,558    107,815 s  158,529,373    710     727   2901  CM      48
mcm v0.82  -10       19,281,673  157,501,519    111,681 s  157,613,200    636     642   2854  CM      48
           -10 -max  19,173,407  156,544,880    111,681 s  156,656,561    692     689   2855  CM      48
mcm v0.83  -x10      18,286,400  146,525,446    127,052 s  146,652,498    706     569   3056  CM      48
           -x11      18,233,295  144,854,575     79,574 s  144,934,149    394     281   5961  CM      72
nanozip 0.01a is a free, experimental, closed source GUI and command line archiver by Sami Runsas, July 14, 2008. For these tests, the command line version (smaller executable) was used. It compresses using several algorithms (fastest to best): LZP (options -cf and -cF), LZ77 (-cd, -cD), BWT (-co, -cO, uses 5N block size) and CM (-cc). The uppercase options (-cF, -cD, -cO) compress better but slower than the corresponding lowercase options and may use more memory. The default compression mode is -co (fast BWT). -m1500m selects 1500 MB memory, although the reported memory usage may differ and the actual memory usage (Cmem, Dmem, in MB) measured with Task Manager is usually lower than reported. The program will use less memory depending on available physical memory when run. -forcemem was used to override this. For all tests, -nm was used to turn off checksums and not store timestamps or file permissions. For -cO, the program uses a LZ77 variant (called LZT) instead of BWT for binary files. -txt is an optimization for text files with -co or -cO.
nanozip 0.03a was released July 31, 2008. Only -cc was tested.
nanozip 0.05a was released Oct. 20, 2008. Options are as in 0.01a and include -nm -forcemem.
nanozip 0.06a was released Feb. 13, 2009. Options are as in 0.01a and include -nm -forcemem. w32c creates a self extracting archive (.exe file).
nanozip 0.08a was released June 3, 2010. _64 refers to the Windows 64 bit version. w32c means to produce a self extracting archive. -nm means do not store metadata or redundancy information. -cc selects a context mixing model. -m2.6g means use 2.6 GB memory. enwik8 was tested with -m2g (uses 1670 MB).
nanozip 0.09a was released Nov. 4, 2011. Option w32c selects a self extracting archive, so the decompresser size is 0. Option -p4 runs multithreaded compression on 4 processors. Tested under 64 bit Linux.
Program Options enwik8 enwik9 zip size Total Comp Deco Cmem Dmem (reported) Alg Note -------- ----------- ---------- ----------- --------- ----------- ---- ---- ---- ---- ---- ---- --- ---- nz 0.01a -cf 46,381,713 24 24 96 404 404 LZP -cf -m1500m 46,381,713 417,351,980 266,797 x 417,618,777 26 31 975 978 1476 1476 LZP -cF 40,733,125 62 43 155 404 404 LZP -cF -m1500m 40,733,125 359,192,720 359,459,517 63 40 1040 1045 1476 1476 LZP -cd 33,241,150 127 28 89 422 402 LZ77 -cd -m1500m 33,001,952 292,180,617 292,447,414 156 28 768 687 1546 1474 LZ77 -cD 29,384,997 288 27 282 466 258 LZ77 -cD -m1500m 29,253,158 258,513,190 258,779,987 323 31 1020 693 1314 994 LZ77 -co 21,838,721 391 186 333 431 336 BWT -co -m1500m 20,503,629 176,470,974 176,737,771 448 221 1667 1160 1810 1294 BWT -co -m1500m -txt 20,503,629 170,711,387 170,978,184 336 234 1074 1120 1471 1463 BWT -cO 21,623,801 465 247 333 431 266 BWT -cO -m1500m 20,306,489 174,770,662 175,037,459 511 269 1378 1135 1810 1294 BWT -cO -m1500m -txt 20,306,489 169,092,652 169,359,449 393 280 1074 1274 1471 1463 BWT -cO -m1670m -txt 20,306,489 167,509,921 167,776,718 403 284 1170 1325 1633 1625 BWT -cc 18,994,349 2975 2910 360 436 435 CM -cc -m1500m 18,723,413 152,654,332 152,921,129 3147 3091 1556 1556 1524 1523 CM nz 0.03a -cc -m1670m 18,679,094 151,668,563 263,953 x 151,932,516 3058 3003 1700 1700 1700 1699 CM nz 0.05a -cf -m1670m 46,381,713 18 22 100 LZP -cF -m1670m 40,608,638 66 41 164 LZP -cd -m1670m 31,555,257 96 29 289 LZ77 -cD -m1670m 27,811,031 182 35 170 LZ77 -co -m1670m 20,499,411 351 177 626 BWT -cO -m1670m 20,302,501 422 240 642 BWT -cc -m1670m 18,638,419 151,176,555 288,449 x 151,465,004 3032 2975 1668 CM nz 0.06a -co -m1670m 20,499,412 250 183 441 BWT 26 -cO -m1670m 20,302,502 300 243 457 BWT 26 -cc -m1670m 18,636,515 151,177,510 336,273 x 151,513,783 2143 2137 1670 CM 26 w32c -cc -m1670m 18,754,787 151,295,782 0 xd 151,295,782 2156 2173 1670 CM 26 nz 0.08a_64 w32c -nm -cc -m2.6g 18,752,842 150,441,103 0 xd 
150,441,103 1109 1086 2760 CM 40 -cc -m2g 18,623,317 150,375,385 459,607 x 150,834,992 1616 2088 CM 42 nz 0.09a w32c -cc -m3g -nm 18,723,846 150,037,341 0 xd 150,037,341 1110 1084 2693 CM 40 w32c -cc -m3g -nm -p4 158,107,738 0 xd 158,107,738 299 3124 CM 40 -cc -m32g -p1 -t1 -nm 18,594,163 148,545,179 783,642 x 149,328,821 1149 1141 32000 13285 13282 CM 74
cmv 00.01.00 is a free, closed source, experimental file compressor for 32 bit Windows by Mauro Vezzosi, Sept. 6, 2015. It uses context mixing. Option "2,3,+" selects max compression (2), max memory (3), and a large set of models (+). A hex bitmap for this argument turns individual models on or off. Note 48 timings are for enwik8 only.
cmv 00.01.01 was released Jan. 10, 2016. It is compatible with 00.01.00 and does not change the compression ratio.
cmve 0.2.0 was released Nov. 28, 2017.
Program Options enwik8 enwik9 zip size Total Comp Deco Cmem Dmem Alg Note -------- ----------- ---------- ----------- --------- ----------- ---- ---- ---- ---- ---- ---- cmv 00.01.00 -m2,3,+ 18,218,283 150,226,739 77,404 x 150,304,143 285750 293090 2817 2817 CM 48,75 150,226,739 77,404 x 150,304,143 216000 2801 CM 75 -m2,3,0x03ededff 18,153,319 720000 ~3900 CM 75 cmv 00.01.01 -m2,3,0x03ed7dfb 18,122,372 149,357,765 77,404 x 149,435,169 426162 394855 3335 3335 CM 75 cmve 0.2.0 -m2,3,0x7fed7dfd 16,424,248 129,876,858 307,787 x 130,301,106 1140801 19963 CM 81
xml-wrt 2.0 is a free command line file compressor with source available, by Przemyslaw Skibinski, June 19, 2006. It uses LZMA (LZ77 + arithmetic coding) with preprocessing for modeling text, XML tags, dates, and numbers. It may also be used as a preprocessor for input to other compressors. Version 1.0 was strictly a preprocessor without built-in compression.
The -l6 option selects maximum LZMA compression. -b255 selects maximum buffer size of 255 MB for building a dynamic dictionary. -m255 selects maximum memory. -s turns off spaces modeling. -f8 sets the minimum word frequency for dictionary inclusion to 8 (default is 6).
xml-wrt 3.0 (Sept. 14, 2006) includes a stripped-down version of PAQ8 (-l11 option) in addition to LZMA compression.
xwrt 3.2 (Oct. 29, 2007) is a dictionary preprocessor frontend to LZMA, PPMVC and lpaq6 as well as a standalone preprocessor. Option -l14 selects lpaq6 option 9 (1542 MB). -b255 selects 255 MB memory (maximum) for building the dictionary. -m96 selects a 96 MB buffer during compression. (Higher values cause an out of memory error). -s turns off space modeling. -e40000 limits the dictionary size to 40000 words. -f200 limits the dictionary to words that occur at least 200 times.
             Compression                           Compressed size     Decompresser   Total size  Time (ns/byte)
Program      Options                               enwik8       enwik9   size (zip)  enwik9+prog   Comp  Decomp    Mem  Alg   Note
-------      -------                           ----------  -----------  -----------  -----------  -----  ------  -----  ----  ----
xml-wrt 2.0  -l6 -b255 -m255 -s -f8            23,199,202  196,914,328     25,354 s  196,939,682    905      70    525  LZ77
xml-wrt 3.0  -l11 -b255 -m255 -f24             19,663,305  165,274,422     40,447 s  165,314,869   4398    4317    416  CM
xwrt 3.2     -l14 -b255 -m96 -s -e40000 -f200  18,679,742  151,171,364     52,569 s  151,223,933   2537    2328   1691  CM
xml-wrt 2.0 and higher and xwrt 3.2 can be used as either a standalone compressor or as a preprocessor to other compressors. The table below shows the best known settings for enwik9 and enwik8 for xml-wrt 3.0 and 2.0 as a preprocessor to ppmonstr var. J, the best known combination for which xml-wrt improves compression. xml-wrt 1.0 is a preprocessor only. See also xml-wrt and xwrt as a standalone compressor.
Compressed size Decompresser Total size Time (ns/byte) Program/options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg ------------------------------------------------------------------- ---------- ----------- ----------- ----------- ----- ----- --- --- xml-wrt 3.0 -l0 -b255 -m255 -3 -s -e20000 | ppmonstr J -m1650 -o10 18,592,499 150,004,636 82,466 sx 150,087,102 3067 2708 1650 PPM xml-wrt 3.0 -l0 -b255 -m255 -3 -s -e7000 | ppmonstr J -m1650 -o64 18,494,374 82,466 sx 3500 3340 1650 PPM xml-wrt 2.0 -l0 -w -s -c -b255 -m100 -e10000 | ppmonstr J -m1700 -o10 18,794,295 150,651,873 67,309 sx 150,719,182 2715 ~2650 1700 PPM xml-wrt 2.0 -l0 -w -s -c -b255 -m100 -e2300 | ppmonstr J -m1650 -o64 18,625,624 67,309 sx 3550 3360 1650 PPM xml-wrt 2.0 -l0 -w -s -c -b255 -m100 -e10000 | ppmonstr J -m800 -o8 18,863,790 154,223,582 67,309 sx 154,290,891 2820 800 PPM xml-wrt 1.0 -f800 | ppmonstr J -m800 -o8 19,043,178 154,749,585 56,837 sx 154,806,422 2702 ~2700 800 PPM
xml-wrt 1.0 (XML Word Reducing Transform) is a free command line single file preprocessor with source code by Przemyslaw Skibinski, May 10, 2006. It is not intended to compress files by itself (although it does somewhat). Rather, it is intended to improve the compressibility of text and XML files by replacing common words and XML substrings with shorter symbols. (So it is actually LZW with a static dictionary prepended to the output). It improves compression for most programs except for those that already have English text models such as paq8h. Some additional results are shown below for combinations with some other compressors.
                        Compression                 Compressed size     Decompresser   Total size  Time (ns/byte)
Program                 Options                     enwik8       enwik9   size (zip)  enwik9+prog   Comp  Decomp  Notes
-------                 -------                 ----------  -----------  -----------  -----------  -----  ------  -----
xml-wrt 1.0|ppmonstr J  -f1800 | -m800 -o10     18,965,658  155,066,074    56,837 sx  155,122,911   2905    2809
xml-wrt 1.0|slim23d     -f1800 | -m700 -o12     19,163,987  156,734,571     69,453 x  156,804,024   4702    4717
xml-wrt 1.0|ppmd J1     -f1800 | -m256 -o8 -r1  21,128,019  178,154,529     25,917 s  178,180,446    717     722
The following table shows the compressed size (without decompresser except SFX) of enwik8 before and after the XML-WRT transform with option -f180 for several compressors. A ratio less than 1 means that XML-WRT improves compression.
Program           Options                         enwik8  enwik8.xwrt   Ratio  Alg
-------           -------                    -----------  -----------  ------  ---
paq8h             -7                          17,674,700   18,341,959  1.0378  CM
ppmonstr J        -o10 -m800                  19,338,065   18,886,224  0.9766  PPM
slim23d           -m700 -o10                  19,264,094   18,938,602  0.9830  PPM
WinUDA 2.91       mode 3 (194 MB)             20,332,366   20,859,165  1.0259  CM
ppmd J1           -o10 -m256 -r1              21,388,296   20,945,220  0.9793  PPM
uhbc 1.0          -m3 -b100m                  20,930,838   21,171,204  1.0115  BWT
M03exp            32 MB                       21,948,192   21,583,059  0.9834  BWT
sbc               -ad -m3 -b63                22,470,539   22,216,425  0.9887  BWT
WinRAR 3.60b3     -mc7:128t+ -sfxWinCon.sfx   22,713,569   22,457,785  0.9887  PPM
PX 1.0                                        24,971,871   22,818,070  0.9137  CM
uharc 0.6b        -mx -md32768                23,911,123   22,915,299  0.9583  PPM
chile 0.3d-1      -b=40000                    23,408,335   22,884,519  0.9776  BWT
cabarc 1.00.0601  -m lzx:21                   28,465,607   25,739,214  0.9042  LZ77
WinACE            -sfx -m5                    30,919,182   27,112,651  0.8769
bzip2 1.0.3                                   29,008,758   27,339,845  0.9425  BWT
gzip 1.3.5        -9                          36,445,248   30,403,738  0.8342  LZ77
pkzip 2.0.4                                   36,934,712   30,729,525  0.8432  LZ77
thor 0.9a         ex                          41,670,916   32,586,444  0.7820
compress 4.3d                                 45,763,941   38,485,494  0.8409  LZW
Original size                                100,000,000   52,174,989  0.5217
The -f option (default -f6) selects the minimum word frequency required to have it added to the dictionary. The optimal setting depends on the input size. When used with ppmd or ppmonstr (the best compressors improved by XML-WRT), the optimal settings are about -f180 for enwik8 and -f1800 for enwik9, which results in a dictionary of 7697 words for enwik8 and 6657 words for enwik9. The following table shows the effect of the -f and -o options for ppmonstr -m800 enwik9. The best combination found is -f1800 -o8.
-f -o7 -o8 -o9 -o10 -o11 -o12 -o16 -o32 --- ----------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- 100 155,908,621 200 155,775,164 300 155,653,815 500 154,884,542 155,367,681 155,465,355 155,547,660 600 154,787,455 155,497,645 800 154,749,585 1000 154,909,136 154,794,501 154,951,751 155,122,278 155,306,526 155,409,926 155,948,066 157,901,320 1500 155,092,513 154,895,455 154,999,654 155,073,186 155,306,526 155,301,322 1800 155,191,178 154,924,936 155,036,534 155,066,074 155,366,281 155,297,828 2000 154,998,528 155,296,112 3000 155,379,959
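To make the role of the -f threshold concrete, here is a toy word-replacement transform in Python. It only illustrates the idea of a dictionary built from words that meet a minimum frequency. It is not XML-WRT's algorithm or file format: the function names are made up, words are plain runs of ASCII letters, and codes are list indexes rather than short byte sequences.

```python
import re
from collections import Counter

# A token is a run of ASCII letters or a run of anything else.
TOKEN = re.compile(r"[A-Za-z]+|[^A-Za-z]+")


def build_dictionary(text: str, min_freq: int):
    """Words occurring at least min_freq times, most frequent first."""
    counts = Counter(re.findall(r"[A-Za-z]+", text))
    frequent = [w for w, c in counts.items() if c >= min_freq]
    return sorted(frequent, key=lambda w: (-counts[w], w))


def encode(text: str, min_freq: int):
    """Replace dictionary words with their index; keep other tokens as text."""
    words = build_dictionary(text, min_freq)
    index = {w: i for i, w in enumerate(words)}
    tokens = [index.get(t, t) for t in TOKEN.findall(text)]
    return words, tokens


def decode(words, tokens) -> str:
    """Invert encode(): look up indexes, pass literal text through."""
    return "".join(words[t] if isinstance(t, int) else t for t in tokens)


sample = "the cat and the dog and the bird"
words, tokens = encode(sample, min_freq=2)
```

Raising min_freq shrinks the dictionary, which costs fewer bytes to store but leaves more words uncoded. That trade-off is why the best -f value grows with the input size.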
The following table shows that the optimal setting for -f is lower for smaller files (with ppmd):
             Compression         Compressed size         Decompresser   Total size   Time (ns/byte)
Program      Options               enwik8        enwik9   size (zip)    enwik9+prog   Comp  Decomp
-------      -------         ------------ ------------- ------------ -------------  -----  ------
xml-wrt 1.0  -f1800          (70,826,140) (532,089,443)   (14,818 s) (532,104,261)  (115)   (103)
  + ppmd J   -m256 -o8 -r1     21,128,019   178,154,529    41,653 sx   178,196,182    712     723
xml-wrt 1.0  -f180           (52,174,989) (468,964,104)   (14,818 s) (468,978,922)  (113)   (103)
  + ppmd J   -m256 -o8 -r1     20,910,527   178,215,315    41,653 sx   178,256,968    690     699
ppmd J       -m256 -o10 -r1    21,388,296   183,964,915     26,835 x   183,991,750    880     895
The default values of -s (disable spaces model) and -t (disable try smaller word) appear to work best on this data.
xml-wrt -f1800 enwik9 | ppmonstr -m800 -o12
-------------------------------------------
(default)  154,924,936
-s         155,040,558
-t         155,421,035
-s -t      155,542,575
xml-wrt 2.0 released June 14, 2006 (updated June 19, 2006) has additional transform options, and also includes LZ77 (zlib) and LZMA (LZ with arithmetic coding) compression. When used as a preprocessor, this compression is turned off. enwik9 was compressed using the options:
xml-wrt -l0 -w -s -c -b255 -m100 -e10000 enwik9
ppmonstr e -o8 -m800 enwik9.xwrt
The option -l0 turns off compression. -w turns off word containers. -s turns off space modeling (this hurts compression in version 1.0 but helps in 2.0). -c turns off word and number containers (independent of -w and -n. -n hurts compression). -b255 sets memory for the dictionary to 255 MB, the maximum. -m100 sets the memory buffer to 100 MB, which is not maximum (255 MB), but larger values hurt compression. -e10000 sets the dictionary size to 10000 words. (The dictionary size can also be controlled with -f as in version 1.0, but using -e is less dependent on input size so it helps with enwik8). Additional tests showing the effects of -e, -m, and -o:
xml-wrt 2.0 options                ppmonstr J       enwik9
--------------------------------   ----------  -----------
-l0 -w -s -c -b255 -m100 -e10000 | -m800 -o8   154,223,582
-l0 -w -s -c -b255 -m100 -e8000  | -m800 -o8   154,234,621  (smaller -e)
-l0 -w -s -c -b255 -m100 -e12000 | -m800 -o8   154,239,769  (larger -e)
-l0 -w -s -c -b255 -m50 -e10000  | -m800 -o8   154,259,117  (smaller -m)
-l0 -w -s -c -b255 -m100 -e10000 | -m800 -o7   154,322,272  (smaller -o)
-l0 -w -s -c -b255 -m150 -e10000 | -m800 -o8   154,426,554  (larger -m)
-l0 -w -s -c -b255 -m100 -e10000 | -m800 -o9   154,445,811  (larger -o)
The optimal values of -w -c -s -n (turn off number containers) and -t (turn off try shorter words) were determined on enwik7 and enwik8 but not tested on enwik9.
A bug fix for LZMA compression, released June 19, 2006, does not change any values for the June 14, 2006 version (using the -l0 option). However the compressed source code increases from 25,290 bytes to 25,354 bytes. The June 14 version is no longer published. The URL is unchanged.
xml-wrt 3.0 (Sept. 14, 2006) option -3 means to optimize the default settings for PPM compressors. Version 3.0 also has a FastPAQ8 compressor for standalone compression which was tested separately.
xwrt 3.2 (see below) with ppmonstr J has the following results.
xwrt 3.2 options        ppmonstr J opt      enwik8       enwik9  program size            total      Comp   Decomp   Mem
----------------------  --------------  ----------  -----------  -----------------  -----------  --------  -------  ----
-2 -b255 -m255 -s -f64  -o10 -m1650     18,456,706  148,915,761  52,569s + 26,835x  148,995,165  475+2512  43+2503  1650
-2 -b255 -m255 -s -f64  -o64 -m1650     18,397,126                                               210+2810  50+2884  1527
ppmonstr option -o64 is optimal for enwik8, but -o10 is optimal for enwik9.
-m1650 selects 1650 MB memory.
xwrt option -2 optimizes for PPM. -b255 selects buffer size 255 MB for building
the dictionary. -m255 selects 255 MB memory buffer. -s turns off space modeling.
-f64 sets minimum word frequency for the dictionary to 64. Program size and
times are xwrt + ppmonstr. Memory usage is 512 MB for xwrt, 1650 MB for ppmonstr.
fp8 v1 (fast paq)
is a free, open source archiver by Jan Ondrus, May 2, 2010. It is derived
from paq8px_v68. It has fewer models than paq8px for better
speed but retains the models for wav, bmp, and jpg. The option -8 selects maximum memory.
fp8 v2,
Apr. 10, 2012, has some modeling improvements.
fp8 v3,
May 13, 2012, has some more compression improvements (at a slight cost in speed)
and a JPEG bug fix.
tangelo 1.0, June 17, 2013, is a single-file compressor based on fp8. It removes
specialized models and preprocessors for exe, bmp, wav and jpeg types. It takes no
options. It uses fixed memory of 567 MB, equivalent to fp8 -7.
tangelo 2.0, July 6, 2013, removed some models and made other
simplifications for better speed and less memory but worse compression.
tangelo 2.1, July 20, 2013, is faster with less compression.
tangelo 2.3, July 22, 2013, re-added APM for better compression, and minor
changes for better speed.
WinRK 3.0.3 is a commercial
GUI archiver by Malcolm Taylor
(Mar. 6, 2006). It is top ranked on some benchmarks.
Unfortunately it is not available for free download (as of May 16, 2006). The
"free trial" expires as soon as you install it.
(Update, Sept. 11, 2006: versions 3.0.2 and 3.0.3 are no longer available for download.
They appear to have been withdrawn last month).
WinRK in PWCM mode (Paq Weighted Context
Modeling) is based on the paq7/8 algorithm with text dictionary preprocessing
and specialized models for wav, bmp, and exe files. Version 3.0.2 was based on
the earlier paq6 algorithm which uses adaptive linear model mixing rather than
a neural network which mixes bitwise predictions from models
in the logistic (log p/(1-p)) domain. The +td and -td options turn English dictionary
preprocessing on or off respectively. 800MB selects the memory limit. When not
specified, PWCM appears to allocate all available memory except leaving 8 MB.
RK and RKC are predecessors of WinRK so I don't plan to test them.
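The difference between linear mixing and mixing in the logistic domain can be illustrated with a minimal sketch. This is not WinRK or paq source code. The class name, the starting weights, and the learning rate are assumptions chosen only for illustration.

```python
import math

def stretch(p: float) -> float:
    """Map a probability to the logistic domain, ln(p / (1 - p))."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)  # clamp to avoid log(0)
    return math.log(p / (1.0 - p))

def squash(x: float) -> float:
    """Inverse of stretch: map a logistic-domain value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

class LogisticMixer:
    """Hypothetical single-layer mixer over bitwise model predictions."""

    def __init__(self, n_inputs: int, rate: float = 0.02):
        self.weights = [1.0 / n_inputs] * n_inputs  # assumed starting weights
        self.rate = rate                            # assumed learning rate
        self.inputs = [0.0] * n_inputs
        self.p = 0.5

    def predict(self, probs):
        """Combine the models' P(next bit = 1) into one probability."""
        self.inputs = [stretch(p) for p in probs]
        dot = sum(w * x for w, x in zip(self.weights, self.inputs))
        self.p = squash(dot)
        return self.p

    def update(self, bit: int):
        """Move each weight in the direction that reduces coding cost."""
        err = bit - self.p
        self.weights = [w + self.rate * err * x
                        for w, x in zip(self.weights, self.inputs)]

# Two models disagree; the data always sides with the first one.
mixer = LogisticMixer(2)
first = mixer.predict([0.9, 0.1])
mixer.update(1)
for _ in range(200):
    mixer.predict([0.9, 0.1])
    mixer.update(1)
last = mixer.predict([0.9, 0.1])
print(first, last)
```

The mixer starts near 0.5 because the two inputs cancel, then shifts weight toward the model that keeps being right.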
ppmonstr, ppmd, and ppms var. J are
free command line file compressors by Dmitry Shkarin (model) and
Dmitry Subbotin (range coder), Feb. 16, 2006. (ppms on Feb. 21, 2006).
ppmonstr is a slower, experimental version of ppmd with better compression.
Source code is available for ppms and ppmd but not ppmonstr.
ppms is a small memory (1 MB) version of ppmd.
They all use PPMII (PPM with information inheritance). The -m256
option selects 256 MB memory (maximum for ppmd). The -o10 option selects
PPM order 10. (Higher orders use up memory faster which hurts
compression). When ppmd runs out of memory, it discards the
model and starts over. The -r1 option (default in ppmonstr)
tells ppmd to back up and partially rebuild the model before resuming compression.
The default options for ppmd are -m10 -o4 -r0 which are designed for reasonably
good compression with high speed and low memory usage (see table below).
ppms accepts only options -o2 through -o8. The default is -o5. This also gives
the best compression on enwik8. Task Manager shows 1.8 MB memory used.
ppmd was updated to J1 on May 10, 2006 to fix a bug. Compression benchmarks are unchanged
except the size of the compressor (11,099 bytes as zipped source code).
ppmonstr is unchanged.
zcm v0.01
(discussion)
is a free, experimental, closed source compressor for 32 bit Windows
by Nania Francesco Antonio,
Dec. 16, 2011. It uses context mixing. Commands c1 through c7 select memory
usage for compression. Decompression uses the same memory. c7 uses the most
memory and gets the best compression.
zcm v0.02 was released Dec. 23, 2011.
zcm v0.03 was released Dec. 28, 2011.
zcm v0.04 was released Jan. 30, 2012. (Program banner says v0.03).
zcm v0.11
was released Feb. 19, 2012.
It is described as mixing 6 contexts. It detects the file type and uses exe, delta,
and LZP preprocessors. It has separate models for text and binary data.
Speed and memory usage are the same for compression and
decompression. Commands c0 through c7 select memory usage. Each increment
doubles memory, resulting in better compression.
Memory is used slowly as the program runs up to a maximum value which is not reached on
enwik8 for c5 and higher. For enwik8, c7 uses 1286 MB rather than 1716 MB.
zcm 0.20b was released
Apr. 4, 2012. It is an archiver rather than a single file compressor.
Option -m7 selects maximum memory usage (range 32 MB to 1.7 GB).
zcm 0.30 was released May 2, 2012.
zcm 0.40 was released May 16, 2012. It is described as using CM with
6 contexts, a mixer, and one re-mixer (APM or SSE) to adjust the mixer
output. It uses LZP preprocessing.
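A re-mixer of the APM/SSE kind can be sketched as follows. zcm is closed source, so this is a generic PAQ-style illustration and not zcm's implementation. The bucket count, the stretch range of -8 to 8, and the update rate are assumptions.

```python
import math

def stretch(p: float) -> float:
    return math.log(p / (1.0 - p))

def squash(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

class APM:
    """Hypothetical adaptive probability map: refines p given a small context."""

    def __init__(self, n_contexts: int, buckets: int = 33, rate: float = 0.02):
        self.buckets = buckets
        self.rate = rate
        self.span = 8.0  # assumed clamp range for stretch(p)
        step = 2 * self.span / (buckets - 1)
        # Each row starts as the identity map: output equals input.
        identity = [squash(-self.span + j * step) for j in range(buckets)]
        self.table = [identity[:] for _ in range(n_contexts)]
        self._row, self._lo, self._w = self.table[0], 0, 0.0

    def refine(self, p: float, context: int) -> float:
        """Interpolate between the two buckets nearest to stretch(p)."""
        p = min(max(p, 1e-6), 1.0 - 1e-6)
        x = min(max(stretch(p), -self.span), self.span)
        pos = (x + self.span) * (self.buckets - 1) / (2 * self.span)
        lo = min(int(pos), self.buckets - 2)
        w = pos - lo
        self._row, self._lo, self._w = self.table[context], lo, w
        return self._row[lo] * (1.0 - w) + self._row[lo + 1] * w

    def update(self, bit: int):
        """Nudge the nearer bucket toward the bit that was actually coded."""
        idx = self._lo + (1 if self._w >= 0.5 else 0)
        self._row[idx] += self.rate * (bit - self._row[idx])

apm = APM(n_contexts=2)
before = apm.refine(0.5, 0)
for _ in range(100):
    apm.refine(0.5, 0)
    apm.update(1)          # in context 0 the mixer's 0.5 was always too low
after = apm.refine(0.5, 0)
other = apm.refine(0.5, 1)  # context 1 was never trained
print(before, after, other)
```

The table learns, per context, how far the mixer output tends to be off, which is the adjustment role the description assigns to the re-mixer.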
zcm 0.50a
was released June 2, 2012.
zcm 0.60d
adds multithreading and other improvements. The -t option selects the
number of tasks. -t0 auto-detects the number of cores, which is equivalent
to -t2 on the dual core test machine (T3200, 3 GB). The default is -t1.
The -m option selects memory usage from -m1 (46 MB per task)
to -m7 (1.6 GB per task). The default is -m4. Parallel compression is
performed by separate processes that can independently access 2 GB of memory
each in 32 bit Windows. When run with -t2, there is also a third task using
5 MB of memory. All three tasks saturate one CPU core each.
It was found that -t2 makes compression worse (probably by splitting the
input in half and compressing each separately) and is not much faster than
-t1. The -t option can also be given during extraction. If the archive
was compressed with -t2 then extraction with -t2 doubles memory usage
but only improves speed slightly. If compressed with -t1 then extraction
with -t2 is 4 seconds slower for enwik8 than with -t1 because the extra task exits
immediately and the third 5 MB task continues to run.
zcm 0.70b
was released Oct. 14, 2012.
zcm 0.80
was released May 15, 2013. It was tested in Linux under Wine.
When -t2 was used to compress in 2 threads, it was also used to extract.
zcm 0.88
(discussion) was released June 21, 2013. It was tested both in Windows and
in Linux under Wine.
zcm 0.90 was released May 3, 2014.
zcm 0.92 was released May 16, 2014.
A
64 bit Windows version was released July 3, 2014. It supports the undocumented -m8 option
using up to 3 GB memory.
zcm 0.93 was released May 12, 2015.
slim 23d is a free, closed source command line
archiver by Serge Voskoboynikov, Sept 21, 2004. It uses a PPMII core
(ppmd/ppmonstr) by Dmitry Shkarin with filters for special file types including text.
The -m700 option selects 700 MB of memory. (I found -m800 causes
disk thrashing at 1 GB). The -o10 option selects order 10 PPM. (-o12 and -o16
caused slim to fail on enwik9, creating an empty archive and exiting after about 60% completion with 1 GB.
Smaller files were OK. There was no error with 2 GB).
As with other PPM compressors (ppmd, ppmonstr), using a higher order improves
compression but consumes memory faster. For enwik8, -o32 is optimal with 700MB available,
but lower orders are better for enwik9.
bwmonstr 0.01 was released Mar. 18, 2009.
bwmonstr 0.02 was released July 8, 2009. It uses a compressed representation internally,
thus memory usage is less than the 1 GB block size. It compresses the entire input file in
a single block and requires enough memory to hold the file. The program is multi-threaded
even on a single block. Times shown are for a single core processor, but would be faster on
a multi-core processor.
reorder2 is an alphabet reordering program by Eugene Shelwien.
drt is the dictionary preprocessor from lpaq9m by Alexander Rhatushnyak.
nanozipltcb is a free file compressor
by Sami Runsas, July 25, 2008. It uses BWT. It takes no options. It is a customized version of
nanozip, similar to -cO -txt -m1700m, but
tuned to this benchmark. Files compressed with
nanozipltcb are not compatible with nanozip.
nanozipltcb 0.08,
Mar. 3, 2010, is multithreaded and has other optimizations. Size is based
on a self extracting archive. Only a 64 bit Windows version
exists. Tested by the author on a quad core Q6600 at 3.0 GHz.
The older version is withdrawn.
nanozipltcb 0.09 was released
May 10, 2010. It has only a 64 bit Linux executable version.
M99
(mirror) is a free
file compressor by Michael Maniscalco, originally written in 1999 and ported
to Windows on Mar. 27, 2007. It uses BWT, based on MSufSort 3.1.
M99 is a predecessor to M03. Command line is:
Version 2.1 was released Apr. 19, 2007.
M99 2.2.1,
released July 18, 2008,
has an optimization to compress the contents of TAR files separately. For other files,
it increases the size by 1 byte.
M03 v0.2a,
Oct. 10, 2009,
takes just one option, which is the block size in bytes. Memory usage is 6x
block size for compression and 5x for decompression.
M03 v1.1 beta
was released Oct. 24, 2011 for 64 bit Windows.
It includes some new, fully parallel
suffix sorting and BWT construction algorithms. The option 1000000000
specifies a single block requiring 5 GB memory to compress or decompress.
bcm
0.03
(discussion) is a free
command line compressor by Ilia Muraviev, Feb. 9, 2009. It uses BWT with a fixed
block size of 32 MB and an order 0 CM back end. It takes no command line options.
bcm 0.04
(discussion) was released
Feb. 11, 2009. It increases the block size to 64 MB and has modeling improvements
including interpolated SSE.
bcm 0.05
(discussion)
was released Mar. 5, 2009. The option -b327680 selects 327680 KB block size. It uses
5x block size memory.
bcm 0.07
(discussion)
was released Mar. 15, 2009.
bcm 0.08
(discussion)
was released May 31, 2009. The command e370 means to use a block size of 370 MB.
Memory usage is 5 times block size. Larger values gave an "out of memory" error
under 32 bit Windows Vista with 3 GB memory.
reorder v2
(discussion)
is an alphabet reordering preprocessor for BWT compressors by Eugene Shelwien,
May 26, 2009.
xlt
is a pair of 256 byte files that defines the alphabet permutation used
by reorder, released June 4, 2009 by Eugene Shelwien.
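Alphabet reordering amounts to applying a byte permutation before the BWT and undoing it after decompression. The sketch below shows the idea with a 256 byte translation table. The permutation used here (vowels first) is a made-up example and not the contents of the xlt files, and the helper name is hypothetical.

```python
def inverse_table(table: bytes) -> bytes:
    """Return the 256-byte table that undoes `table`."""
    if len(table) != 256 or len(set(table)) != 256:
        raise ValueError("table must be a permutation of all 256 byte values")
    inv = bytearray(256)
    for src, dst in enumerate(table):
        inv[dst] = src
    return bytes(inv)

# Toy permutation (an assumption for illustration): give vowels the lowest
# codes so that they sort next to each other in the BWT.
vowels = b"aeiouAEIOU"
order = list(vowels) + [b for b in range(256) if b not in vowels]
forward = bytearray(256)
for new_code, old_code in enumerate(order):
    forward[old_code] = new_code
forward = bytes(forward)
backward = inverse_table(forward)

text = b"Wikipedia"
reordered = text.translate(forward)     # preprocessing step before the BWT
restored = reordered.translate(backward)  # postprocessing after decompression
print(restored)
```

The transform is lossless and does not change the file size; any gain comes only from how the BWT sorts the renamed symbols.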
bcm 0.09
(discussion)
was released Aug. 19, 2009. Option -b328 selects a block size of 328 MB. Memory usage is
5 times block size for both compression and decompression.
bcm 0.10 x64
x86
was released Dec. 11, 2009.
Discussion
The x64 version is for 64 bit Windows. The x86 version is for 32 bit Windows.
The -b option gives the block size in MB. Memory usage is 5x block size.
bcm 0.11
(discussion)
was released June 22, 2010. It is described as a complete rewrite.
bcm 0.12
(discussion)
was released Oct. 31, 2010. A 64 bit version was tested by the author
with -b1000 on June 1, 2011.
bcm 0.14
(discussion)
was released June 22, 2013. Only a 64 bit Windows version was released. Command c1000 means to compress
in 1000 MB blocks.
bcm 1.00
(discussion) was released
as open source (public domain), Mar. 2, 2015. It was tested by compiling
with g++ 4.8.2 -O3 in Linux.
bcm 2.03 was released Feb. 27, 2023
as a Windows .exe only. Option -b1000x- selects 1000 MB block size and no x86
preprocessing. Timing on my system (note 97) uses 1 of 8 threads.
tree 0.1 is a free, experimental, open source compressor by
Kennon Conrad, Mar. 31, 2014. It is a general purpose compressor
optimized to compress text. The compressor is
3 separate programs. The first, TreeCapEncode.c, converts upper case
letters to lower case plus special symbols. It takes 4 minutes.
The second, TreeCompress.c,
uses a suffix tree to parse the input into tokens.
It takes 3 days, 21 hours, 37 minutes and uses 1850 MB memory.
The third, TreeBitEncode.c
encodes the tokens using variable length codes. This takes
27 seconds. The decoder, TreeDecode.c, takes 22 seconds using
400 MB memory. Compressed size depends on available memory; thus
results below are machine dependent.
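The first stage can be pictured with a minimal sketch of capital encoding. The escape symbol (byte 0x01) and the one-escape-per-letter scheme are assumptions. The source does not say which special symbols TreeCapEncode.c actually uses.

```python
CAP = 0x01  # hypothetical escape symbol, assumed absent from the input

def cap_encode(data: bytes) -> bytes:
    """Replace each ASCII upper case letter by CAP plus its lower case form."""
    if CAP in data:
        raise ValueError("input already contains the escape symbol")
    out = bytearray()
    for b in data:
        if 0x41 <= b <= 0x5A:      # 'A'..'Z'
            out.append(CAP)
            out.append(b + 32)     # corresponding lower case letter
        else:
            out.append(b)
    return bytes(out)

def cap_decode(data: bytes) -> bytes:
    """Undo cap_encode."""
    out = bytearray()
    upper_next = False
    for b in data:
        if b == CAP:
            upper_next = True
        elif upper_next:
            out.append(b - 32)
            upper_next = False
        else:
            out.append(b)
    return bytes(out)

encoded = cap_encode(b"The Cat")
print(encoded)
print(cap_decode(encoded))
```

After this step "The" and "the" share the token "the", so the later parsing stage sees more repeated strings.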
tree 0.3 was released Apr. 27, 2014. It uses a model that only parses
whole words with a leading space.
tree 0.4 was released May 21, 2014.
tree 0.5 was released May 25, 2014.
tree 0.9 was released July 5, 2014. It includes a multi-threaded
decompression program for better speed. TreeCapEncode.c is now TreePreEncode.c
and runs in 11 seconds.
tree 0.10 was released Aug. 15, 2014. Timings for each step are:
TreePreEncode 20 s, TreeParagraphs 1485 s, TreeWords 393 s, TreeCompress 70732 s,
TreeBitEncode 33 s, total 72663 s.
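The stage timings relate to the ns/byte column as follows, assuming the times are for enwik9 (10^9 bytes), in which case seconds and ns/byte are numerically equal. The variable names are illustrative only.

```python
ENWIK9_BYTES = 10**9

def ns_per_byte(seconds: float, n_bytes: int = ENWIK9_BYTES) -> float:
    """Convert a wall time in seconds to nanoseconds per input byte."""
    return seconds * 1e9 / n_bytes

# tree 0.10 stage timings in seconds, as listed above.
tree_0_10 = {
    "TreePreEncode": 20,
    "TreeParagraphs": 1485,
    "TreeWords": 393,
    "TreeCompress": 70732,
    "TreeBitEncode": 33,
}
total_0_10 = sum(tree_0_10.values())

# tree 0.1: 4 minutes + (3 days, 21 hours, 37 minutes) + 27 seconds.
total_0_1 = 4 * 60 + (3 * 86400 + 21 * 3600 + 37 * 60) + 27

print(total_0_10, ns_per_byte(total_0_10))
print(total_0_1, ns_per_byte(total_0_1))
```

Both totals (72663 and 337287) are the values shown in the Comp column for these versions.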
tree 0.11 was released Sept. 2, 2014. It uses extra symbol tables to improve
compression ratio and decompression speed.
tree 0.12 was released Sept. 29, 2014 with a bug fix on Oct. 1, 2014.
For note 48, the program was compiled with gcc 4.8.2 -O3.
tree 0.13 was released on Oct. 12, 2014. There is a 32 bit version that uses
1700 MB memory and a 64 bit version of TreeCompress.exe that uses 6x the input
size in memory. The option (P+W+C) means that the two preprocessing stages
TreeParagraph.exe and TreeWords.exe (same for 32 and 64 bit) were run on the input
prior to TreeCompress.exe or TreeCompress64.exe. Otherwise only the last stage
is run. The preprocessing stages make compression worse but faster.
tree 0.14 was released Oct. 29, 2014. The 64 bit version was tested.
tree 0.15 was released Nov. 21, 2014. 0.15a, Nov. 22, 2014,
has a faster decompressor.
tree 0.16b was released Dec. 9, 2014.
tree 0.17 was released Dec. 16, 2014. Compression times and memory
usage are approximate (unchanged since last version).
tree 0.18 was released Jan. 17, 2015 with improvements to the 64 bit
version. The -r option controls memory usage.
tree 0.19 was released Feb. 4, 2015.
glza 0.1 is the new name of the tree program, released Apr. 27, 2015.
It uses adaptive order 0 arithmetic coding of dictionary symbols and other changes.
glza 0.2 was released May 24, 2015.
glza 0.3 was released July 13, 2015. Decompression requires 330 MB memory.
glza 0.3b was released Nov. 16, 2015. It contains the same files
as v0.3a (a bug fix for v0.3) except that it also contains GLZAcompressFast (.c and .exe),
which was tested below.
glza 0.4 was released Mar. 11, 2016.
glza 0.8 was released Sept. 27, 2016. The option -p3 selects a factor to favor
longer strings over more compressive ones.
glza 0.10.1 was released Jan. 6, 2018.
bsc
1.00 x86
x64
is a free, experimental file compressor by Ilya Grebnov, Apr. 7, 2010. It uses BWT with LZP
preprocessing. The option -b1000t selects a block size of 1000 MB and turns off multithreading
(parallel compression on multiple cores). Memory requirement is 6x block size times the number of
threads. Multithreading was turned off (-t) for both compression and decompression in order
to maximize compression. Nevertheless, compression shows CPU utilization of 109% on 2 cores
even with -t set. -p turns off LZP preprocessing. -m2 selects a sort (Schindler) transform
of order 5.
Other options select LZP table size (default 2^18 bytes, range 10..28), LZP
match length (default 128, range 4..255), block sorting algorithm (default BWT, possible
order 4 or 5 sort (Schindler) transform), and preceding or following context for sorting
(default following). Only the defaults were tested, which may not be optimal. There
are two versions: x86 for 32 bit Windows with a 2 GB memory limit, and x64 for 64 bit
Windows with no memory limit. Notes apply to enwik9. enwik8 size is tested as in note 26.
bsc 1.03 x86 and
x64
(discussion),
Apr. 11, 2010,
are bug fixes that do not change results except for the size of the program.
The x64 version is 276,292 bytes.
bsc 2.00, May 3, 2010,
is available with source code licensed under LGPL.
bsc 2.20, June 15, 2010, has speed improvements for multi-core support.
-b1000p means use 1000 MB
block size (-b1000, requires 5 GB memory) with no preprocessing (-p).
-b80p uses 80 MB block size with no preprocessing. -m2f means use
sort transform order 5 (-m2) and fast compression (-f).
enwik8 was tested as in note 26 on bsc-x32 replacing -b1000p with -b100p.
bsc 2.26, July 26, 2010, has some speed improvements but retains compatibility
with version 2.25. -b328 selects a block size of 328 MB, which divides enwik9
into 3 blocks. This is the fewest number of blocks supported by the x86 version
because of a 2 GB process limit. The x64 version does not have this limit but
requires 64 bit Windows. -t disables parallel block processing, which would double the
memory requirement. -T disables all multicore processing. This gives a smaller compressed
size but is slower than -t. -T or -t must be specified during decompression
to prevent an out of memory error. With -t, CPU usage is 156% for compression and 129%
for decompression on a dual core T3200 (2 GHz, 3 GB, Vista 32 bit).
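The relation between block size, number of blocks, and memory can be checked with a short calculation. It assumes that "MB" in the -b option means 2^20 bytes. That is an inference from the statement that -b328 gives 3 blocks (with 10^6 byte units it would give 4). The memory factor is passed in because the notes above quote both 6x (bsc 1.00) and about 5x (-b1000 needing 5 GB). The function name is hypothetical.

```python
MIB = 1 << 20          # assumed meaning of "MB" in the -b option
ENWIK9_BYTES = 10**9

def block_plan(file_bytes: int, block_mb: int, mem_factor: int):
    """Return (number of blocks, approximate working memory in MB)."""
    block_bytes = block_mb * MIB
    n_blocks = -(-file_bytes // block_bytes)  # ceiling division
    return n_blocks, mem_factor * block_mb

print(block_plan(ENWIK9_BYTES, 328, 5))   # -b328
print(block_plan(ENWIK9_BYTES, 250, 6))   # -b250 with the 6x rule
print(block_plan(ENWIK9_BYTES, 1000, 5))  # -b1000, single block
```

The estimates (1640, 1500, and 5000 MB) are close to the measured values in the table, and the first two show why the 32 bit version, limited to 2 GB per process, cannot use fewer than 3 blocks.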
bsc 2.4.5, Jan. 3, 2011, improves the speed of decompression. It remains
compatible with the previous version.
bsc 2.5.0, Mar. 20, 2011, had no significant changes for the tests performed.
Minor performance enhancements. CRC32 is replaced with Adler32.
bsc 3.0.0, Aug. 27, 2011, adds experimental NVIDIA (CUDA) GPU acceleration for
forward sort transforms ST5 through ST8. ST7 and ST8 are GPU only. There are
32 and 64 bit versions. For the test shown, the 64 bit version was used.
-b32 means to select 32 MB block
size, -p disables preprocessing, -m8 selects order 8 sort transform, and
-f selects fast compression. The test machine is a Core-i7 2600K (4 cores,
8 threads, 8 MB cache) overclocked from 3.4 GHz
to 4.6 GHz, with a 384 CUDA processor GeForce 560Ti GPU, overclocked from 822
MHz to 900 MHz, with 2000 MHz memory speed.
Compression takes 8.705 seconds using 1129 MB CPU memory
and about 1 GB GPU memory. Decompression uses only the CPU, taking 18.595 seconds
using 1395 MB memory.
bsc 3.1.0
was released July 8, 2012.
bsc 3.25
discussion
was released Nov. 28, 2022. It has the same compression but improved performance.
The main contribution is a new, fast, linear-time suffix array and Burrows-Wheeler transform
construction library.
.1532 fp8_v3
Program options enwik8 enwik9 program size total Comp Decomp Mem Alg Note
------- ------- ---------- ---------- ------------ ----------- ----- ------ ---- --- ----
fp8 v1 -8 18,573,126 49,865 s 20010 1150 CM 26
fp8 v2 -8 18,556,327 154,359,664 49,964 s 154,409,626 19059 21196 1192 CM 26
fp8 v3 -8 18,438,169 153,188,176 50,068 s 153,238,244 20605 22593 1192 CM 26
tangelo 1.0 18,593,738 156,355,536 8,365 s 156,363,901 19849 19977 567 CM 26
tangelo 2.0 20,202,547 171,678,313 6,275 s 171,684,588 6028 6007 362 CM 26
tangelo 2.1 21,021,150 179,879,607 11,320 s 179,890,927 2275 2262 361 CM 26
tangelo 2.3 20,921,619 178,497,116 11,687 s 178,508,803 2172 2194 361 CM 26
.1563 WinRK
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Notes
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- --- -----
WinRK 3.03 PWCM (800MB +td) 18,612,453 156,291,924 3,017,362 x 159,309,286 68555 CM 10
WinRK 3.03 PWCM 18,612,551 156,349,910 3,017,362 x 159,367,272 102973~90000 CM 9
WinRK 3.03 FPW1 (800MB +td) 19,035,564 24950 10
WinRK 3.03 PWCM (800MB -td) 19,060,620 88310 CM 10
WinRK 3.03 Efficient 21,157,165 5380 PPM 10
WinRK 3.03 Normal (PPMd) 22,322,981 620 PPM 10
WinRK 3.03 PWCM (800MB +td) 18,612,453 156,291,924 99,665 xd 156,391,589 68555 800 CM 10
WinRK 3.03 x64 PWCM (2047MB +td o28) 18,101,637 150,481,300 3053 CM 42
.1570 ppmonstr, ppmd, ppms
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- ----
ppmonstr J -m1700 -o16 19,055,092 157,007,383 42,019 x 157,049,402 3574 ~3600
ppmonstr J -m800 -o16 19,230,657 161,496,685 42,019 x 161,538,704 3783 ~3800
ppmonstr J -m1863 -o16 19,040,451 156,578,769 42,019 x 156,620,788 42
ppmd J -m256 -o10 -r1 21,388,296 183,964,915 11,099 s 183,976,014 880 895
ppmd J -m10 -o4 -r0 26,275,353 236,509,791 11,099 s 236,520,890 194 206
ppms J -o5 26,310,248 233,442,414 16,467 x 233,458,881 330 354
-o2 36,866,748 102
-o3 30,242,535 135
-o4 27,030,761 246
-o6 26,644,863 449
-o7 27,028,318 492
-o8 27,343,283 532
.1593 zcm
Program Option enwik8 enwik9 Prog Total Comp Deco Mem Note
--------- ------ ---------- ----------- -------- --------- ---- ---- ---- ----
zcm v0.01 c1 23,914,413 2260 2730 35 26
c7 20,093,284 169,397,795 47,975 x 169,445,770 2965 2883 1486 26
zcm v0.02 c7 20,277,130 170,848,574 2419 2396 1470 26
zcm v0.03 c7 20,159,212 169,368,119 27,589 x 169,395,708 2416 2369 1476 26
zcm v0.04 c7 20,853,133 173,956,638 27,731 x 173,984,369 1462 1459 1520 26
zcm v0.11 c0 23,963,073 1230 1210 22 26
c1 22,937,669 1280 35 26
c2 22,076,074 1290 62 26
c3 21,362,445 1330 115 26
c4 20,810,077 1370 222 26
c5 20,447,150 1390 401 26
c6 20,215,116 1400 697 26
c7 20,078,151 165,518,908 31,576 x 165,550,484 1275 1190 1716 26
zcm 0.20b -m7 20,204,267 167,177,534 161,122 x 167,338,656 1199 1204 1657 26
zcm 0.30 -m7 20,237,368 167,198,948 161,558 x 167,360,506 949 970 1720 26
zcm 0.40 -m7 20,200,819 167,138,719 161,502 x 167,300,221 904 929 1511 26
zcm 0.50a -m7 19,966,605 164,661,654 161,614 x 164,823,268 947 971 1579 26
zcm 0.60d -m7 -t1 19,786,363 162,731,120 171,517 x 162,902,637 915 960 1662 26
-m1 -t1 23,374,636 890 920 46 26
-m1 -t2 23,440,140 830 910 97 26
-m4 -t1 20,698,415 950 1000 226 26
-m4 -t2 20,925,875 940 990 389 26
-m6 -t1 19,933,151 1030 1050 651 26
-m6 -t2 20,359,596 1070 990 1160 26
-m7 -t2 20,267,309 2080 1130 2450 26
zcm 0.70b -m7 -t1 20,065,306 166,373,795 159,493 x 166,532,988 870 884 1412 26
zcm 0.80 -m7 -t1 19,937,741 164,724,585 110,565 x 164,835,150 552 557 1700 48
-m7 -t2 20,554,326 166,468,556 110,565 x 166,579,121 414 415 1990 48
zcm 0.88 -t1 21,383,928 940 930 196 26
-t2 25,767,005 800 820 120 26
-m7 -t2 20,418,171 1110 890 1400 26
-m7 -t1 19,970,859 164,702,310 162,136 x 164,864,446 910 891 1434 26
-m7 -t1 19,970,859 164,702,310 162,136 x 164,864,446 546 527 1434 48
zcm 0.90 -m7 -t1 20,006,179 165,266,797 164,361 x 165,431,158 511 516 ~1700 48
zcm 0.92 -m7 -t1 19,803,545 163,246,657 166,763 x 163,413,420 500 512 1546 48
zcm_x64 0.92 -m7 -t1 19,803,545 163,246,657 225,205 x 163,471,862 488 471 1549 48
-m8 -t1 19,700,970 160,848,578 225,205 x 161,073,783 489 474 2400 48
zcm 0.93 -m8 -t1 19,572,089 159,135,549 227,659 x 159,363,208 421 411 3100 48
.1598 slim
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp
------- ------- ---------- ----------- ----------- ----------- ----- -----
slim23d -m1700 -o12 19,077,276 159,772,839 69,453 x 159,842,292 5232 ~5400
slim23d -m700 -o32 19,226,339 (failed) 69,453 x 6530 6770
slim23d -m700 -o10 19,264,094 162,529,098 69,453 x 162,598,551 5175 5360
.1603 bsc-m03
bsc-m03
(discussion)
is a free, experimental,
open source (GPL v3) file compressor by Ilya Grebnov, November 20, 2022.
It is a practical implementation of Compression via Substring Enumeration
using the Burrows-Wheeler transform and the M03 context-aware compression algorithm.
Its purpose is to have the highest compression ratio among BWT based compressors
without using preprocessing. The compressor only uses a single thread.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- ----- --- ---
bsc-m03 v0.4.0 -b1000000000 20,293,393 160,258,936 105,456 xd 160,364,392 160 135 13000 BWT 96
.1605 bwmonstr
bwmonstr 0.00 is a free, experimental,
closed source
file compressor by Sami Runsas, Mar. 10, 2009. It uses BWT. The program takes no
options. It loads the input file into a single block and allocates 1.25 times the
block size in memory for either compression or decompression.
Thus, it is able to transform enwik9 in a single block.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- --- ----
bwmonstr 0.00 20,401,888 161,249,951 27,772 x 161,277,723 15638 13028 1224 BWT 26
bwmonstr 0.01 20,379,365 161,026,258 32,163 x 161,058,420 15695 14135 1224 BWT 26
bwmonstr 0.02 20,307,295 160,468,597 69,401 x 160,537,998 331801 156147 590 BWT 30
reorder2|bwmonstr 0.02 20,229,555 590 BWT 30
drt|bwmonstr 0.02 19,750,461 450 BWT 30
.1617 nanozipltcb
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ---------------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
nanozip 0.01 -cO -m1670m -txt 20,306,489 167,509,921 266,797 x 167,776,718 403 284 1325 BWT
nanozipltcb 20,494,670 166,251,135 239,124 x 166,490,259 348 185 1729 BWT
nanozipltcb 0.08 20,626,962 166,571,051 0 xd 166,571,051 93 53 1729 BWT 37
nanozipltcb 0.09 20,537,902 161,581,290 133,784 x 161,715,074 64 30 3350 BWT 40
.1637 M03
M99.exe e|d -switches blocksize input output
switches are:
-r = post BWT run length encoding
-a = arithmetic coding instead of M99 style bit packing
-f = fast mode
-m = max compression mode (implies -a).
Blocksize can be specified in bytes (like 10000), kb, mb etc as 100m or 100k.
Memory requirement for compression is 6 times the blocksize maximum, although in most cases only
a little over 5 times blocksize is used. Blocksize 239m divides enwik9 into 4 approximately
equal parts and requires about 1500 MB memory.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
M99 e -m 239m 21,431,211 180,477,144 67,697 x 180,544,841 674 496 1500 BWT
M99 v2.1 e -m 239m 21,251,170 178,910,174 68,052 x 178,978,226 713 535 1500 BWT
M99 v2.2.1 e -m 239m 21,251,171 178,910,175 72,245 x 178,982,420 704 520 1500 BWT
M03 0.2a e 250000000 20,713,383 173,944,553 95,699 x 174,040,252 868 624 1470 BWT 26
M03 1.1b e 1000000000 20,710,197 163,667,431 50,468 x 163,717,899 457 406 5735 BWT 52
.1637 bcm
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
bcm 0.03 22,007,655 192,194,478 67,988 x 192,262,466 517 437 164 BWT 26
bcm 0.04 21,450,604 185,368,446 69,553 x 185,455,999 578 486 329 BWT 26
bcm 0.05 -b327680 20,770,671 172,180,796 69,040 x 172,249,836 684 535 1642 BWT 26
-b406991 171,857,720 69,040 x 171,926,760 2030 BWT 27
bcm 0.07 -b327680 20,770,673 172,180,037 60,990 x 172,241,027 818 578 1642 BWT 26
-b488282 169,396,680 60,990 x 169,457,670 472 341 2440 BWT 28
bcm 0.08 e370 20,744,613 171,891,509 61,666 x 171,953,175 948 709 1900 BWT 26
e477 20,744,613 169,179,098 61,666 x 169,232,764 545 418 2385 BWT 28
reorder_v2|bcm 0.08 e477 20,677,205 168,694,909 80,149 x 168,775,058 548 422 2385 BWT 28
reorder_V2|bcm 0.08 e477 xlt 20,665,536 168,598,121 80,661 x 168,678,782 552 420 2385 BWT 28
bcm 0.09 -b328 20,625,697 170,913,486 63,704 x 170,977,190 1342 1053 1652 BWT 26
bcm 0.10 x86 -b370 20,811,710 172,570,245 63,788 x 172,634,033 758 483 1899 BWT 26
bcm 0.10 x64 -b512 169,871,532 72,366 x 169,943,898 362 2560 BWT 35
bcm 0.10 x64 -b477 169,843,006 72,366 x 169,915,372 522 373 2500 BWT 36
bcm 0.11 -b328 20,773,468 172,267,889 70,936 x 172,338,825 798 548 1552 BWT 26
-b477 20,773,468 169,466,640 70,936 x 169,537,576 611 423 2500 BWT 43
bcm 0.12 -b328 20,825,972 172,665,135 61,874 x 172,727,009 637 414 1683 BWT 26
-b1000 20,825,972 164,654,285 61,974 x 164,716,259 281 214 5000 BWT 50
bcm 0.14 c1000 20,736,614 163,885,873 74,569 x 163,960,442 162 153 5000 BWT 60
bcm 1.00 -b500 20,792,796 169,489,509 15,187 s 169,504,696 251 250 2500 BWT 48
-b1000 20,792,796 164,251,284 15,187 s 164,266,471 147 142 5000 BWT 60
bcm 2.03 -b1000x- 20,738,630 163,646,387 125,866 x 163,772,253 106 67 4096 BWT 97
163,646,387 62 34 98
.1638 glza
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
tree 0.1 187,985,256 6,656 sd 187,985,256 337287 22 1850 Dict 64
23,660,364 187,933,399 6,656 sd 187,940,055 174589 12 1850 Dict 65
tree 0.2 23,250,856 185,311,980 337287 22 1850 Dict 64
tree 0.3 23,233,932 184,838,711 6,591 sd 184,845,302 105728 23 1850 Dict 64
tree 0.4 23,178,500 184,312,072 7,216 sd 184,319,288 68866 22 1850 Dict 64
tree 0.5 23,084,884 181,375,076 8,271 sd 181,383,347 68869 22 1850 Dict 64
tree 0.9 22,366,748 181,324,992 7,104 sd 181,332,096 70723 15 1850 Dict 64
tree 0.10 22,072,432 178,949,848 12,174 sd 178,962,022 72663 18 1850 Dict 64
tree 0.11 22,076,556 178,773,808 9,645 sd 178,783,453 72659 13 1850 Dict 64
22,076,556 178,782,844 9,645 sd 178,792,489 36765 8 1750 Dict 67
tree 0.12 21,974,704 177,542,704 10,121 sd 177,552,493 226279 17 1800 Dict 48
177,321,380 7 Dict 65
tree 0.13 21,976,316 177,340,072 10,525 sd 177,350,597 116473 7.5 1700 Dict 67
(P+W+C) 22,075,700 178,774,864 10,525 sd 178,785,389 36652 7.3 1700 Dict 67
tree64 0.13 22,196,288 180,516,660 10,525 sd 180,516,660 39417 7.4 6000 Dict 67
(P+W+C) 22,304,524 180,941,504 10,525 sd 180,952,029 20834 7.4 6000 Dict 67
tree64 0.14 22,124,900 178,839,408 10,525 sd 178,849,933 9364 7.5 5100 Dict 67
(P+W+C) 22,229,468 179,806,072 10,525 sd 179,816,597 6899 7.4 3800 Dict 67
tree 0.15a 21,922,356 176,896,672 11,203 sd 176,907,875 114733 6.9 1800 Dict 67
(P+W+C) 22,023,144 178,321,588 11,203 sd 178,332,791 36828 6.6 1800 Dict 67
tree64 0.15a 22,140,724 177,974,208 11,203 sd 177,985,411 9542 7.0 5200 Dict 67
(P+W+C) 22,155,772 178,874,272 11,203 sd 178,885,475 7269 6.8 3900 Dict 67
tree 0.16b 21,602,648 173,848,464 13,395 sd 173,861,859 114739 8.1 1693 Dict 67
tree64 174,825,152 13,395 sd 174,838,547 9362 8.2 5002 Dict 67
tree 0.17 21,564,704 173,461,100 13,563 sd 173,474,663 115000 7.1 1700 Dict 67
tree64 21,772,096 174,399,062 13,563 sd 174,412,625 9400 7.2 5000 Dict 67
tree64 0.18 21,639,204 174,357,336 13,463 sd 174,370,799 4901 7.2 6009 Dict 67
-r100 173,856,720 13,463 sd 173,870,183 14965 7.2 15370 Dict 67
tree 0.19 21,497,672 173,210,648 14,625 sd 173,225,273 119742 7.1 1692 Dict 67
tree64 0.19 21,547,924 173,803,292 14,625 sd 173,817,917 4702 7.2 6023 Dict 67
glza 0.1 21,225,310 171,131,068 14,391 sd 171,145,459 4716 12.5 6027 Dict 67
glza 0.2 20,806,740 167,274,338 15,218 sd 167,289,556 4713 16.4 6027 Dict 67
glza 0.3 20,541,988 165,419,346 18,982 sd 165,438,328 4156 14.9 6026 Dict 67
glza 0.3b GLZAcompressFast 22,021,734 180,243,710 18,982 sd 180,262,692 786 15 11395 Dict 67
glza 0.4 20,497,514 165,117,094 20,056 sd 165,137,150 5971 14.9 6024 Dict 67
glza 0.8 20,472,828 164,943,294 64,327 sd 165,007,621 9328 15.8 12673 Dict 67
-p3 20,442,490 164,634,038 64,327 sd 164,698,365 10106 15.8 12369 Dict 67
glza 0.10.1 20,753,713 167,832,309 69,935 s 167,902,244 594 11.9 7452 Dict 67
-x -p3 20,356,097 163,768,203 69,935 s 163,838,138 8184 11.9 8205 Dict 67
.1639 bsc
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- --- ----
bsc-x64 1.00 -b1024 20,769,550 163,820,253 274,197 x 164,094,450 311 212 6000 BWT 34
bsc-x64 1.00 -b1000 -p 20,787,437 163,882,152 274,197 x 164,156,349 271 209 6000 BWT 38
bsc-x86 1.00 -b250 -t 20,769,550 174,337,692 258,824 x 174,596,616 473 276 1504 BWT 28
bsc-x64 1.00 -b79 -p -m2 22,864,952 200,607,811 274,197 x 200,882,008 36 71 1896 ST5 39
bsc-x86 1.03 -b250 -t 20,769,550 174,337,692 261,058 x 174,598,750 470 280 1504 BWT 28
bsc-x64 2.00 -b1000p 20,789,147 163,888,465 122,581 s 164,011,046 237 199 5095 BWT 39
bsc-x64 2.20 -b1000p 20,789,228 163,888,858 149,153 s 164,038,011 238 93 5095 BWT 39
-b80p -m2f 23,031,164 201,321,919 149,153 s 201,471,072 27 68 1624 ST5 39
bsc-x86 2.26 -b328 -t 20,774,446 171,826,969 138,293 s 171,965,262 386 183 1667 BWT 28
-b328 -T 20,772,543 171,820,075 138,293 s 171,958,368 438 274 1663 BWT 28
bsc-x86 2.45 -b328 -t 20,774,446 171,826,969 130,327 s 171,957,296 382 141 1667 BWT 28
-b328 -T 20,772,543 171,820,075 130,327 s 171,950,402 443 195 1667 BWT 28
bsc-x86 2.50 -b328 -t 20,774,446 171,826,969 129,593 s 171,956,562 398 139 1670 BWT 28
-b328 -T 20,772,543 171,820,075 129,593 s 171,949,668 444 195 1670 BWT 28
bsc-x64 3.00 -b32p -m8f 22,461,680 196,398,933 934,176 x 197,333,109 8 18 3129 ST8 51
bsc-x86 3.10 -b328 -T 20,920,018 173,026,090 241,476 s 173,267,566 390 149 1712 BWT 28
bsc 3.25 -b1000 -e2 20,786,794 163,884,462 74,297 xd 163,958,759 23 8 5000 BWT 96
.1640 bbb
bbb ver. 1
is a free, open source (GPL) command line file compressor by Matt Mahoney, Aug. 31, 2006.
It uses a memory efficient BWT allowing blocks up to 80% of available memory.
The transformed data is compressed with an order 0 PAQ like model: the previous
bits of the current byte are mapped first to a bit history, then through a 6 level
probability correcting adaptive chain before bitwise arithmetic coding.
The m1000 command selects 1000 MB block size. Thus, enwik9 is suffix sorted in one block. This is accomplished by sorting 16 smaller blocks, writing the pointers to 4 GB of temporary files, and merging them. The inverse transform is done in memory without building a linked list. Rather, the next position is found by looking up the approximate location in an index of size n/16 and finding the exact location by linear search.
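For reference, the transform and its inverse can be written in a few lines. This is a textbook sketch that sorts whole rotations and builds a full mapping table in memory. It is not bbb's memory efficient method, which merges 16 sorted sub-blocks and inverts through a sparse index, and the function names are illustrative.

```python
def bwt(data: bytes):
    """Naive forward BWT. Returns (last column, row of the original string)."""
    n = len(data)
    if n == 0:
        return b"", 0
    # Sort rotation start positions by the rotation they begin.
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last = bytes(data[(i - 1) % n] for i in order)
    return last, order.index(0)

def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert the BWT by walking the last-to-first mapping backwards."""
    n = len(last)
    counts = [0] * 256
    for c in last:
        counts[c] += 1
    starts, total = [0] * 256, 0
    for c in range(256):           # first row of each symbol in sorted order
        starts[c] = total
        total += counts[c]
    seen = [0] * 256
    lf = [0] * n                   # lf[i]: row whose rotation starts one byte earlier
    for i, c in enumerate(last):
        lf[i] = starts[c] + seen[c]
        seen[c] += 1
    out = bytearray(n)
    row = primary
    for k in range(n - 1, -1, -1):  # recover the text from its last byte to its first
        out[k] = last[row]
        row = lf[row]
    return bytes(out)

transformed, primary = bwt(b"banana")
print(transformed, primary)
print(inverse_bwt(transformed, primary))
```

The lf table here costs one integer per input byte, which is the kind of overhead the indexed linear search described above avoids.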
bbb.exe Win32 executable compiled with MinGW g++ 3.4.2 and UPX 1.24w.
g++ -Wall -O2 -Os -march=pentiumpro -fomit-frame-pointer -s -o bbb.exe
upx bbb.exe
bbb Linux executable, supplied by Phil Carmody (Aug. 31, 2006). Compiled with g++-4.1 -Wall -O2 -o bbb bbb.cpp; strip bbb
bbb has a faster mode for both compression and decompression that does a "normal" BWT using 5x blocksize in memory. Output format is the same for fast and slow mode for both compression and decompression. A file compressed in fast mode can be decompressed in slow mode on another computer with less memory, and vice versa. The mode has no effect on the compressed file contents.
Recommended usage for best compression: For files smaller than 20% of available memory, use fast mode and one block. For example, if you have 1 GB memory (800 MB available under Windows) and foo is 100 MB:
bbb cfm100 foo foo.bbb (c = compress, f = fast, m100 = 100 MB blocks)
bbb df foo.bbb foo.out (d = decompress, f = fast)
If the file is 20% to 80% of available memory, use one block in slow mode. If foo is 500 MB:
bbb cm500 foo foo.bbb
bbb d foo.bbb foo.out
If the file is over 80% of memory, use 80% of memory as the block size in slow mode. If foo is 1 GB:
bbb cm640 foo foo.bbb
bbb d foo.bbb foo.out
The model requires about an additional 6 MB that should be subtracted from available memory.
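The three usage cases can be summarized as a small decision rule. The Python sketch below is illustrative only: the function name bbb_command and its arguments are hypothetical, sizes are whole MB, and the treatment of the exact 20% and 80% boundaries is an assumption, since the text does not say which side they fall on. Like the examples, it ignores the roughly 6 MB used by the model.

```python
def bbb_command(file_mb, avail_mb, name="foo"):
    """Return (compress, decompress) command lines for bbb.

    file_mb and avail_mb are integers in MB. Hypothetical helper,
    not part of bbb itself.
    """
    if file_mb * 10 < avail_mb * 2:
        # Under 20% of available memory: fast mode, one block.
        block, fast = file_mb, "f"
    elif file_mb * 10 <= avail_mb * 8:
        # 20% to 80% of available memory: slow mode, one block.
        block, fast = file_mb, ""
    else:
        # Over 80%: slow mode, block size = 80% of available memory.
        block, fast = avail_mb * 8 // 10, ""
    compress = f"bbb c{fast}m{block} {name} {name}.bbb"
    decompress = f"bbb d{fast} {name}.bbb {name}.out"
    return compress, decompress


if __name__ == "__main__":
    for size in (100, 500, 1000):
        print(bbb_command(size, 800))
```

With 800 MB available, the sketch reproduces the three command pairs given in the text (m100 fast, m500 slow, m640 slow).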
bbb results by block size are shown below. Gain is the compression improvement obtained by using a larger block size. Gain(blocksize) is defined as C(blocksize/10)/C(blocksize) - 1 where C(x) means the compressed size of enwik9 with block size x. Compression times are fast modes for block sizes 10^1 through 10^8 and slow mode for 10^9 on a 2.2 GHz Athlon-64 with 2 GB memory under WinXP Home SP2.
Block  enwik8      enwik9       Gain  Comp ns/b
-----  ----------  -----------  ----  ---------
10^1   66,414,034  646,449,572             4359
10^2   56,241,619  542,912,447  .191       2169
10^3   45,500,201  435,597,745  .246       1907
10^4   37,006,646  343,663,203  .267       1802
10^5   30,946,413  275,172,983  .249       1838
10^6   26,661,555  233,555,297  .178       2095
10^7   23,460,457  204,355,672  .142       2499
10^8   20,847,290  182,162,626  .122       3106
10^9   20,847,290  164,032,650  .110       4524
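The Gain column can be recomputed from the enwik9 column using the definition C(blocksize/10)/C(blocksize) - 1. The short Python sketch below does this; the names ENWIK9 and gain are introduced here for illustration, and the sizes are copied from the table. The recomputed values agree with the published column to within about 0.001.

```python
# enwik9 compressed sizes from the table, keyed by k where block size = 10^k bytes.
ENWIK9 = {
    1: 646_449_572,
    2: 542_912_447,
    3: 435_597_745,
    4: 343_663_203,
    5: 275_172_983,
    6: 233_555_297,
    7: 204_355_672,
    8: 182_162_626,
    9: 164_032_650,
}


def gain(k, sizes=ENWIK9):
    """Gain(10^k) = C(10^(k-1)) / C(10^k) - 1."""
    return sizes[k - 1] / sizes[k] - 1


if __name__ == "__main__":
    for k in range(2, 10):
        print(f"10^{k}: {gain(k):.3f}")
```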
pcompress 3.1 is a free, open source (LGPLv3 and MPLv2) deduplicating archiver and file compressor by Moinak Ghosh. An Ubuntu build released Feb. 2, 2015 and updated Feb. 6, 2015 was tested. The option "-c libbsc" means to compress a single file using libbsc (BWT). -l14 selects maximum compression (default -l6). -s1000m selects 1000 MB block size (default -s60m). The compression algorithm is deduplication followed by dictionary preprocessing and BWT.
               Compression             Compressed size          Decompresser  Total size   Time (ns/byte)
Program        Options                 enwik8      enwik9       size (zip)    enwik9+prog  Comp  Decomp  Mem   Alg  Note
-------        -------                 ----------  -----------  ------------  -----------  ----  ------  ----  ---  ----
pcompress 3.1  -c libbsc -l14 -s1000m  20,769,968  163,391,884  1,370,611 x   164,762,495   359      74  3300  BWT  48
         Compression  Compressed size          Decompresser  Total size   Time (ns/byte)
Program  Options      enwik8      enwik9       size (zip)    enwik9+prog  Comp  Decomp  Mem   Alg
-------  -------      ----------  -----------  ------------  -----------  ----  ------  ----  ---
paq9a    -9           19,974,112  165,193,368  13,749 s      165,207,117  3997    4021  1585  CM
uda 0.300 is a free, experimental
file compressor by dwing, July 16, 2006. It is a modification of PAQ8H with optimizations
for speed. It takes no options. The decompresser size is for uda.exe, since this is smaller
than the corresponding zip file.
BWTmix v1
(from here) is a free, open source, experimental
file compressor by Eugene Shelwien, June 28, 2009. It uses BWT (implemented using
quicksort) followed by an 8 model CM mixed using a tree of 2-input mixers.
The option c10000 selects a block size of 10000 * 100KB. The default block
size is 100 MiB. Memory usage is 5x block size.
.1678 BWTmix
Program Option enwik8 enwik9 Comp Deco Mem Note
--------- ------ ---------- ----------- ---- ---- ---- ----
bwtmix v1 c3334 20,608,793 170,596,616 3413 1253 1670 26
c10000 20,608,793 167,978,527 1793 690 5000 49
.1694 lrzip
lrzip 0.40 is a free, open
source file compressor by Con Kolivas, Nov. 26, 2009. It uses a range
dictionary preprocessor to remove long range redundancies (based on rzip),
followed by lzma (7zip) compression. It also has options to compress with
lzo (lzop) or bzip2 after preprocessing, or to output the preprocessed
data for compression with other programs. It runs under Linux.
lrzip 0.42 adds zpipe (zpaq cmid.cfg) as a back end compressor using option -z. It was tested in this mode.
lrzip 0.612
(discussion), Mar. 17, 2012, uses the current version of libzpaq (v5.01) for faster execution. The options select built in level 3 (max.cfg) compression.
Program      Options       enwik8      enwik9       prog      total        Comp  Deco  Mem   alg   note
----------   ------------  ----------  -----------  --------  -----------  ----  ----  ----  ----  ----
lrzip 0.40                 25,190,577  214,903,304  38,173 x  214,941,477   843    31  1700  LZ77  33
lrzip 0.42   -z            21,327,441  183,609,156  49,881 x  183,659,037  2173  2230  1800  CM    33
lrzip 0.612  -z -L 9 -p 1  19,847,690  169,318,794  99,363 x  169,418,157  2987  2929  2700  CM    33
cm4_ext was released Jan. 21, 2014. It is an order 10 CM with a match model and SSE.
         Compression  Compressed size          Decompresser  Total size   Time (ns/byte)
Program  Options      enwik8      enwik9       size (zip)    enwik9+prog  Comp  Decomp  Mem   Alg  Note
-------  -------      ----------  -----------  ------------  -----------  ----  ------  ----  ---  ----
cm0                   23,276,242  206,929,764  201,213 x     207,130,977  1731    1791    68  CM   26
cm0_ext               21,156,055  181,772,665  201,303 x     181,973,968  4206    4250   516  CM   26
cm1                   28,092,863  243,631,412  202,038 x     243,833,450   391     226   211  CM   26
bwcm                  23,265,333  204,416,216  202,803 x     204,619,019  1142     335   184  CM   26
bwcm     c128         21,278,364  185,473,048  202,803 x     185,675,851  1525     407  1469  CM   26
cm4_ext               20,188,048  170,566,799  204,782 x     170,771,581  4123    4130  1906  CM   26
M1 0.2a is a free, open source (GPL) file compressor by Christopher Mattern, released Oct. 3, 2008. It uses context mixing with only two contexts. The contexts are 64 bits with some bits masked out. The masks and several other parameters were selected by a combination of genetic and hill climbing algorithms running for several hours to 3 days to optimize compression on this benchmark as discussed here.
M1 0.3 was released Jan. 2, 2009.
M1 0.3b was released Apr. 12, 2009. This version takes a configuration file created by an optimization version of the program. The configuration file is required by the decompresser (and is included in the program size).
e8-m103b1-mh is a parameter file for M1 0.3b obtained by mhajicek after about 3 days of CPU time running M1's genetic optimization program on enwik8.
M1x2 v0.5-1 was released Dec. 8, 2009. The option 6 means to use 48 x 26 MB memory. The option enwik7.txt is an optimization file which resulted from tuning parameters on the first 10 MB of the benchmark by a separate optimization process. It must be specified during decompression. The file size (242 bytes) is included in the decompresser size. The program includes source code and compiled Windows and Linux versions. The Windows version was tested. The program is described as follows by the author:
M1x2 mixes two ordinary M1 models in the logistic domain (thus four models in total). Data is processed bitwise with a flat decomposition. Contexts are mapped to states, which represent bit histories encountered under the corresponding context. In this implementation contexts are restricted to byte masks with some tweaks for text; the context mapping is implemented using hash tables. Two bit history states s1, s2 are quantised Q(.,.) and mapped to a linear counter to produce a prediction p = P(y=1|Q(s1, s2)), where y is the next bit. Afterwards two predictions are transformed into the logistic domain and mixed linearly. The final prediction is: p = Sq[ (St(p2)-St(p1))*w + St(p1) ]; St(.) and Sq(.) name stretch and squash (see PAQ). There is just a single weight w in [0, 1]. The predictions and the weight are updated to minimize coding cost. As in previous versions a genetic optimizer can tune all degrees of freedom to a training data set. Parameters include: contexts, state machine structure, counter and mixer settings.
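The single-weight mixing formula quoted by the author can be sketched in a few lines of Python. This is not M1x2's code: stretch and squash follow the usual PAQ definitions (ln(p/(1-p)) and its inverse), while the gradient step, the learning rate lr, and the clamping of w to [0, 1] are assumptions chosen to illustrate "updated to minimize coding cost".

```python
import math


def stretch(p):
    """St(p) = ln(p / (1 - p)), mapping a probability to the logistic domain."""
    return math.log(p / (1.0 - p))


def squash(x):
    """Sq(x) = 1 / (1 + e^-x), the inverse of stretch."""
    return 1.0 / (1.0 + math.exp(-x))


def mix(p1, p2, w):
    """Final prediction p = Sq[(St(p2) - St(p1)) * w + St(p1)]."""
    return squash((stretch(p2) - stretch(p1)) * w + stretch(p1))


def update_weight(p1, p2, w, y, lr=0.02):
    """One gradient step on coding cost -ln P(y); lr is a hypothetical rate."""
    p = mix(p1, p2, w)
    w += lr * (y - p) * (stretch(p2) - stretch(p1))
    return min(1.0, max(0.0, w))  # keep w in [0, 1] as stated in the text


if __name__ == "__main__":
    w = 0.5
    for bit in (1, 1, 0, 1):
        print(round(mix(0.3, 0.8, w), 4), bit)
        w = update_weight(0.3, 0.8, w, bit)
```

With w = 0 the output equals p1 and with w = 1 it equals p2, so the weight interpolates between the two models in the stretched domain.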
m1x2 v0.6 (discussion), Feb. 8, 2010, preprocesses the input by pre-compressing it with an order-1 12 bit length limited Huffman code prior to compression with the context mixing model of v0.5-1. This improves speed by reducing the size of the input and improves compression because the context hash tables are not filled as quickly. The 7 option says to use 8 x 2^7 MB memory. The decompresser size includes the 242 byte configuration file enwik7.txt. The length limited Huffman codes are generated using an algorithm described by A. Turpin and A. Moffat in Practical Length-Limited Coding for Large Alphabets, The Computer Journal, 38, (5), 339-347, 1995.
             Compression   Compressed size          Decompresser  Total size   Time (ns/byte)
Program      Options       enwik8      enwik9       size (zip)    enwik9+prog  Comp  Decomp  Mem   Alg  Notes
-------      -------       ----------  -----------  ------------  -----------  ----  ------  ----  ---  -----
M1 0.2a                    24,656,008  219,115,069  25,336 s      219,140,405   452     447    33  CM   26
M1 0.3                     24,004,989  215,101,056  24,596 s      215,125,652   395     404    33  CM   26
M1 0.3b      text2.txt     23,506,215  209,057,165  23,150 s      209,080,315   377     403    33  CM   26
M1 0.3b      text.txt      23,558,990                                           360     390    33  CM   26
M1 0.3b      e8-m103b1-mh  23,456,037  207,931,967  23,150 s      207,955,117   383     412    33  CM   26
M1x2 v0.5-1  6 enwik7.txt  20,812,625  172,771,031  47,608 x      172,818,639  1019    1091  1576  CM   26
M1x2 v0.6    7 enwik7.txt  20,723,056  172,212,773  38,467 s      172,251,240   711     715  1051  CM   26
cmm1 is a free, open source (GPL) file compressor by Christopher Mattern, Sept. 18, 2007. It uses context mixing with LZP preprocessing.
cmm2 was released Dec. 10, 2007 without source code.
cmm2 080113 was released Jan. 13, 2008 without source code.
cmm3 080207 (test release) was released Feb. 7, 2008 without source code.
cmm4 v0.0 (test release) was released Mar. 14, 2008 without source code.
cmm4 v0.1e was released Apr. 20, 2008 without source code. It takes a 2 digit option "wm" (e.g. 96 meaning w=9, m=6). Memory usage is 2^w MB for a sliding window, and 12*2^m MB for a context mixing model (order 1,2,3,4,6). On my machine m=7 caused disk thrashing.
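As a worked example of the "wm" option, the Python sketch below evaluates the memory formula, reading the window size as 2^w MB and the model size as 12*2^m MB (the exponents are assumed to be superscripts lost in the page text). The function name cmm4_memory_mb is hypothetical.

```python
def cmm4_memory_mb(option):
    """Nominal memory in MB for a two-digit cmm4 option string "wm"."""
    w, m = int(option[0]), int(option[1])
    window_mb = 2 ** w      # sliding window
    model_mb = 12 * 2 ** m  # context mixing model
    return window_mb + model_mb


if __name__ == "__main__":
    print(cmm4_memory_mb("96"))  # 512 + 768 = 1280
```

For option 96 this gives 1280 MB, close to the 1321 MB measured for cmm4 v0.1e in the results table.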
Description by the author: CMM4 0.1e Is a variable order context mixing coder, it predicts using the four "highest" (ranking: 643210) models in each bit coding step and, in addition, the match model input. Orders 0 and 1 are implemented using a table lookup, all higher orders use nibble based hashing. Matches are found using order 4 and 6 LZP, the pointers and a quick exclusion hash are stored within the model's hashing tables. The mixer joins the 4 (or 5 in presence of a match model) predictions and outputs them to a SSE stage. A mixer (similar to (L)PAQ) is selected based on the last byte's 4 MSBs and on the coding order. The SSE context is made of an order 0 context and quantized combination of the previous symbol rank, the match length and partially matched symbol. This results in a notable compression increase on redundant data. The model's counters are quantized using the PAQ's state machine since CMM4 (will be replaced). Despite the use of hashing most data structures are tuned to never cross a cache line per nibble (the models) or octet (the mixer) (only SSE does). The core compression performance is equivalent to LPAQ1/2, while being faster. In addition there's a filter framework, which currently implements an x86 transform and will be extended.
             Compression  Compressed size          Decompresser  Total size   Time (ns/byte)
Program      Opt          enwik8      enwik9       size (zip)    enwik9+prog  Comp   Decomp  Mem   Alg
-------      ---          ----------  -----------  ------------  -----------  -----  ------  ----  ---
cmm1                      23,495,627  207,266,867  18,785 x      207,285,652   1165    1198    50  CM
cmm2                      23,477,008  208,268,161  17,901 x      208,286,062   1756    1849    32  CM
cmm2 080113               22,303,128  191,477,052  18,263 x      191,495,315   2180    2127   329  CM
cmm3 080207               21,212,766  179,633,451  18,700 x      179,652,151   2328   ~2609   395  CM
cmm4 v0.0                 21,459,665  186,395,591  18,042 x      186,413,633   1807    1849   116  CM
cmm4 v0.1e   96           20,569,034  172,669,955  31,314 x      172,701,269   2052    2056  1321  CM
cmm4 v0.2b   87           20,550,129  171,969,035                                            1803  CM  42
lstm-compress is a free, experimental open source file compressor by Byron Knoll, June 15, 2017. It takes no options. It uses the LSTM neural network model and dictionary preprocessor from CMIX but omits the other models.
A new version v2 of lstm-compress was released Dec. 12, 2017.
v3 was released Mar. 30, 2019.
                  Compression  Compressed size          Decompresser  Total size   Time (ns/byte)
Program           Options      enwik8      enwik9       size (zip)    enwik9+prog  Comp    Decomp  Mem  Alg   Note
-------           -------      ----------  -----------  ------------  -----------  ------  ------  ---  ----  ----
lstm-compress v1               20,488,816  175,708,405  154,379 s     175,862,784  433968  433783   10  LSTM  66
lstm-compress v2               20,494,577  174,868,709  157,238 s     175,025,947  114764  114908    9  LSTM  83
lstm-compress v3               20,318,653  173,874,407  144,567 s     174,018,974   92342   91876    9  LSTM  83
ccm 1.1.1a (Feb. 23, 2007) has only one version.
ccm 1.1.2a (Mar. 2, 2007) includes a ccm_low version using less memory, which was not tested.
ccm 1.20a (Mar. 21, 2007) has only one version.
ccm 1.20d (Apr. 8, 2007) has two versions: ccm using 99MB memory and ccmx using 210 MB for better compression. Only ccmx was tested.
ccm 1.21 (mirror) (Apr. 22, 2007) includes an option to select memory usage. 7 selects maximum memory, 1300 MB. Only the high compression version (ccmx) was tested.
ccm 1.30 (mirror) was released Jan. 7, 2008. Only ccmx 7 (high compression version, maximum memory) was tested.
                  Compressed size          Decompresser  Total size   Time (ns/byte)
Program           enwik8      enwik9       size (zip)    enwik9+prog  Comp  Decomp  Mem   Alg
-------           ----------  -----------  ------------  -----------  ----  ------  ----  ---
ccm 1.0.3a        27,667,346  240,296,736  7,217 x       240,303,953   676     679    17  CM
ccm_high 1.0.3a   25,412,726  221,177,776  7,229 x       221,185,005  1119    1171    17  CM
ccm_extra 1.0.3a  24,027,805  207,273,926  7,230 x       207,281,156  1341    1353   100  CM
ccm 1.1.1a        22,824,629  197,271,467  9,019 x       197,280,486  1247    1252    82  CM
ccm 1.1.2a        22,675,768  195,965,427  8,502 x       195,973,929  1161    1183    83  CM
ccm 1.20a         21,350,295  182,784,655  13,346 x      182,798,001  1794    1801   210  CM
ccmx 1.20d        21,310,303  182,379,461  13,468 x      182,392,929  1383    1485   210  CM
ccmx 7 1.21       20,819,656  174,161,536  21,139 x      174,182,675  1521    1493  1324  CM
ccmx 7 1.30       20,857,925  174,142,092  15,014 x      174,157,106  1313    1338  1332  CM
bit 0.1 is a free, closed source file compressor by Osman Turan, Dec. 19, 2007. It uses ROLZ optimized for binary files. It takes no options.
bit 0.2b is an archiver, released June 14, 2008. Option -m lwcm selects the compression type (lightweight context mixing). This is the only type supported. Option -mem 9 selects maximum memory. This option ranges from 0 to 9 and uses 3 + 2^opt MB memory. The program uses order 1, 2, 3, 4, and 6 context mixing with 2 SSE stages as discussed here. Comments by author:
LWCX (Light-Weight Context Mixing) is a codec of BIT Archiver. It's designed for getting high compression ratio with acceptable speed (Not enough fast currently). LWCX is a bit-wise context mixing schema which tries to mix order-n models (order 012346). The statistics are gathered by the counters which predict next bit by semi-stationary update rule. After gathering the predictions from all models, a neural network (similar to PAQ's neural network) tries to output a new mixed prediction. The mixed prediction is processed by a 2D SSE stage which have 32 vertices. Finally, a carryless arithmetic coder codes the given bit with final prediction.
Most of data structures are designed for avoiding cache misses. Order-0 and order-1 models' statistics stored in a direct lookup table. Higher orders (order 2346) models' statistics stored in a large hash table. Hash table size can be selected by "-mem N" option (memory usage is 3+2^(N+1) MB, N ranges 0 to 9). The codec locates a hash entry per only coding nibble.
bit 0.7 has options -p=1 through -p=5 to select memory usage of 10 + 20*2^p MB.
Compressor  Opt              enwik8      enwik9       Prog      Total        Comp  Decomp  Mem   Alg   Note
----------  ---              ----------  -----------  --------  -----------  ----  ------  ----  ----  ----
bit 0.1                      31,186,930  271,705,328  35,400 x  271,740,728   535      83    35  ROLZ
bit 0.2b    -m lwcm -mem 9   21,971,587  189,881,180  63,665 x  189,944,845  2708    2747  1052  CM
bit 0.7     -p=5             20,823,204  174,425,039  62,493 x  174,487,532  2050    2100   663  CM    26
mcomp x32 v2.00 is a free, closed source, command line file compressor by Malcolm Taylor (author of WinRK), released Aug. 23, 2008. It uses a large number of algorithms, although not the same ones as WinRK. There is a 32 bit version (mcomp_x32.exe) and a 64 bit version (mcomp_x64.exe) for Windows. Only the 32 bit version was tested (in 32-bit Vista). It displays the following help message:
LibMComp Demo Compressor (v2.00). Copyright (c) 2008 M Software Ltd.
mcomp [options] pofile(s)
Options:
  -m[..]     Compression method:
               b   - BZIP2.
               c   - Experimental DMC codec.
               d   - Optimised deflate (df - fast, dx - max)
               d64 - Optimised deflate64 (d64f - fast, d64x - max)
               lz  - Optimised LZ (lzf - fast, lzx - max)
               f   - Optimised ROLZ (ff - fast, fx - max)
               f3  - Optimised ROLZ3 (f3f - fast, f3x - max)
               p   - PPMd var.J.
               sl  - Bitstream (LSB first).
               sm  - Bitstream (MSB first).
               w   - Experimental BWT codec.
  -MNN[k,m]  Model size (in kb (default) or Mb, default 64M).
  -oNN       Order (for Bitstream and PPMd).
  -np        Display no progress information.
pofile(s) means input file and output file. When run with no compression options, the program decompresses. Test results are as follows on a dual core 2 GHz Pentium T3200 with 3 GB as in note 26.
Compressor  Opt                 enwik8      enwik9       Prog       Total         Comp  Decomp   Mem  Alg   Note
----------  ---                 ----------  -----------  ---------  -----------  -----  ------  ----  ----  ----
mcomp_x32   -mb                 29,997,076                                        2070     970     4  BWT   -M has no effect
            -mc                 23,546,185                                        1350    1410    50  DMC
            -mc -M512m          22,561,089                                        1520           322  DMC   max memory
            -mdf                                                                                            fails
            -md                 35,436,114                                        2140    1421     4  LZ77  fails
            -mdx                35,383,881                                        2240    1420     4  LZ77  fails
            -md64f                                                                                          fails
            -md64x              32,983,178                                       28930    1310     4  LZ77  fails
            -mlz                24,648,445                                        3090      50   595  LZ77
            -mf                 24,331,132                                        2240      78   149  ROLZ
            -mf -M1800m         23,187,091                                        3320      77   414  ROLZ
            -mfx -M1800m        23,182,541                                        3410      81   414  ROLZ
            -mf3x -M1800m       23,098,116                                        3850     112   415  ROLZ
            -mp -M1800m -o10    21,039,213  177,948,781  172,531 x  178,121,312   4580   12180  1847  PPM
            -mp -M1800m -o12    20,917,657  179,193,238  172,531 x  179,365,769   5180          1847  PPM
            -mp -M1800m -o16    20,868,127  181,150,814  172,531 x  181,323,345   5750          1847  PPM
            -msl -M1800m -o12   54,428,147                                        6510    6480     1  CM?   -M has no effect
            -msm                59,731,673                                        5880    5810     1  CM?   -M has no effect
            -mw                 21,805,857  188,095,082  172,531 x  188,267,613    356     232   660  BWT   2 cores
            -mw -M180m          21,103,670  179,838,392  172,531 x  180,010,923    329     284  1850  BWT   2 cores
            -mw -M320m          21,103,670  174,388,351  172,531 x  174,560,882    473     399  1643  BWT   1 core
-mb produces bzip2 compatible format. -M has no effect. Memory usage is fixed at 4 MB.
-mc uses DMC. If memory is greater than -M512, then the program aborts with a failed assertion.
-md and -md64 are supposed to generate deflate and deflate64 formats (zip or gzip). However -mdf and -md64f (fast modes) crash immediately during compression. The other modes decompress to files that are the correct size but not identical to the original. Run times are very slow due to most of the CPU time spent in the kernel (up to 90%) as reported by timer 3.01.
-mp uses PPMd var. J, but allows more memory (up to about 1800 MB). The original program was limited to 256 MB. The optimal orders are different for enwik8 and enwik9. Higher orders help compression, but lower orders save memory on larger files. The maximum order is -o16. Higher values have no effect. Decompression is slow because 55% of the CPU time is spent in the kernel. Normally this is around 1% and decompression speed would be the same as compression.
-msl and -msm ignore the -M option and use 1 MB memory, resulting in poor compression.
-mw (experimental BWT) is the only option that uses both cores. All others result
in 50% CPU usage on a 2 core processor. The -M option actually
selects the block size, not total memory usage. Memory usage is 5x block size if one core is used,
or 10x if both are used. Both are used only if enough memory is available. The default is to
split the file in half and compress the two halves in parallel. However, better but slower compression
can be obtained by using -M to select one block for the whole file. Maximum memory is 2 GB, even
if more is available. For enwik9, -M320 selects 3 blocks, which are compressed in series on one core.
For two cores, time reported is wall time.
Process time for -mw -M320m is 187% of wall time for compression and 139% for decompression.
epmopt + epm r9 is an experimental,
closed source
command line optimizer and file compressor by Serge Osnach, Oct. 16, 2003. It was
intended for enc r16, but development on that project has stopped at enc r15, according
to the web page (in Russian). The program has two parts: epm, a
PPM compressor with text preprocessing, and epmopt, which attempts to optimize
the parameters to epm by compressing repeatedly and varying the options one at a
time until there is no more improvement. The input to epmopt may be different
than epm, and supports optimization on sets of files matching patterns in
specified sets of directories. The options to epm are memory limit, PPM order,
and 20 undocumented options each specified by a single digit. The exact same options
must be passed to the decompresser. In the results, I added 27 bytes to the
compressed file sizes to account for this information. enwik9 was compressed
and decompressed as follows:
Warning: epm failed to decompress correctly on enwik7 (first
10^7 bytes). In the output, some linefeeds were changed
to spaces. This happened with all parameter combinations I
tested including defaults: epm c enwik7 enwik7.epm.
Decompression was bit-exact for enwik5, enwik6, enwik8 and enwik9.
WinUDA 0.291 is a
free, closed source GUI
archiver by dwing, July 4, 2005. It uses context mixing and is
derived from paq6. Mode 3 is the slowest (about 3x slower than
mode 0) and uses the most memory, 194 MB.
dark v0.51 is a free, closed source
archiver by Malyshev Dmitry Alexandrovich, Jan. 2, 2007. It uses BWT + distance coding without preprocessors.
The -b333m option selects 333 MB
blocks. -f (-f0 in 0.40 and 0.46, not supported in 0.32) forces no segmentation.
Memory usage is 5 times the block size for compression
(6x prior to v0.46).
opendark ver. A
is an open source version of dark. The supplied Windows dark.exe
crashed when decompressing enwik9 (size is 177,675,818).
Decompression works up to -b127m. opendark does not support the -f option.
FreeArc 0.36 is a free, open source archiver
by Bulat Ziganshin, Feb. 21, 2007. It incorporates 7 compression libraries - PPMd,
GRZipII, LZMA (7zip), plus BCJ (7zip), REP (rzip-like), dynamic dictionary and LZP
preprocessors. The option -m9 selects maximum compression (dict + LZP + PPMd for text
files, REP+LZMA for binary). -lc1600000000 limits
memory to 1.6 GB (same as -lc1600m). There is an option to use ppmonstr as an external
compressor, which was not included in the test.
FreeArc 0.40 pre-4: ppmd generally gives the best compression for text. It will also call ppmonstr
as an external program, but this mode was not tested, even though it compresses better.
For this test, the Windows command line version was tested. The option
-mppmd:1012m:o13:r1 is equivalent to ppmd -m1012 -o13 -r1, selecting 1012 MB memory,
order 13, and partial reinitialization of the model when memory is exhausted.
Note that ppmd normally allows only up to -m256. This program was tested with 2 GB
memory but values higher than -m1012 caused the program to crash during compression.
FreeArc
0.666 was released May 19, 2010. The 32 bit Windows console version was
tested. -m9 selects maximum compression. There are many other compression options
but these were not tested.
freearc 0.67a was released Mar. 15, 2014. Options -m1 to -m9 select the compression
level from fastest to best. -m1x to -m9x select levels with fast decompression.
Decompression was tested with the separate unarc.exe program.
.1749 epmopt | epm
epmopt -m800 -n20 --fixedorder:12 enwik6 .
epm c01286014321245957352513 enwik9 enwik9.epm -m800
epm d01286014321245957352513 enwik9.epm enwik9.tmp -m800
The optimization data was enwik6, the first 10^6 bytes
of the input file. epmopt compressed this about 100 times in
368 seconds with different options, making 35 passes through
the list of 20 undocumented parameters, adjusting each one up
or down one at a time. The fixed parameters
were -m800 (800 MB memory limit) and PPM order 12 (--fixedorder:12,
also the first 3 digits of the parameter string. Allowing epmopt
to set the PPM order on a smaller training file will cause it to
choose too large a value, hurting compression. I only tested
orders 10, 12, and 20 on enwik8 and 12 gave the best compression).
The -n20 option tells epm to tune all 20 parameters. The parameter
string is written to the file enc.ini. The -m800 option need
not be the same for epmopt and epm but must be the same
for epm during compression and decompression.
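The search that epmopt is described as performing (adjust each single-digit parameter up or down one at a time, keep a change if the compressed size drops, stop when a full pass brings no improvement) is a form of coordinate descent. The Python sketch below illustrates the idea only. It is not epmopt's code: the function name optimize_digits is hypothetical, and the cost callback stands in for running epm on the training file and measuring the output size.

```python
def optimize_digits(cost, params, max_passes=35):
    """Coordinate descent over single-digit parameters (0..9).

    cost(params) returns the quantity to minimize, e.g. a compressed size.
    """
    params = list(params)
    best = cost(params)
    for _ in range(max_passes):
        improved = False
        for i in range(len(params)):
            for delta in (-1, 1):
                trial = list(params)
                trial[i] += delta
                if not 0 <= trial[i] <= 9:
                    continue  # each option is a single digit
                c = cost(trial)
                if c < best:
                    best, params, improved = c, trial, True
        if not improved:
            break  # a full pass with no gain: stop
    return params, best


if __name__ == "__main__":
    # Toy cost with a known optimum, standing in for a compressor run.
    target = [3, 7, 1]
    toy_cost = lambda p: sum((a - b) ** 2 for a, b in zip(p, target))
    print(optimize_digits(toy_cost, [5, 5, 5]))
```

Because each step changes one parameter, the search can stall in a local minimum; a toy cost like the one above has none, but a real compressor's parameter space may.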
.1749 WinUDA
.1755 dark
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
dark 0.32b July 9, 2006 -b128m 21,414,479 185,844,554 31,076 x 185,875,590 481 407 790 BWT
dark 0.40b Aug. 14, 2006 -b128mf0 21,243,259 184,271,115 34,688 x 184,305,803 471 316 790 BWT
dark 0.46 Aug. 23, 2006 -b160mf0 21,231,325 181,904,374 40,780 x 181,945,154 488 404 813 BWT
-b333mf0 21,231,325 175,955,412 40,780 x 175,996,192 432 425 1692 BWT
opendark A Nov. 14, 2006 -b333m 21,432,727 (fails) 10,089 s 450 390 1692 BWT
-b127m 21,432,727 185,985,101 10,089 s 185,995,190 389 331 652 BWT 26
dark 0.51 Jan. 2, 2007 -b333mf 21,169,819 175,471,417 34,797 x 175,506,214 533 453 1692 BWT
.1760 FreeArc
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
FreeArc 0.36 -m9 -lc1600000000 21,153,231 184,498,111 372,457 s 184,870,568 665 517 1600 PPM
FreeArc 0.40 pre-4 -mppmd:1012m:o13:r1 20,931,605 175,254,732 748,202 x 176,002,934 1175 1216 1046 PPM
FreeArc 0.666 -m9 21,659,587 189,696,374 1,214,530 x 190,910,004 524 416 785 PPM 26
FreeArc 0.67a -m1 39,485,049 25 27 191 26
-m2 26,831,928 59 121 117 26
-m3 25,221,359 147 100 157 26
-m4 24,285,483 174 132 155 26
-m5 23,020,671 410 443 311 26
-m6 21,659,587 570 471 463 26
-m7 21,659,587 592 477 463 26
-m8 21,659,587 604 495 448 26
-m9 21,659,587 189,696,374 148,665 xd 189,845,039 519 420 813 26
-m1x 39,485,049 27 25 194 26
-m2x 34,307,417 73 28 170 26
-m3x 27,336,122 269 32 186 26
-m4x 25,652,947 357 45 189 26
-m5x 24,897,495 564 43 204 26
-m6x 23,870,179 522 41 453 26
-m7x 23,788,636 546 41 599 26
-m8x 23,788,633 565 41 584 26
-m9x 23,788,633 567 41 584 26
.1766 hook
hook v0.2 is a free,
open source (GPL) command line file
compressor by Nania Francesco Antonio, Jan. 8, 2007. It uses DMC: a state machine
in which each state represents a bitwise context. Each state has 2 outgoing
transitions corresponding to next bits 0 and 1, and a count n0 or n1 associated
with each transition. Bit y (0 or 1) is compressed by arithmetic coding with probability
ny/(n0+n1) (where ny is n0 or n1 according to y), and then ny is incremented.
After each input bit, the next state represents a context obtained by appending that bit on the right and possibly dropping bits on the left. States are cloned (copied) whenever the incoming and outgoing counts exceed certain limits. This has the effect of creating a new context in which no bits are dropped. In the example below, the state representing context 110 (dropping 2 bits from the previous context) is cloned by creating a new state 11110 because the incoming 0 transition count (ny for y=0) from state 1111 exceeded a limit. The new context is longer because it does not drop any bits. This transition is moved to point to the new state. Other incoming transitions (not shown) remain pointing to the original state. The outgoing transitions are copied. The counts of the original state are distributed to the new state in proportion to the moved transition's contribution to those counts, which is w = ny/(n0+n1).
             n0                               n0*(1-w)
            ----> 1100                       ----> 1100 <------+
       ny  /                                /                  |
 1111 -----> 110                1111      110                  | n0*w
     (y=0) \                     |          \                  |
            ----> 1101           |           ----> 1101 <--+   |
             n1                  |            n1*(1-w)     |   |
                                 | ny                 n1*w |   |
                                 +----> 11110 -------------+---+

 Before cloning                 After cloning 110 to 11110
Normally, the initial set of contexts begin on byte boundaries. The cloning mechanism ensures that new contexts also have this property.
In hook v0.2, the counts are 32 bit floating point numbers initialized to 0.1. The initial state machine has 256*255 states representing bytewise order 1 contexts with uniform statistics. When memory is exhausted, the model is discarded and the state machine is reinitialized. A new state is cloned when ny > limit and n0+n1-ny > length, where limit and length are parameters. The optimal parameters for enwik8 and enwik9 are "c 7 2 6": c means compress, 7 selects the maximum of 1 GB memory (64M states at 16 bytes each, minimum is 8 MB memory), 2 is the limit (range 1 to 7), and 6 selects a length of 32 (possible values are 1, 2, 3, 4, 8, 16, 32, 64). Larger lengths are better for large files because they conserve memory at the expense of compression.
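The prediction, count update, and cloning rule described above can be sketched compactly in Python. This is an illustration under stated assumptions, not hook's source: the class name DMC and the helper code_length_bits are hypothetical, the machine starts from a single state that loops to itself (hook starts from 256*255 states), the cloning test is applied before the count is incremented, and an ideal code length in bits replaces a real arithmetic coder.

```python
import math


class DMC:
    def __init__(self, limit=2.0, length=2.0, init=0.1):
        self.limit, self.length = limit, length
        self.nxt = [[0, 0]]        # nxt[s][y]: next state after bit y
        self.cnt = [[init, init]]  # cnt[s][y]: count n_y of that transition
        self.s = 0                 # current state

    def p1(self):
        """Probability that the next bit is 1: n1 / (n0 + n1)."""
        n0, n1 = self.cnt[self.s]
        return n1 / (n0 + n1)

    def update(self, y):
        s = self.s
        ny = self.cnt[s][y]
        t = self.nxt[s][y]
        total_t = self.cnt[t][0] + self.cnt[t][1]
        # Clone the target when ny > limit and n0+n1-ny > length.
        if ny > self.limit and total_t - ny > self.length:
            w = ny / total_t  # share of t's counts owed to this transition
            self.nxt.append(list(self.nxt[t]))  # copy outgoing transitions
            self.cnt.append([self.cnt[t][0] * w, self.cnt[t][1] * w])
            self.cnt[t][0] *= 1.0 - w
            self.cnt[t][1] *= 1.0 - w
            t = len(self.nxt) - 1
            self.nxt[s][y] = t  # move this transition to the clone
        self.cnt[s][y] += 1.0
        self.s = t


def code_length_bits(bits, model):
    """Ideal coded size: sum of -log2 P(y) over the input bits."""
    total = 0.0
    for y in bits:
        p = model.p1()
        total -= math.log2(p if y else 1.0 - p)
        model.update(y)
    return total


if __name__ == "__main__":
    data = [0, 1] * 500
    model = DMC()
    print(code_length_bits(data, model), len(model.nxt))
```

On an alternating bit pattern the single initial state cannot predict anything, but after a few clones the machine holds separate states for "after 0" and "after 1" and the coded size falls below one bit per input bit.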
hook v0.3 (Jan. 11, 2007) allows up to 1.8 GB memory (first option = 9) and uses double precision predictions in the 32 bit arithmetic coder.
hook v0.3a (Jan. 12, 2007) initializes the counts to 0.125 (instead of 0.1) and uses 24 bit precision in the arithmetic coder (instead of 32 bit).
hook v0.4 (Jan. 15, 2007) initializes counts to 0.1. Argument 2 selects length 3 (not 2).
hook v0.5b (Jan. 22, 2007) adds an LZP preprocessor. If the next byte to be coded is the same as the byte that occurred in the last matching 3 byte context, then this is indicated by coding a flag bit in an order 3 model (32 MB memory), and a match length coded by DMC with a fixed size of 128 MB. If there is no match, then the literal byte is coded by another variable sized DMC model. The parameters "c 1600000000 2 64 1 6" select compression (c), 1.6 GB for the DMC literal model (1600000000), a limit of 2 (minimum count for the cloned state), length of 64 (minimum remaining count for the state to be cloned), LZP selected (1), and a minimum match length of 6.
hook v0.6 (Feb. 7, 2007) removes the "length" parameter (effectively infinite). The arguments "c 1600 4 1 6" mean to compress (c), use 1600 MB memory, set the "limit" parameter to 4, turn on LZP preprocessing (1) with a minimum match length of 6. The "limit" parameter is the minimum count for an outbound DMC state transition to clone the state. Limit was tuned on enwik8.
hook v0.6b (Feb. 8, 2007) includes support for files up to 2^64 bytes (compiled by Ilia Muraviev; earlier versions were compiled with MinGW g++ 3.4.5 by Matt Mahoney). "limit" was tuned on both enwik8 and enwik9. Higher values conserve memory at the expense of compression on smaller files.
hook v0.6c (Feb. 14, 2007) stores the input filename in the compressed file and uses it during decompression.
hook v0.7 (Mar. 10, 2007) uses 325 MB more memory than advertised so it was tested with a lower option.
hook v0.7b (Mar. 12, 2007) reduces the excess memory to 94 MB.
hook v0.8 was released Mar. 17, 2007. Some additional results on enwik9 decreasing the rate at which the state machine fills up and is flushed:
hook08 params  enwik9
------------   -----------
c 1700 1 1 6   183,175,857
c 1700 2 1 6   181,578,888
c 1700 3 1 6   181,220,553
c 1700 4 1 6   181,268,867
c 1700 5 1 6   181,197,310
c 1700 6 1 6   181,567,697
c 1700 7 1 6   181,813,763
c 1700 8 1 6   182,360,391
hook v0.8b (Mar. 18, 2007) has some LZP improvements.
hook v0.8c (Mar. 19, 2007) is a minor bug fix. Compressed sizes are 1 byte larger than v0.8b.
hook v0.8d was released Mar. 21, 2007.
hook v0.8e was released Mar. 27, 2007.
hook v0.9 (Apr. 6, 2007) is closed source. It requires a processor that supports SSE instructions. It has some speed improvements and an E8/E9 filter for improved compression of .exe files. Memory usage is the second argument + 60 MB.
freehook 0.2 is an open source port of hook v0.8e from C++ to C by Eugene Ortmann, Apr. 7, 2007. The supplied .exe file requires SSE instructions (Pentium 3 or higher), but the source can be recompiled for other processors.
hook v0.9b (Apr 10, 2007) replaces floating point arithmetic with integer arithmetic, so that archives are compatible across different processors. Note: I reduced the memory setting from 1800 to 1700 to prevent disk thrashing, which was a problem in earlier tests. I will do this from now on. This hurts enwik9 compression (but not enwik8) slightly, from 180,444,546 to 180,582,601. Actual memory usage is 60 MB over.
freehook 0.3 (Apr 10, 2007) has only very minor changes from 0.2 but is slightly faster due to different g++ compiler options. Compression is the same as 0.2. Memory usage is about 160 MB over.
hook v0.9c (May 8, 2007) has some speed improvements in the arithmetic coder. It compresses the same size as v0.9b.
hook v1.0 (Sept. 20, 2007) is closed source. The only option is memory size in MB.
The zip file linked above contains all versions (C++ source and Win32 .exe).
hook 1.1 (Nov. 13, 2007) improves BMP and WAV compression.
hook 1.3 was released Dec. 14, 2007, modified Dec. 15, 2007.
hook 1.4 was released Apr. 29, 2009.
              Compression            Compressed size          Decompresser  Total size   Time (ns/byte)
Program       Options                enwik8      enwik9       size (zip)    enwik9+prog  Comp Decomp Mem Alg
-------       -------                ----------  -----------  -----------   -----------  ----- ----- --- ---
hook v0.2     c 7 2 6                23,628,061  208,211,084   2,556 s      208,213,640  772 779 1052 DMC
hook v0.3     c 9 2 6                23,548,017  202,024,740   3,567 s      202,028,307  849 864 1764 DMC
hook v0.3a    c 9 2 6                23,499,700  201,934,976   3,555 s      201,938,531  862 832 1764 DMC
hook v0.4     c 9 2 6                23,349,695  199,829,234   4,112 s      199,833,346  934 959 1764 DMC
hook v0.5b    c 1600000000 2 64 1 6  22,806,402  193,227,085   5,113 s      193,232,198  1084 1029 1764 LZP+DMC
hook v0.6     c 1600 4 1 6           22,472,884  191,733,561   5,112 s      191,738,673  1146 1034 1600 LZP+DMC
hook v0.6b    c 1600 4 1 6           22,535,069  189,932,778   5,174 s      189,937,952  1040 1600 LZP+DMC
              c 1600 6 1 6           22,776,927  188,384,238   5,174 s      188,389,412  1090 1026 1600
hook v0.6c    c 1600 6 1 6           22,561,621  188,081,694   5,878 s      188,087,572  1131 1092 1600 LZP+DMC
hook v0.7     c 1000 6 1 6           22,410,669  191,516,313   6,195 s      191,522,508  1360 1353 1375 LZP+DMC
hook v0.7b    c 1700 6 1 6           22,404,817  184,765,030   6,195 s      184,771,225  1516 1655 1794 LZP+DMC
hook v0.8     c 1700 5 1 6           22,290,033  181,197,310   6,686 s      181,203,996  1110 1118 1700 LZP+DMC
hook v0.8b    c 1700 5 1 6           22,399,354  180,335,788   6,944 s      180,342,732  988 1033 1700 LZP+DMC
hook v0.8c    c 1700 5 1 6           22,399,355  180,335,789   7,071 s      180,342,860  1043 1005 1700 LZP+DMC
hook v0.8d    c 1700 5 1 6           22,399,027  180,319,203   7,037 s      180,326,240  928 915 1700 LZP+DMC
hook v0.8e    c 1700 3 1 6           22,039,935  178,140,788   7,263 s      178,148,051  952 1009 1700 LZP+DMC
hook v0.9     c 1800 2 1 6           21,969,342  178,932,435  10,069 x      178,942,435  869 1860 LZP+DMC
              c 1800 3 1 6           22,077,883  178,599,478  10,069 x      178,609,547  833 916 1860 LZP+DMC
freehook 0.2  c 1700 3 1 6           22,039,914  178,141,036   7,386 s      178,148,422  813 855 1860 LZP+DMC
hook v0.9b    c 1700 3 1 6           22,496,910  180,582,601   9,278 x      180,591,879  810 810 1721 LZP+DMC
freehook 0.3  c 1600 3 1 6           22,039,914  178,619,149   7,352 s      178,626,501  789 818 1713 LZP+DMC
hook v0.9c    c 1700 3 1 6           22,496,910  180,582,601   8,506 x      180,591,107  774 791 1721 LZP+DMC
hook v1.0     c 1700                 22,122,484  177,843,658  11,163 x      177,854,821  865 879 1739 LZP+DMC
hook v1.1     c 1700                 22,122,484  177,843,658  25,854 x      177,869,512  877 872 1739 LZP+DMC
hook v1.3     c 1700                 22,030,108  178,216,980  13,870 x      178,230,850  825 835 1736 LZP+DMC
hook v1.4     c 1700                 21,990,502  176,648,663  37,004 x      176,685,667  741 695 1777 LZP+DMC
7zip 4.42 is an open source GUI and command line archiver by Igor Pavlov, May 14, 2006. It compresses to 7z, zip, gzip, ppmd.H and tar format, optionally encrypts with AES, and will uncompress several other formats.
7z is the default format. It uses LZMA compression, a variation of LZ77. The option -mx=9 selects ultra (maximum) compression in this mode. The option -sfx7zCon.sfx creates a console-based self extracting executable by prepending a 131,584 byte decompresser. This is slightly smaller than the Windows GUI version (132,096 bytes) and much smaller than the decompression program itself as a zipped self extracting download (817,795 bytes). The best compression is with ppmd. The options are -m0=ppmd:mem=768m:o=10 equivalent to ppmd var H (with minor changes) order 10 with 768 MB memory. 7zip 4.46a was announced May 21, 2007. (The improved compression is due to testing with more memory).
7zip 9.04a was released Dec. 3, 2009. It gave an out of memory error with mem=1630.
7zip 9.20 was released Nov. 18, 2010. Default (LZMA) mode was tested. It uses 196 MB for compression using 75% of 2 cores, and 18 MB for decompression on a 2.0 GHz T3200 under Windows.
The following include the best known option combinations for 7zip on enwik8 in ppmd (PPM), 7z (LZMA), bzip2 (BWT) and zip (LZ77) formats.
            Compression                            Compressed size          Decompresser  Total size   Time (ns/byte)
Program     Options                                enwik8      enwik9       size (zip)    enwik9+prog  Comp Decomp Alg Notes
-------     -------                                ----------  -----------  -----------   -----------  ----- ----- --- -----
7zip 4.42   -m0=ppmd:mem=768:o=10 -sfx7zCon.sfx    21,375,060  185,043,783        0 xd    185,043,783  505 ~500 PPM
7zip 4.42   -m0=ppmd:mem=293m:o=7                  21,791,628                                          647 655 PPM 6
7zip 4.42   -mx=9 -sfx7zCon.sfx                    24,996,113  213,490,979        0 xd    213,490,979  2286 63 LZMA
7zip 4.42   -tbzip2 -mpass=2                       29,003,844                                          1974 176 BWT 6
7zip 4.42   -tzip -mm=deflate64 -mfb=153 -mpass=8  33,727,442                                          2803 28 LZ77 6
7zip 4.42   -tzip -mm=deflate -mfb=171 -mpass=8    35,056,389                                          2672 27 LZ77 6
7zip 4.42   -tzip -mm=deflate -mfb=258 -mpass=8    35,057,040                                          2664 29 LZ77 6
7zip 4.42   Zip/Ultra (in GUI)                     35,057,347                                          4307 LZ77 1
7zip 4.46a  -m0=ppmd:mem=1630m:o=10 -sfx7zCon.sfx  21,197,559  178,965,454        0 xd    178,965,454  503 546 PPM
7zip 9.04a  -m0=ppmd:mem=1500m:o=10 -sfx7zCon.sfx  21,211,895  179,209,403        0 xd    179,209,403  506 520 PPM 26
7zip 9.12b  -m0=ppmd:mem=2048m:o=10                21,060,863  177,187,967                             PPM 42
7zip 9.20                                          25,895,909  227,905,645  518,536 x     228,424,181  1031 42 LZMA 26
rings 0.1 is a free, closed source, experimental file compressor by Nania Francesco Antonio, Sept. 21, 2007. It uses LZP with order-2 coding of literals and arithmetic coding. It takes no command line options.
rings 0.2 (Nov. 16, 2007) includes improved BMP, WAV, TIFF, and PGM filters.
rings 0.3 was released Dec. 21, 2007.
rings 1.0 was released Feb. 8, 2008. It uses 50 MB for compression and 43 MB for decompression.
rings 1.1 was released Feb. 13, 2008 with the same memory usage. It uses CM with LZP preprocessing for faster compression.
rings 1.2 was released Mar. 4, 2008 with the same memory usage.
rings 1.3 was released Apr. 2, 2008. It uses 54 MB for compression and 47 MB for decompression.
rings 1.4c was released Apr. 14, 2008. It has an option (1-9) which selects memory usage. Each increment doubles usage. Memory usage and run time are greater for decompression than compression. For option 9, compression uses 526 MB and decompression uses 789 MB. The program uses BWT. The transformed data is encoded using MTF (move to front), pre-Huffman coding followed by arithmetic coding.
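The MTF (move to front) stage mentioned above can be sketched as follows. This is a generic textbook version over a 256-symbol alphabet, not the code used in rings; the function names are chosen for this example only.

```python
def mtf_encode(data):
    """Replace each byte by its current rank, then move it to the front."""
    alphabet = list(range(256))
    ranks = []
    for b in data:
        i = alphabet.index(b)
        ranks.append(i)
        alphabet.pop(i)
        alphabet.insert(0, b)
    return ranks


def mtf_decode(ranks):
    """Invert mtf_encode by replaying the same list updates."""
    alphabet = list(range(256))
    out = bytearray()
    for i in ranks:
        b = alphabet.pop(i)
        out.append(b)
        alphabet.insert(0, b)
    return bytes(out)
```

After a BWT, runs of identical bytes become runs of zero ranks, which the later Huffman and arithmetic coding stages can code cheaply.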
rings 1.5 was released Apr. 21, 2008. It improves compression and is symmetric with regard to memory usage. Options are like 1.4c. The table below compares timing results on my old and new computers.
rings 1.6 was released Aug. 16, 2009. The option ranges from 1 to 10, where 10 uses the most memory. It includes a Linux version (18,348 bytes zipped) which was not tested.
rings 2.0 (discussion) is a multi-threaded archiver rather than a file compressor. It uses BWT. It has an interface similar to zcm. Option -m7 selects maximum block size of 100 MB using 500 MB memory per thread. Option -t1 or -t2 selects 1 or 2 threads. On a 2 core machine, selecting 2 threads shows 3 processes in Windows Task Manager, two of which use 500 MB memory and I/O dividing the input and output files, and one process using 7 MB with several GB of input and a lot of kernel CPU time. These 3 processes must share 2 cores. As a result, it runs slower than 1 thread.
rings 2.1 (discussion) was released May 23, 2015.
rings 2.2 was released May 28, 2015. -o option enables multi-threaded compression.
rings 2.5 was released June 6, 2015. Option -o was removed. The 64 bit version was tested.
            Compression  Compressed size          Decompresser  Total size   Time (ns/byte)
Program     Options      enwik8      enwik9       size (zip)    enwik9+prog   Comp Decomp  Mem Alg Note
-------     -------      ----------  -----------  -----------   -----------  ----- ------ ---- --- ----
rings 0.1                35,693,969  314,161,660   11,271 x     314,172,931    187    179   16 LZP
rings 0.2                35,693,969  314,161,660   25,832 x     314,187,492    192    167   16 LZP
rings 0.3                35,151,555  309,179,126   32,132 x     309,211,258    188    154   16 LZP
rings 1.0                26,384,013  235,897,616   25,585 x     235,923,201    221    321   50 CM
rings 1.1                26,793,247  238,353,988   27,513 x     238,381,501    151    255   50 CM
rings 1.2                25,873,235  229,695,548   30,484 x     229,726,032    120    175   50 CM
rings 1.3                25,873,235  229,695,548   43,329 x     229,738,877    104    163   54 CM
rings 1.4c  9            24,591,826  217,427,384   39,149 x     217,466,533    103    287  789 BWT
rings 1.5   9            21,848,093  191,067,972   44,565 x     191,112,537    172    189  426 BWT
rings 1.5   9            21,848,093  191,067,972   44,565 x     191,112,537    144    188  425 BWT 26
rings 1.6   10           21,918,217  189,242,552   47,618 x     189,290,170    165    192  795 BWT 26
rings 2.0   -m7 -t2      21,195,013  185,258,194  164,995 x     185,423,189    398    223  986 BWT 26
rings 2.0   -m7 -t1      21,194,965  185,256,848  164,995 x     185,421,843    375    206  493 BWT 26
rings 2.1   -m7 -t1      20,967,373  183,891,457  230,702 x     184,122,159    195    188 1859 BWT 48
rings 2.2   -m7 -o       20,938,029  183,531,002  341,445 x     183,872,447    202    179 1859 BWT 48
rings 2.5   -m8 -t1      20,873,959  178,747,360  240,523 x     178,987,883    280    163 2518 BWT 48
pimple 1.43 beta is a free, closed source GUI archiver by Ilia Muraviev, Apr. 24, 2006. It uses context mixing.
pimple2 is a command line file compressor, June 11, 2007.
                  Compression               Compressed size          Decompresser  Total size   Time (ns/byte)
Program           Options                   enwik8      enwik9       size (zip)    enwik9+prog   Comp Decomp  Mem Alg Note
-------           -------                   ----------  -----------  -----------   -----------  ----- ------ ---- --- ----
pimple 1.43 beta  512MB, order 8, match 32  20,992,830  181,998,817  353,472 x     182,352,259   9638  10112  512 CM  3
pimple2           (none)                    20,871,457  180,251,530   78,642 x     180,330,172  18474  17992  128 CM
ash 04a is a free, experimental command line file compressor by Eugene D. Shelwien, Dec. 5, 2003. The /m700 option selects 700 MB memory limit. (/m800 causes disk thrashing with 1 GB). /o10 selects model order 9. This gives good results on smaller files when memory is constrained, but I did not try to optimize it. There is a /s option to select SSE depth that gives good results for the default value of /s5 so I did not try to optimize it either. Other results:
ash04a options        enwik9       Comp (ns/byte)
----------            -----------  ----
/m700 /o8 (order 7)   180,830,523  5883
/m700 /o10 (order 9)  180,735,542  6011
Note: the actual memory usage (commit charge) for enwik9 /m700 /o8 was 1910 MB at the end of compression, minus 257 MB for other programs, according to Windows task manager. This is generally not a problem if your swap file is large enough. It appears to be a slow memory leak (recovered when program exits) and does not cause thrashing.
ash /m1700 /o10 and /o12 failed to compress enwik9 with 2 GB memory
(error: could not allocate a block).
enwik8 compressed to 19,713,239 using /o10 and
19,446,859 using /o12.
.1807 bce3
bce3 is a free, open source (Apache), experimental file compressor
by Christoph Diegelmann, Mar. 16, 2015. It uses an order-n bitwise context model
where the model is computed using BWT and encoded and transmitted to the
decoder. Memory usage is 5 times the file size. The program takes no
options. I tested by compiling with g++ 4.8.3 in Ubuntu Linux.
            Compression  Compressed size          Decompresser  Total size   Time (ns/byte)
Program     Options      enwik8      enwik9       size (zip)    enwik9+prog   Comp Decomp  Mem Alg Note
-------     -------      ----------  -----------  -----------   -----------  ----- ------ ---- --- ----
bce3                     22,729,148  180,732,702   19,889 s     180,752,591   1151   2444 5000 CM  71
                         22,729,148                                           1230   2020  500 CM  48
ocamyd LTCB 1.0 is a modification by Mauro Vezzosi on June 20, 2006 of Frank Schwellinger's ocamyd-1.65-final. The option -s0 selects maximum compression. -m3 selects 300 MB memory (the maximum for the test machine), but it supports up to -m8.
ocamyd 1.66.final, by Frank Schwellinger, Feb. 1, 2007, includes the -f option to prevent flushing and rebuilding the DMC model when memory is exhausted.
                   Compression  Compressed size          Decompresser  Total size   Time (ns/byte)
Program            Options      enwik8      enwik9       size (zip)    enwik9+prog    Comp   Decomp  Mem Alg Note
-------            -------      ----------  -----------  -----------   -----------  ------  -------  --- --- ----
ocamyd 1.65.final  -s0 -m8      21,456,536  185,727,437   20,618 x     185,748,055   50782    50935  800 DMC
ocamyd LTCB 1.0    -s0 -m3      21,285,121  182,359,986   21,030 x     182,381,016  108960  ~110000  300 DMC 6
ocamyd 1.66.final  -s0 -m3 -f   21,123,280  182,410,035   20,636 x     182,430,561   59130    59637  300 DMC 6
The following table shows the effect of the -s and -m options on ocamyd 1.65.final on enwik8. Times are in ns/byte, process (kernel+user) time by timer 3.01, ~ indicates global (wall) time.
Options enwik8 Comp Decomp Notes ------- ---------- ----- ----- ----- -s0 -m8 21,456,536 42030 42010 -s0 -m4 22,073,527 70482 70538 6 (400 MB) (~101015 ~92921 global time) -s1 -m4 23,944,647 ~33535 6 -s2 -m4 26,345,297 ~1940 6 -s3 -m4 28,060,900 ~1826 6 -s0 -m3 22,296,826 ~70960 6 (300 MB) -s1 -m3 24,114,574 ~33818 6 -s2 -m3 26,911,154 ~1603 6 -s3 -m3 28,278,662 ~1514 6 -s0 -m2 22,688,950 ~70172 6 (200 MB) -s1 -m2 24,511,065 ~33771 6 -s2 -m2 27,614,083 ~1562 6 -s3 -m2 28,928,850 ~1448 6 -s0 -m1 23,487,047 ~68522 6 (100 MB) -s1 -m1 25,280,406 ~33277 6 -s2 -m1 29,045,902 ~1509 6 -s3 -m1 30,080,719 ~1408 6 -s0 -m0 24,210,216 ~66463 6 (64 MB) -s1 -m0 25,882,226 ~33121 6 -s2 -m0 30,591,255 ~1481 6 -s3 -m0 31,276,535 ~1377 6
bee 0.78 build 0154 is an open source (Delphi Object Pascal) command line archiver (with optional GUI) by Andrew Filinsky and Melchiorre Caruso, Sept. 23, 2005. It uses PPM. The -m3 option selects maximum compression (default is -m1). The -d8 option selects 512 MB memory, the maximum that does not cause disk thrashing (default is -d2 = 10 MB).
bee includes beeopt, a parameter optimizer similar to epmopt.
This was not tested. bee comes preconfigured with parameters
trained on .txt and .xml files (and other types) in file bee.ini. This was tested by renaming
enwik7 (first 10^7 bytes)
to enwik7.txt and enwik7.xml but compression was worse.
The executable size is a zip archive containing
bee.exe and bee.ini. This is much smaller than the zipped source code download.
.1829 uhbc
uhbc 1.0 is
an experimental, closed source command line file compressor
by Uwe Herklotz, June 30, 2003. It uses BWT. The -b100m option
selects 100 MB block size, which requires 800 MB for compression
and 500 MB for decompression. -m3 selects maximum compression
for the entropy coding stage, which consists of run length coding
(RLE) + DWFC (double weighted frequency counting) + entropy coding.
WFC is described in
Deorowicz, S.,
Improvements to Burrows–Wheeler compression algorithm,
Software–Practice and Experience, 2000; 30(13):1465–1483.
Additional results on enwik8:
Options                                    enwik8 size  Comp Decomp (ns/byte)
-----------------------------------------  -----------  ---- ------
-m3 -b100m (one 100 MB block)              20,930,838   1145    858
-m3 (default block size is 5 MB)           24,296,345    914    733
-m2 (RLE + WFC + entropy coding, default)  24,411,843    806    644
-m2 -cp (prefix sort, default is suffix)   24,589,110    813    578
-m1 (RLE + MTF (move to front) + entropy)  25,021,683    680    547
-m0 (RLE + direct entropy coding)          25,341,274    603    500
smac v1.8 (discussion) is a free, experimental file compressor for Windows by Jean-Marie Barone, Jan. 22, 2013. It uses an order-4 bitwise context model and arithmetic coding. It takes no options. Source code is in x86 assembler.
smac v1.9, Jan. 31, 2013, uses an order 4 and order 6 context model and chooses at each bit the model whose prediction is further away from 1/2.
smac v1.10, Feb. 7, 2013, uses a nonstationary model like PAQ6. When the count for one bit value is incremented, half of the opposite bit's count in excess of 2 is discarded.
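A minimal sketch of such a nonstationary count update, assuming the PAQ6-style rule with integer rounding (smac's exact rounding is not documented here, and the function name is hypothetical):

```python
def update_counts(n0, n1, bit):
    """Increment the count for `bit`; if the opposite count exceeds 2,
    discard half of its excess over 2 (assumed integer rounding)."""
    if bit:
        n1 += 1
        if n0 > 2:
            n0 = n0 // 2 + 1  # same as 2 + (n0 - 2) // 2
    else:
        n0 += 1
        if n1 > 2:
            n1 = n1 // 2 + 1
    return n0, n1
```

The effect is that a long run of zeros is quickly forgotten once ones start to appear, so the model favors recent statistics.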
smac v1.11, Feb. 18, 2013, switches between order 6, 4, and 3 context models depending on which prediction is furthest away from 1/2. For files smaller than 5 MB, it switches between lower order contexts.
smac v1.12a, Mar. 11, 2013, uses indirect context models. The context is mapped to a 16 bit state representing the number of 0 and 1 bits as 7 bit counters, plus the last 2 bits. When the counters reach the maximum value of 127, they are both halved and incremented. v1.12a is a speed improvement over v1.12 (released the day before) using prefetch instructions.
smac v1.13, Mar. 22, 2013, mixes the order 6, 4, and 3 indirect context models in the logistic domain, log(p(1)/p(0)). Each prediction has a fixed weight of 1/3.
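Fixed-weight mixing in the logistic domain can be sketched as below. The helper names stretch and mix_fixed are assumptions borrowed from PAQ terminology, and floating point is used for clarity; this is not smac's assembler implementation.

```python
import math


def stretch(p):
    """Map a probability to the logistic domain: log(p(1)/p(0))."""
    return math.log(p / (1.0 - p))


def squash(x):
    """Inverse of stretch: map a logistic-domain value back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def mix_fixed(probs):
    """Average the stretched predictions with equal weights (1/3 for 3 models)."""
    w = 1.0 / len(probs)
    return squash(sum(w * stretch(p) for p in probs))
```

Compared with averaging the probabilities directly, averaging in this domain lets a single confident model (a prediction near 0 or 1) pull the result further from 1/2.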
smac v1.14, Apr. 20, 2013, uses adaptive mixer weight update with a learning rate of 0.002.
smac v1.15, May 19, 2013, uses an order 6-4-3-2-1 context mixing algorithm.
smac v1.16, July 30, 2013, has improvements to the context bit history model and match model.
smac 1.17 (discussion), Nov. 1, 2013, has some speed optimizations and small changes in the bit history counter rounding and use of floating point lookup tables.
smac 1.17a (discussion), Nov. 17, 2013, has some speed improvements with no change in compression.
smac 1.18 (discussion), Dec. 8, 2013, uses a polynomial function to compute squash() to improve speed.
smac 1.19 (discussion), Dec. 17, 2013, has a speed optimization of the squash function.
smac 1.20, Jan. 16, 2014, improves modeling of 0 frequency counts using a Laplace estimator, p=(n0+1)/(n0+n1+2).
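The estimator can be checked with exact fractions. The sketch below implements the formula exactly as written above; by symmetry the estimate for the other bit value is (n1+1)/(n0+n1+2). The function name is hypothetical.

```python
from fractions import Fraction


def laplace(n0, n1):
    """Laplace estimate p = (n0 + 1) / (n0 + n1 + 2) from two bit counts."""
    return Fraction(n0 + 1, n0 + n1 + 2)
```

With no observations the estimate is 1/2, and after three observations of one value and none of the other it is 4/5 rather than 1, so the unseen value never gets probability zero.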
            Compression  Compressed size          Decompresser  Total size   Time (ns/byte)
Program     Options      enwik8      enwik9       size (zip)    enwik9+prog   Comp Decomp  Mem Alg Note
-------     -------      ----------  -----------  -----------   -----------  ----- ------ ---- --- ----
smac 1.8                 29,143,755  265,303,304    2,713 x     265,306,017   1917   1935 1691 o4  26
smac 1.9                 26,888,498  242,014,586    2,832 x     242,017,418   3168   3266 1690 CM  26
smac 1.10                26,398,662  230,781,496    2,791 x     230,784,287   2917   3085 1649 CM  26
smac 1.11                25,633,348  223,294,431    2,831 x     223,297,262   3930   4331 1616 CM  26
smac 1.12a               24,948,001  216,016,106    2,833 x     216,018,939   4463   4568 1565 CM  26
smac 1.13                23,322,767  202,011,435    2,818 x     202,014,253   6801   6502 1613 CM  26
smac 1.14                22,675,896  193,797,222    2,965 x     193,800,187   5943   6148 1577 CM  26
smac 1.15                22,303,381  191,064,676    3,074 x     191,067,750   6518   7313 1658 CM  26
smac 1.16                21,831,822  183,551,384    3,465 x     183,554,849   6949   7285 1542 CM  26
smac 1.17                21,816,272  183,459,153    3,429 x     183,462,582   5672   5867 1542 CM  26
smac 1.17a               21,816,272  183,459,153    3,429 x     183,462,582   5335   5613 1542 CM  26
smac 1.18                21,816,285  183,459,860    4,522 x     183,464,382   4901   5137 1544 CM  26
smac 1.19                21,816,323  183,459,942    4,361 x     183,464,303   4211   4257 1542 CM  26
smac 1.20                21,781,544  183,190,888    4,356 x     183,195,244   4249   4399 1542 CM  26
TC 5.2 dev 2 is an experimental command line file compressor, currently under development by Ilia Muraviev. It takes no options.
                              Compressed size          Decompresser  Total size   Time (ns/byte)
Program                       enwik8      enwik9       size (zip)    enwik9+prog   Comp Decomp  Mem Alg Note
-------                       ----------  -----------  -----------   -----------  ----- ------ ---- --- ----
tc 5.0 dev 1 (May 26 2006)    33,774,535  295,836,604   23,681 x     295,860,285    236    204      LZP 3
tc 5.0 dev 2 (June 10 2006)   32,417,139  283,039,249   22,659 x     283,061,908    270    244      LZP 3
tc 5.0 dev 4 (June 21 2006)   32,417,139  283,039,249   22,496 x     283,061,745    224    206      LZP 3
tc 5.0 dev 6 (July 6 2006)    29,544,971  257,416,397   28,528 x     257,444,925    279    279      PPM 3
tc 5.0 dev 7 (July 9 2006)    28,111,955  250,077,573   30,058 x     250,107,631    285    325   20 PPM 3
tc 5.0 dev 9 (July 18 2006)   27,801,253  246,923,158   30,106 x     246,953,264    363    385   24 PPM 3
tc 5.0 dev 11 (July 24 2006)  27,293,396  242,199,762   31,074 x     242,230,836    446    393   56 PPM 3
tc 5.1 dev 1 (Oct. 1 2006)    31,708,176  280,007,538   26,578 x     280,034,116    289    154   25 LZ
tc 5.1 dev 2 (Oct. 2 2006)    31,155,963  274,831,393   24,620 x     274,856,013    344    147   25 LZ
tc 5.1 dev 5 (Oct. 13 2006)   28,567,681  247,853,181   26,659 x     247,879,840    951    439  148 CM
tc 5.1 dev 7 (Dec. 18 2006)   27,934,960  241,898,216   40,104 x     241,938,320   1864    639  148 CM
tc 5.1 dev 7x (Jan. 13 2007)  27,888,899  241,088,655   41,265 x     241,129,920   1974    638  609 CM
tc 5.2 dev 2 (Feb. 7 2007)    21,481,399  184,939,711   41,112 x     184,980,823   3637   3655  230 CM
5.0 Dev 1 uses LZP. Dev 4 includes an improved hash table to conserve memory and a faster range coder compared to dev. 2, but compression is the same. Starting with 5.0 dev 6, LZP literals and match lengths are encoded using PPMC (PPM with fixed escape probabilities to lower orders). Dev 7 and 9 use order 3-1-0 PPMC.
tc 5.0 dev 11 (July 24, 2006) is the last of this series.
tc 5.1 dev 1 uses ROLZ (reduced offset LZ) with PPM order 1-0 for literals, offset set reduced with order 2 context, and a 16 MB dictionary.
tc 5.1 dev 2 has improved parsing and is archive compatible with dev 1.
tc 5.1 dev 5 uses ROLZ plus context mixing (instead of PPM) for order 2 literals.
tc 5.1 dev 7 uses improved parsing (flexible parsing) and adds SSE.
tc 5.1 dev 7x uses a larger dictionary.
tc 5.2 dev 2 uses FPW
(fast PAQ weighting).
bwtsdc v1
(discussion)
is a free, experimental file compressor with source code
by David A. Scott and Yuta Mori. It takes no options. Memory usage is
5 times the file size.
The program
is bijective, meaning that any file is valid input to the decompresser,
and no two inputs will decompress to the same file. In other words, there is
an exact 1 to 1 mapping between uncompressed files and compressed files.
The compressor uses multiple
stages, each of which is bijective. The first stage is a BWT variant called
BWTS
(BWT Scottified) developed by Scott. In this variation, it is not necessary
to store the starting point for the inverse BWT. This is achieved by dividing
the input into a lexicographically nonincreasing sequence of Lyndon words.
A Lyndon word is a string that lexicographically precedes
all of its rotations. The block is then sorted using contexts that wrap
within Lyndon words rather than the whole block.
The BWTS is followed by distance coding (DC, developed in part by Mori),
and Fibonacci coding, where each stage is also bijective.
The compressor is implemented as 3 programs called from a .bat file.
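The factorization into a nonincreasing sequence of Lyndon words that BWTS relies on can be computed in linear time with Duval's algorithm. The sketch below shows only that factorization step, not the BWTS sort, the distance coder, or the Fibonacci coder, and it is not taken from the bwtsdc source; the function name is an assumption.

```python
def lyndon_factors(s):
    """Split s into Lyndon words w1 >= w2 >= ... (Duval's algorithm).

    Works on str or bytes, since only comparisons and slices are used.
    """
    n = len(s)
    i = 0
    factors = []
    while i < n:
        j, k = i + 1, i
        # Extend the current run while it is still a prefix of a
        # repetition of a Lyndon word.
        while j < n and s[k] <= s[j]:
            if s[k] < s[j]:
                k = i       # the whole run so far is one Lyndon word
            else:
                k += 1      # still repeating the same word
            j += 1
        # Emit each complete copy of the repeated word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors
```

For example, "banana" factors as "b", "an", "an", "a". Each factor precedes all of its own rotations, and the sequence never increases.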
fbc v1.0 is
a free, experimental file compressor for Windows by David Catt, Feb. 29, 2012.
It is described as using BWT (divsufsort) with a fast adapting (rate 1/16)
14 bit context model consisting of an 11 bit history and 3 bits
to encode the position in the current byte. The input is preprocessed using
Eugene Shelwien's alphabet reordering preprocessor, BWT_reorder_v2.
The argument 250000000
selects the block size in bytes. Memory usage is 5 x block size.
fbc v1.1, Mar. 2, 2012,
fixes a memory allocation bug that caused decompression to fail
for a block size of 333 MB. It automatically selects between 32 and 64 bit
versions of divsufsort. Results are shown for the 64 bit version.
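A bit model of the kind described for fbc might look like the following sketch. The interpretation of the "11 bit history" as the low 11 bits of some history value, the index packing, the initial probability of 0.5, and the use of floating point are all assumptions; only the 14 bit context size and the 1/16 adaptation rate come from the description above.

```python
class BitModel:
    """2^14 adaptive bit probabilities indexed by history and bit position."""

    def __init__(self):
        self.p = [0.5] * (1 << 14)  # probability that the next bit is 1

    @staticmethod
    def index(history, bitpos):
        # 11 bits of history (assumed layout) plus 3 bits of bit position.
        return ((history & 0x7FF) << 3) | (bitpos & 7)

    def predict(self, history, bitpos):
        return self.p[self.index(history, bitpos)]

    def update(self, history, bitpos, bit):
        i = self.index(history, bitpos)
        # Move the estimate 1/16 of the way toward the observed bit.
        self.p[i] += (bit - self.p[i]) / 16.0
```

A rate of 1/16 adapts quickly, which suits BWT output where the local statistics change abruptly between contexts.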
ppmvc v1.1
is a free, command line file compressor by Przemysław Skibiński, May 12, 2006,
based on PPMd var. J by Dmitry Shkarin. It uses variable length contexts
as described in the paper,
P. Skibinski and Sz. Grabowski.
Variable-length contexts for PPM.
Proceedings of the IEEE Data Compression Conference (DCC04), pp. 409-418, 2004.
Long matching strings are encoded as in high order ROLZ,
encoded as an index to a matching context and a length.
The command line options are the same as in PPMd: -o8 selects order 8, -m256 selects
256 MB memory, -r1 partially rebuilds the model when memory is exhausted. I tuned
the compressor to -o8 on enwik8. There are additional options related to VC
compression (which must be specified during decompression), but I used the
defaults since there is no guidance on how to set them in the program documentation.
The paper suggests that the best values (and defaults) are to encode matches
of context length order+1 with a minimum match length of 2*order, searching the
last 8 to 16 contexts for the longest match.
The effect is usually greatest for low order PPM.
chile 0.3d-1 is a free,
command line file compressor as C source code by Alexandru Mosoi, May 29, 2006.
It uses BWT. The option -b40000 selects a block size of 40000 KB, which requires about
785 MB of memory for compression and 240 MB for decompression. Version 0.3d-1
is identical to version 0.3d except that the maximum block size
was increased from 2048 KB to 99999 KB. For this test the program was compiled for Windows
using MinGW 3.4.2 as specified in the Makefile.
chile 0.4
(Jan. 27, 2007)
introduces a faster algorithm for building suffix arrays that uses less memory (7N).
The option -b=244141 selects the block size in KB (to split enwik9 in 4 equal parts).
It was compiled using MinGW gcc 3.4.5 with options -W -Wall -fomit-frame-pointer -g -O3
and tested in WinXP Home with 2 GB memory.
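The block size value for chile 0.4 can be reproduced with a little arithmetic, assuming the unit is 1024 bytes (the variable names are for illustration only):

```python
ENWIK9_BYTES = 10**9
PARTS = 4
UNIT = 1024  # assumed size of one block-size unit in bytes

# Smallest block size (in units) such that 4 blocks cover the whole file.
block_units = -(-ENWIK9_BYTES // (PARTS * UNIT))  # ceiling division
block_bytes = block_units * UNIT

# Suffix array construction is described as needing 7N bytes for a block of N.
sort_memory_bytes = 7 * block_bytes
```

This gives 244141 units, or 250,000,384 bytes per block, and about 1.75 GB for the 7N working memory, which fits in the 2 GB test machine.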
The program is supplied as source code only. It was compiled with g++ 4.6.3 using
the supplied Makefile in Ubuntu on a Core i7 M620, 4 GB.
There are two programs, the compressor "bwte" and decompresser "unbwti".
The compressor computes a low memory BWT using at most the memory specified
by the -m option (in MB). The -b option specifies how the BWT transformed input
is to be compressed. -b 1 specifies zlib, -b 4 specifies lzma, and -b 2 specifies
run length coding and range coding. There is no block size parameter. The input
is compressed in a single block. Decompression requires 4 times the file size
in memory, which used all of the test machine for enwik9 so was tested for enwik8 only.
Compression of enwik9 with -b 4 failed (cannot create pipe).
CTXf 0.75 pre-beta 1
is a free, closed source command line archiver by Nikita Lesnikov, Sept. 20, 2003.
It uses PPM with preprocessing for text, exe and multimedia files.
The option -me selects extreme (best) compression. It uses about 78 MB memory
in Windows task manager.
m03exp-2005-01-27 is an
experimental, closed source GUI file compressor by mij4x, Jan. 27, 2005.
It uses BWT implementing the M03 algorithm by Michael A. Maniscalco,
with a maximum block size of 8 MB. (Note on the GUI: to compress
or decompress, drop a file on the program window. Right click to
select options).
m03exp-2005-02-15
(Feb. 15, 2005) supports blocks up to 32MB but is otherwise identical.
Stuffit 9.0 is a commercial GUI archiver by Allume Systems,
now Smith Micro. This was the current version as of May, 2006.
Note: their free 30 day trial required registration and a credit card number
which was charged if you forgot to cancel. The options tested were:
Stuffit
12.0.0.17 (compression technology version 12.0.0.21) was released Jan. 31, 2008.
It includes lossless compression of JPEG and MP3 files and lossy recompression
of zip archives, GIF, TIFF, PNG, and PDF files. It supports a native SITX format
as well as zip, gzip, rar, bzip2, compress, tar, cab, and some more obscure
formats. It is multithreaded for multicore support, although I tested it on
a single core processor. I only tested
the native general-purpose formats. For these tests, I used the command
line programs console_stuff.exe and console_unstuff.exe to reduce the executable size
and measure run time more accurately. The options are
-m=1 (LZ77-Huffman), -m=2 (LZ77-arithmetic), -m=4 (PPM), -m=8 (BWT), -l (level 2-16,
higher is slower but better), -x (memory extents, max 30, higher uses more memory).
The best compression for text is -m=4 (PPM) with maximum
memory -x=30. (In the GUI but not the command line, above 29 causes an out of memory
error with 2 GB RAM). The -l option apparently has no effect on PPM.
The decompresser size is based on console_unstuff.exe and the minimum set of
5 .dll files needed to run it (4 common plus Plugins/sitx.dll).
The full GUI installer (without Office plugins)
zips to 17,051,856 bytes. The tested version was a complimentary copy provided
by the company.
Stuffit 2009 13.0.0.19 (compression technology 13.0.0.24) was released Dec. 19, 2008.
I tested as with Stuffit 12, however the technique of finding the minimal set
of .dll files that I used in Stuffit 12 did not work (internal error)
so I had to include the zipped distribution
size (StuffIt2009.exe), which includes many other compression formats and a GUI.
The tested version was a complimentary copy provided by the company.
plzma_v3b
(discussion) is a free, closed source, experimental file compressor for Windows
(32 and 64 bit versions) by
Eugene Shelwien, Oct. 8, 2011. It uses LZMA (7zip equivalent) with a modified entropy encoder.
plzma_v3c
was released Mar. 19, 2012. Options are
as follows:
crook v0.1
(discussion)
is a free, open source file compressor by Jüri Valdmann, Mar. 5, 2012.
It uses bit-level PPM.
Because it predicts bits rather than bytes, there is no
escape modeling. This is like DMC in that each bit-level context
is mapped to a next-bit prediction and a
count (equivalent to two counts of zeros and ones). But
unlike DMC, it avoids the problem of duplicate states representing
the same contexts, which would dilute the statistics and waste memory.
Bits are modeled MSB first.
Contexts are stored in a binary tree where the two child nodes
represent the current context extended by one bit on the right. Each node also has a
pointer to a suffix node, representing the current context shortened by
one byte on the left. Contexts always begin on byte boundaries.
Each context maps to a 22 bit prediction for the next bit
(initialized to 0.5) and a count. When a bit is coded, the
current node and all of its suffix nodes are updated by adjusting the
prediction to reduce the error by 1/count and the count is incremented
by 1 up to a limit of 32. The initial tree is bytewise order 0 (255 contexts)
with initial counts of 12. Subsequent nodes are added with a count of 1.5
and a prediction inherited from its suffix node
whenever there is no node to represent the 1 bit extension, and the new node becomes
the current context.
The option -m1600 limits memory usage to 1600 MiB. When memory is exhausted,
no new nodes are added to the tree, but predictions and counts of existing
nodes continue to be updated. The current context then becomes the suffix
node if needed. The option -O8 limits the tree depth to bytewise
order 8 (found to be optimal for both enwik8 and enwik9). When the current
node reaches this depth, no child nodes are added, but existing nodes and
their suffixes continue to be updated, just as if the memory limit were reached.
Increasing the model order improves
compression but also causes the tree to grow faster, which sometimes makes compression
worse if the memory limit is reached sooner. The defaults are -m128 -O4.
Compression and decompression require the same time and memory. Also,
the same compression options must be given again during decompression.
(I added 10 bytes to the decompresser size to account for this). The compressed file
is arithmetic coded with the original file size saved in the first 4 bytes.
File sizes are limited to less than 2 GiB. The program is distributed as
source code only. To test, I compiled with g++ 4.6.1 in 32 bit Windows
using the options recommended in the source comments.
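The counter update described above for crook can be sketched as follows. This uses floating point instead of 22 bit fixed point, the class and function names are hypothetical, and the order of the two steps (adjust the prediction, then increment the count) is an assumption.

```python
class Node:
    """A bit-level context with a prediction, a count, and a suffix link."""

    def __init__(self, p=0.5, count=1.5, suffix=None):
        self.p = p            # predicted probability that the next bit is 1
        self.count = count    # confidence; new nodes start at 1.5
        self.suffix = suffix  # same context shortened by one byte, or None


def update(node, bit, limit=32):
    """Update the current node and every node on its suffix chain."""
    while node is not None:
        # Reduce the prediction error by a fraction 1/count.
        node.p += (bit - node.p) / node.count
        node.count = min(node.count + 1, limit)
        node = node.suffix
```

With this rule a new node (count 1.5) moves two thirds of the way to the first bit it sees, while an order 0 node starting at count 12 moves only 1/12 of the way. Once the count reaches 32 the update behaves like an exponential average with rate 1/32.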
ppmx 0.01
is a free, experimental, closed source file compressor by Ilia Muraviev,
released Nov. 25, 2008. It uses PPM with no filters. It takes no options.
ppmx 0.02
was released Dec. 2, 2008. It uses order 9 PPM with hashed context tables,
as discussed here.
There is also a
core 2 duo version
which is faster, although it runs on only one core, and has a slightly larger
executable. Note that the table below is misleading: on enwik8 the regular
version compressed at 976 ns/byte (12% slower than the core 2 duo version)
and decompressed at 992 ns/byte (4.5% slower).
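As a worked check of those percentages, the sketch below backs out the implied core 2 duo times. It assumes "12% longer" means the regular version's time is 1.12 times the core 2 duo time. The helper name is hypothetical.

```python
def implied_baseline(ns_per_byte, percent_longer):
    """Time of the faster build, given the slower build's time and how much longer it ran."""
    return ns_per_byte / (1.0 + percent_longer / 100.0)


core2_comp = implied_baseline(976, 12)     # about 871 ns/byte
core2_decomp = implied_baseline(992, 4.5)  # about 949 ns/byte
print(round(core2_comp), round(core2_decomp))
```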
ppmx 0.03
(discussed here) was
released Dec. 22, 2008.
ppmx 0.04
(discussed here)
was released Jan. 5, 2009. It uses order 12-5-3-2-1-0 PPM and 280 MB.
ppmx 0.05
(discussion),
Jan 19, 2010, adds SEE (secondary escape estimation), more memory, and some optimizations.
ppmx 0.06,
released July 27, 2010, is designed for improved speed and less memory usage rather
than compression ratio. It removes SEE and uses only a fixed order 4-2-1-0 model
with hash tables. It has a P4 version for Pentium-4 and higher that is about 12% faster.
This is the version tested. It has a larger executable (54,496 vs. 45,216).
ppmx 0.07, Feb. 20, 2011,
uses order 5-3-2-1-0-(-1) PPM with hash tables. Memory usage is increased to 302 MB.
ppmx v0.08
(discussion),
Jan. 1, 2012, uses order 6-4-2-1-0-(-1) PPM with hash tables and SEE improvements.
ppmx 0.09
(discussion)
was released Mar. 24, 2014.
It uses LZ77 with arithmetic coding. The option -49 selects method 4 (1, 2, 4)
and level 9 (1..9) for best compression. Other combinations were not tested.
There is also a Linux version which was not tested.
Memory usage fluctuates but peaks at 654 MB for compression and 90 MB for decompression.
The Windows version produces read-only output files that must be set with
"attrib -r" before they can be modified or deleted.
lzturbo 0.1 (Oct. 5, 2007)
is threaded for parallel execution on multicore machines. The maximum
compression level is -59 where it uses 248 MB for compression and a peak
of 72 MB for decompression. Other modes compress much faster. The read-only
bug was fixed.
lzturbo 0.9
was released Feb. 25, 2008. Decompression memory peaks at 79 MB.
lzturbo 0.94
was released Apr. 11, 2009. The option -b59 selects method 5, compression level 9
for maximum compression. -b100 selects a block size of 100 MB for independent
compression in separate threads. The default is 32 MB. -p0 forces the
compressor to run on one core. By default the program runs on all cores, but
this causes the program to run out of memory with -59 because each thread uses 1450 MB.
Decompression ran on 2 cores with a process time of 20 seconds per core
and wall time of 28 seconds using about 300 MB memory. Faster modes tested
below are run on 2 cores with average process time per core shown.
lzturbo 1.1, Apr. 29, 2013,
runs only on 64 bit Windows and 64 bit Linux.
The Linux version was tested under Ubuntu (note 48) using the non-static
(smaller) executable. The 2 digit options -11...-49 select the compression
method and level. The first digit can be 1..4 with higher numbers
compressing better. The second digit can be 0, 1, 2, or 9 with higher
numbers compressing slower without affecting decompression speed.
The program gave an error during compression with -40, -41, -42.
Option -b1000 selects a block size of 1000 MB. The default is -b24.
Separate blocks can be compressed and decompressed in parallel. The
test machine automatically selects 4 threads. Larger blocks improve
compression but use more memory and allow fewer threads to be allocated.
-b1000 causes it to use 1 thread since there is a single block.
At level 9 (-19, -29, -39, -49), it is not possible to compress enwik9
with -b1000 on the 4 GB test machine because it will use over 6 GB memory
and start disk thrashing. -p1 selects 1 thread. -p0 disables multi-threading.
lzturbo 1.2
was released Aug. 7, 2014 with updates on Aug. 10 and 11, 2014,
with compression ratio and decompression speed improvements.
Methods -30, -31, -32, -39 use ANS (asymmetric numeral systems) encoding instead
of arithmetic coding with SSE/AVX code selected at run time. The updates fixed an
"illegal instruction" error during compression in these modes on the test machine
and some other processors. The other modes were tested on the Aug. 7 release.
Options are like v1.1.
comprolz 0.1.0
(discussion)
is a free, open source, experimental file compressor by
Zhang Li, Oct. 7, 2012. It uses ROLZ. The option -b256 selects the maximum
block size. During compression it uses 60-65% of two cores. Decompression
uses one core.
Only source code was provided. It was compiled for 32 bit Windows Vista
with MinGW 4.6.1 using "gcc -O3 *.c".
comprolz 0.2.0 was released Oct. 16, 2012. It includes the -f
option to select flexible parsing. It is slower but compresses better.
comprolz 0.10.0
(discussion) was released Nov. 25, 2012. It includes a dictionary
derived from the first 10 MB of enwik8. To test, it was compiled
as suggested in the documents using gcc 4.7.0 with options
"-O3 -fomit-frame-pointer -mno-ms-bitfields". Source code is shared
with comprox 0.10.0. The executable, packed with UPX, is smaller.
comprolz 0.11.0 was released Dec. 17, 2012.
The program builds a dictionary from the input instead of using
a static dictionary. 32 bit executables
are included for Windows and Linux. The Windows version was tested.
comprolz 0.11.0-bugfix1, Dec. 18, 2012,
fixes a bug that caused poor compression.
sbc 0.970r2
is a free, closed source command line archiver and file encryptor
by Sami, June 27 2005. Compression options suggest it uses BWT.
The -m3 option selects maximum compression, requiring 32 MB memory
(-m1 is minimum). The -b63 option
selects maximum block size (32 MB, requiring 192 MB additional memory).
-ad disables adaptive block size reduction
for homogeneous data. SBC runs faster with smaller block sizes and minimum
compression as shown:
xz 5.0.1 is a free, open source file compressor,
Jan. 29, 2011. xz specifies a container format written by Lasse Collin. It uses
the public domain LZMA2 compressed format from 7zip by Igor Pavlov. There are
versions for most operating systems including Windows and Linux. The Windows
version was tested. The option -9 specifies maximum compression and
memory. The default is -6. The option -e (extreme) specifies better compression
at a cost in compression (but not decompression) time.
Program size is based on xz.exe. There is a separate decompressor (xzdec.exe)
which is smaller and decompresses to standard output, but the Windows version
does not work because it outputs in text mode. Additional results are shown below
for enwik8 for compression and decompression time (ns/byte) and compression
and decompression memory (in MB).
xz 5.2.1
was released Feb. 26, 2015.
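The -9 and -e options described for xz 5.0.1 also exist as presets in liblzma, which Python's standard lzma module wraps. The sketch below is not how the benchmark was run (that used xz.exe). The helper name and the sample input are made up for illustration.

```python
import lzma


def xz_compress(data, level=9, extreme=True):
    """Compress to the .xz container; level 0..9, optionally OR-ed with the extreme flag."""
    preset = level | (lzma.PRESET_EXTREME if extreme else 0)
    return lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)


sample = b"recognize speech " * 1000  # made-up input, not benchmark data
packed = xz_compress(sample, level=6)  # level 6 keeps this demo's memory use modest
restored = lzma.decompress(packed)
```

Passing level=9 selects the setting tested above, at the cost of much more compression memory.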
The model order was tuned on enwik8. Additional results are shown
for order 10,
for -m5 (maximum compression), and for normal compression as a .exe and
.rar file. The decompresser in the last case is zipped unrar.exe.
WinRAR 4.20 was released June 9, 2012. It costs $29 with a 40 day free trial
as of Feb. 1, 2013. Options are the same. -m1 through -m5
select compression level. The default is -m3. The algorithm is LZ77 with a 4 MB
window. -mc7:128t+ selects PPM, order 7,
with maximum 128 MB memory. Time and memory to decompress with PPM is about the
same as compression.
WinRAR 5.00b2 was released Apr. 29, 2013. It includes a larger dictionary,
up to 1 GB for the 64 bit version and 256 MB for the 32 bit version.
Option -ma5 selects the new archive format, which is not compatible
with v4.20 or earlier. The default is the older format. In the newer
format, option -mc is silently ignored. Option -m3 is the default
compression level.
plzip is a free, open source
file compressor by Antonio Diaz Diaz, Feb. 16, 2010. It is "parallel lzip", compatible
with lzip, but multi-threaded for
parallel execution. It uses LZMA (LZ77 with arithmetic coding). The -9 option
selects maximum compression. It has a command line interface similar to gzip.
When it compresses, it removes the original file and adds a .lz extension.
lzip and plzip are written for Linux. A
Windows port by
Christian Schnaader on May 2, 2010 was tested. On my test computer (2 core T3200, 2 GHz),
compression showed 180% CPU and decompression showed 117%.
lzip 1.14-rc3 was released Jan. 15, 2013.
plzip 1.5
was released June 2, 2016. I tested the 64 bit Windows compile in Linux.
comprox_sa 20110927
(discussion) is a free,
experimental, open source file compressor by Zhang Li, Sept. 27, 2011. It uses LZSS
(in 4 MB blocks) followed by arithmetic coding. The program takes no arguments. It uses
60 MB memory for compression and 6 MB for decompression. It runs in both Windows and
Linux. Only the Windows version was tested.
Version 20110928 was released Sept. 28, 2011. Compression runs in 2 threads.
Both the Windows and Linux versions were tested (on different computers).
Version 20110929 was released Sept. 29, 2011. Decompression also runs in
2 threads. Compression is slightly improved.
comprox version 0.1.1, Oct. 10, 2011, replaces comprox_sa. It is a rewrite
using LZ77 (instead of LZSS) and arithmetic coding. It takes a compression
level 0 (fastest) to 9 (best) with a default of 5. All levels use the same
memory, 218 MB for compression and 44 MB for decompression. The Linux
version reports the same resident memory as Windows but higher virtual memory:
236 MB to compress and 284 MB to decompress. Both compression and decompression
run in 2 threads. Reported times are real times.
comprox 0.6.0 was released Aug. 24, 2012. It uses static 4K dictionary
encoding followed by LZ77 and arithmetic coding. It was released as open
source (3 clause BSD) C code only. For testing, it was compiled using g++ 4.6.1
as "gcc -O3 *.c" under 32 bit Windows. The option e200 means to use a 200 MiB
block size. The default is e16. Larger blocks improve compression but use
more memory. The program crashed with e250 or larger.
comprox 0.7.0
(discussion)
was released Sept. 10, 2012. It includes multi-threaded
compression and other improvements. It includes a static English
dictionary with about 3000 common words.
It was tested in 64 bit Linux compiled
with "gcc -O3 *.c -lpthread" and in 32 bit Windows compiled with
"gcc -O3 *.c -lpthread -Wl,--stack,8000000".
comprox v0.8.0 was
released Sept. 26, 2012 with better compression. The Linux version
was compiled with "gcc -O3 -march=native *.c -lpthread". The Windows
version was compiled as before.
comprox 0.8.0-bugfix1, Sept. 27, 2012, fixed a bug that caused compression
to crash on some input files. It was compiled with MinGW 4.6.1 with
"gcc -O3 -msse2 -s -Wl,--stack,8000000 *.c -lpthread".
comprox 0.9.0 was released Oct. 16, 2012. The -b option sets the
block size in MB. Default is -b16. -m sets number of matches to
check. Default is -m40. -f selects flexible parsing. To test, the
program was compiled "gcc -O3 -march=native -s *.c" as above.
comprox 0.10.0
(discussion) was released Nov. 25, 2012. It includes a dictionary
derived from the first 10 MB of enwik8. To test, it was compiled
as suggested in the documents using gcc 4.7.0 with options
"-O3 -fomit-frame-pointer -mno-ms-bitfields". Source code is shared
with comprolz 0.10.0. The executable, packed with UPX, is smaller.
comprox 0.11.0
was released Dec. 17, 2012. It builds a dictionary from the input
rather than use a static dictionary. Executables are included for
32 bit Windows and Linux. These compressed smaller than the source code.
The compressor crashed with -b250 (250 MB block size)
on enwik9, but -b200 worked. -m100 selects the match search limit
(default -m40). -f selects flexible parsing. Using large -m makes
compression time nonlinear, i.e. increasing from 75s to 2115s from
enwik8 to enwik9.
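To express those two times in the table's units, the sketch below converts seconds to ns/byte, taking enwik8 as 10^8 bytes and enwik9 as 10^9 bytes. The function name is hypothetical.

```python
ENWIK8_BYTES = 10**8
ENWIK9_BYTES = 10**9


def ns_per_byte(seconds, size_bytes):
    """Convert a wall time in seconds to nanoseconds per input byte."""
    return seconds * 10**9 / size_bytes


t8 = ns_per_byte(75, ENWIK8_BYTES)    # 750 ns/byte on enwik8
t9 = ns_per_byte(2115, ENWIK9_BYTES)  # 2115 ns/byte on enwik9
slowdown = t9 / t8                    # per-byte cost grows about 2.8x for 10x the input
```

A linear-time compressor would show roughly the same ns/byte on both files.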
comprox 0.11.0-bugfix1, Dec. 18, 2012,
fixes a bug that caused poor compression.
lzham alpha 2 is a free,
open source (MIT license) file compressor and library by Richard Geldreich Jr.,
Aug. 21, 2010. LZHAM is short for LZMA-Huffman-Arithmetic-Markov. It is based
on LZMA (7zip) but instead of using arithmetic coding throughout, it uses it
only for binary decisions and uses Huffman or
Polar
codes for literal and match codes. A Polar code is similar to a Huffman code
but is simpler to calculate at a cost of 0.1% in compression. Polar codes
are calculated as follows:
For this test, lzhamtest_x86 was used. There is a _x64 version for 64 bit machines
which is faster. The library supports different speeds and dictionary sizes, but
the test program does not have any options to select them, so none were used.
Decompression uses 67 MB memory vs. 609 MB for compression.
Compression uses both cores on the test machine
but decompression uses only one.
Version alpha 3, Aug. 30, 2010, supports all of the options supported by
the library. Option -d26 selects 64M dictionary, the largest supported by
the x86 version. (The x64 version supports up to -d29 = 512M). -m4 selects "uber"
compression mode. There are 5 compression levels from -m0 through -m4.
The highest two levels use Huffman codes rather than Polar codes. -t2 says
to use 2 helper threads (to match the number of cores on the test machine). The
default is to use 1 less than the number of cores, up to 16 threads.
Decompression is not multi-threaded.
The x64 version was tested
by the author. I guessed at memory usage. Each increment of the -d option approximately
doubles memory usage.
lzhamtest v1.0
(discussion)
is the test code for the source code release on Jan. 25, 2015.
To test (note 48), it was compiled using "cmake . ; make" in Ubuntu.
Option -d29 selects a 512 MB dictionary. -d26 selects 64 MB.
Default is -d28 (256 MB). Option -x selects extreme parsing.
flashzip 0.2 was released Jan. 11, 2008. It is compatible with version 0.1 but faster.
Note: in both versions, CPU utilization during compression is about 28% to 35%. Times
shown are process times.
flashzip 0.3 was released Feb. 4, 2008. It uses ROLZ plus arithmetic coding. It
takes an option x for better compression (slower) and 1 through 5,
where 5 is the slowest (best compression).
flashzip 0.9 was released June 28, 2008. Option -m2 selects method 2 (default
is -m1). -b1 through -b5 select buffer size, which affects memory usage.
Default is -b3. -s1 through -s7 selects match length and speed. Default is
-s1 (fastest, worst compression).
flashzip 0.91 was released Aug. 17, 2008. Options are like version 0.9.
Memory usage was increased
to 198 MB for compression and 138 MB for decompression using settings for
best compression. Minimum requirement is 10 MB and 6 MB.
flashzip 0.93a
was released Mar. 9, 2009.
flashzip 0.94 was released Mar. 25, 2009.
flashzip 0.99 was released July 23, 2009.
flashzip 0.99b4 (Aug. 25, 2009) is an archiver rather than a compressor. The -s
option was renamed to -c and the -b option was increased to -b8 to allow more
memory usage. For enwik8, memory usage for both -m1 and -m2 is 182 MB for
compression and 162 MB for decompression.
For enwik9, memory usage for -m2 is 609 MB for compression and 592 MB for decompression.
flashzip 0.99b8 (Feb. 28, 2010) has 4 compression levels from -m0 (fastest) to
-m3 (best). The buffer size option was increased to -b9 (1 GB).
Memory usage depends on the input size.
For -m0 -c7 -b7 enwik8, compression takes 214 MB and decompression takes 195 MB.
For -m1 through -m3 -c7 -b8, enwik8 compression takes 231 MB and decompression takes 195 MB.
For -m3 -c7 -b8, enwik9 compression takes 658 MB and decompression takes 625 MB.
Changing -b8 to -b9 has no effect on size, speed, or memory usage for enwik8,
but for enwik9 it improves compression and increases memory usage to 1111 MB for
compression and 1078 MB for decompression. The -s1 option enables the -b9 option.
Otherwise -b9 will cause a "no memory" error.
flashzip 0.99c1 (June 1, 2011) improves compression and speed. The option ranges
are -m0...-m3, -c1...-c7 and -b1...-b7. Only the maximum compression options were tested.
flashzip 0.99c3 (Oct. 10, 2011) is multi-threaded for compression
in modes -m1, -m2, -m3. Decompression runs in a single thread.
The archive is compatible with the previous version.
In the tested mode (maximum compression), memory usage depends on the file
size and climbs steadily during compression or decompression. It is the
same for either, and same as the previous single threaded version.
flashzip 0.99d1 was released Oct. 31, 2011. It has only two
options, -m0...-m9 (default -m4) for compression method (fastest...best)
and -b1...-b7 (default -b1) for buffer size. Memory usage ranges from
30 MB at -b1 to 1100 MB at -b7.
flashzip 1.0.0 was released Oct. 3, 2012. Options -m1 to -m7 select the compression
level; -mx7 compresses best. Higher levels compress
slower and use more memory but have little effect on decompression
speed, which is generally faster. Decompression uses the same memory
as compression, up to 1.1 GB depending on the file size.
Options -b1 to -b7 select buffer size. Larger values
use more memory but don't affect speed. The default is -b4.
The program can use up to 8
threads and auto-detects the number of available cores. In the
high compression modes tested, only 1 of 2 available cores was used.
-e creates a self extracting archive. It extracts to the saved name
using both cores.
flashzip 1.1.2 was released Dec. 12, 2012. It includes a GUI that
calls the command line version. The command line version was tested.
The compression options were changed to -m0..-m3 and -mx0..-mx3, with
-mx3 selecting maximum compression. Option -k0..-k7 select ROLZ dictionary
size with -k7 using 256 MB for best compression using the most memory.
-b1024 selects a buffer size of 1024 MB for best compression but using
the most memory. There is a -t option for multi-threading which defaults
to -t1 to select a single thread. Using more threads makes compression worse.
The -e option creates a self extracting archive by appending the compressed
file to a copy of flashzip.exe, and therefore does not compress any smaller
when the decompresser is included.
csc2 is a free,
experimental, closed source file compressor by Fu Siyuan, Apr. 18, 2009.
It uses LZP with order 1 modeling of literals and range coding over a 270
size alphabet. The program takes no options. It recognizes whether the input
file is compressed, and if so, decompresses it.
csc3 v.2009.08.12
is a free file
compressor with source code in C by Fu Siyuan, Aug. 11, 2009. It uses LZ77. The
option -m3 selects best and slowest compression
(range -m1 to -m3, default -m2). -d7 selects the maximum dictionary size
(range -d1 to -d7, default -d4). -fo turns off EXE and delta filtering
(default unless detected by file name extension).
The decompresser size is based on csc3.exe, which is smaller than csc3compile2.exe,
but does not work on some machines. It is smaller than the zipped source code (17,247 bytes).
Timing is similar for both versions and a version compiled with gcc 4.4 with -O2
-s -march=pentium4 -fomit-frame-pointer.
csc31
was released Sept. 23, 2009 without source code.
Discussion.
csc32 a2
(discussion),
May 9, 2010,
is a rewrite of csc31. The option -m3 selects maximum compression. -d9 selects
maximum dictionary size. Memory usage is 528 MB for compression and 330 MB
for decompression.
csc32 final,
Mar. 1, 2011, has 3 compression settings from -m1 (fastest) to -m3 (best) and dictionary sizes
up to -d512 (512 MB) which get the best compression but use the most memory. Compression requires
memory in addition to the dictionary, but decompression does not. Source code is now available.
csarc 3.3
(discussion) is a free, open
source (public domain) archiver with a LZMA like algorithm with dedupe and
dictionary preprocessing of text. It was released Mar. 21, 2015.
Options are compression level -m1 to -m5, dictionary size up to -d1024m (1 GB),
-t1 to -t8 (number of threads, default 1) and -p1 to -p4 to split large files into
1 to 4 parts to compress in parallel (default 1). To test, I compiled from source with g++ 4.8.2.
packet 0.01 is a free,
experimental file compressor by Nania Francesco Antonio, May 11, 2008.
It uses LZP. It takes no options.
packet 0.02, May 16, 2008,
improves compression for .wav files and supports files over 2 GB.
packet 0.03b, May 20, 2008,
uses LZ77, 3 MB for compression, and 1 MB for decompression. It takes an optional
argument 'x' meaning better but slower compression, and a level 1 through 6, where
6 is slowest with best compression.
packet 0.90b, June 18, 2008,
has options -m1 to -m4 (method) and -s0 to -s9 (intensity). All options use
10 MB for compression and 2 MB for decompression.
packet 0.91b, Aug. 6, 2009 has methods -m1 through -m6, where
-m6 is maximum compression. Decompression requires 1.5 MB.
packet 1.0
(discussion) was released Aug. 4, 2013. Options -m0..-mx9 select compression
level (default -m4). Option -t2 selects 2 threads (default -t1).
packet 1.1
(discussion) was released Dec. 7, 2013 for 64 bit Windows. It was tested
in Ubuntu under wine. Option -m9 (or -mx) selects maximum compression.
Default is -m4. -b512 selects maximum buffer size of 512 MB. Default is
-b64. -h4 selects maximum number of buffers. Default is -h2.
packet 1.2 was released July 19, 2015.
packet 1.9
(discussion) was released Aug. 19, 2016.
Option -mx selects maximum compression time. -h8 selects 2 GB hash table memory
for compression (max is -h7 = 1 GB in 32 bit .exe and -h9 = 4 GB in 64 bit .exe).
-b5 selects maximum buffer size 512 MB for both compression and decompression.
-r (recursive) and -s (solid) have no effect for single file compression.
The 64 bit version was tested under Ubuntu/Wine.
TarsaLZP Aug 8 2007
is a free, experimental file compressor with public domain source code (FASM)
by Piotr Tarsa.
Older versions used order 3 LZP to code the last 16 matches at order 3,
followed by order 2 PPM encoding of literals.
It takes no command line options but compression/decompression settings may be specified in
an initialization file. For this test, default settings were used and others were not tried.
The Jul 30 2007 version uses 2 LZP models, one with a 4 byte context and one 8 byte.
The program selects the one that gives a higher probability of a match. There is no
initialization file.
The Aug 8 2007 version uses 341 MB memory for compression and 333 MB for decompression.
The interim Aug 10 2007
version runs at high priority. (CAUTION, this will make your computer unusable while running).
TarsaLZP 29 Jan 2012 is distributed as Java source and class files. It has a GUI interface.
TarsaLZP 18 Nov 2012 takes several options, but defaults were used for testing.
It is available as source code in Python, Java, Javascript, and C. The C version
was tested by compiling with MinGW gcc 4.7.0 with options "-O3 -std=c99" in 32 bit Vista.
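The LZP idea behind these versions can be sketched in a few lines. This is a simplified variant, not TarsaLZP's code: it predicts one byte at a time from the byte that last followed the same context, and only measures how often that guess is right. A real LZP coder would go on to code the hit/miss flag and the literals. All names here are hypothetical.

```python
def lzp_hit_rate(data, order):
    """Fraction of predictions that were correct for a single-candidate order-`order` LZP table."""
    table = {}  # context bytes -> byte that most recently followed that context
    hits = 0
    predicted = 0
    for i in range(order, len(data)):
        ctx = data[i - order:i]
        guess = table.get(ctx)
        if guess is not None:
            predicted += 1
            if guess == data[i]:
                hits += 1
        table[ctx] = data[i]  # remember what followed this context
    return hits / predicted if predicted else 0.0


text = b"the cat sat on the mat; the cat sat on the hat. " * 20  # made-up sample
rate4 = lzp_hit_rate(text, 4)  # 4 byte context, as in the Jul 30 2007 version
rate8 = lzp_hit_rate(text, 8)  # 8 byte context
```

Longer contexts tend to predict less often but more reliably, which is one reason to keep two models and pick between them.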
rzm 0.07h was released
Apr. 24, 2008. Advertised memory usage is unchanged.
pim 2.01 is a free GUI archiver by
Ilia Muraviev, based on PPMd by Dmitry Shkarin, using PPM. Version 2.01
was released June 14, 2007. It has options to model color images and
.exe files. These make no difference on text and were turned off.
It was timed with a watch.
pim 2.04 beta was released July 21, 2007. It has PPMd as its only option.
pim 2.10 was released July 31, 2007. Older versions are no longer supported.
pim 2.50 was released July 22, 2008. It supports 3 compression modes: store,
normal, and best. Only best was tested. It compresses in PPMd, bzip2 and DCL
formats and extracts BALZ, QUAD, ZIP, JAR, PK3, PK4 and QUAKE PAK archives.
CTW 0.1 is a free, command line file compressor with source
code by Erik Franken and Marcel Peeters, Nov. 13, 2002. It uses CTW (context tree weighting),
a type of context-mixing algorithm (with single bit prediction and arithmetic coding) combining
the predictions of different order contexts. Statistics are stored in a suffix tree.
The -d6 option selects order 6 (depth of context tree). -n16M selects the maximum of 16M nodes
for the tree (using 128 MB memory). -f16M selects the maximum 16 MB file buffer
(for rebuilding pruned contexts). The default values of all other options were tested on
enwik6 and found optimal. For -d, there is a tradeoff between compression and memory usage
as with PPM compressors. -d6 was found optimal on both enwik7 and enwik8.
boa 0.58b
is a free, closed source command line archiver by Ian Sutton, Apr. 2, 1998.
It uses PPM. The -m15 option selects maximum memory, 15 MB.
yzx 0.01
(discussion) is a free,
experimental command line archiver by Nania Francesco Antonio, May 3, 2010. It uses "LZKS"
described as an LZ type algorithm. Option -b5 selects maximum memory. Option -m2 selects
method 2 (default is -m1). -c8 selects number of match keys (range -c1 to -c8, default -c3).
Memory usage is 732 MB for compression and 137 MB for decompression.
yzx 0.02,
May 7, 2010, corrects a bug in compression.
yzx 0.03
was released May 21, 2010. The range of options is -m1..m2, -c1..c5,
-b1..b6. Memory usage with -m2 -c5 -b6 is 404 MB for compression and
268 MB for decompression.
yzx 0.04 was released
May 27, 2010. Decompression memory remains at 268 MB.
yzx 0.11 was released
Jan. 4, 2012. Options -m0..-m9 select compression method (fast..slow).
Options -b1..-b8 select ring buffer size (small..large). Options
-h1..-h6 select search buffer size (small..large). Default is -m2 -b2 -h4.
There was not enough memory to test maximum compression (-m9 -b8 -h6) without
reducing either -b or -h.
zstd 0.4.0 was released Nov. 29, 2015. It features a high compression mode.
-f means overwrite output. 20 is the compression level (only -1 to -9
are documented).
zstd 0.4.2 was released Dec. 2, 2015.
zstd 0.4.2_no_legacy (NL) was released Dec. 6, 2015. It is the same
program with reduced source code size by dropping legacy support.
zstd 0.5.1
was released Feb. 17, 2016. The decompressor size is the
source for zstd_little-0.5.1.tar.gz converted to a zip -9 archive.
zstd 0.6.0 was released Apr. 12, 2016. It adds level -22 option and adds
--ultra to allow more memory usage.
tornado 0.1 is a free, open source file
compressor by Bulat Ziganshin, Apr. 16, 2007. It uses LZ77 with arithmetic coding.
The -9 option selects a predefined compression profile for maximum compression.
There are custom options for hash table size, hash chain length, block size, type
of coder, and an option to force or prohibit cache matching. Some of these options
might give better compression, but were not tested.
tornado 0.3 has options -1
through -12. Each increment approximately doubles compression time and memory usage.
Decompression time is fast in all cases, but memory usage is approximately 2/3 that
of compression (for the LZ77 buffer). -12 caused disk thrashing and was not tested
for enwik9. There are several other options that were not tested.
tornado 0.4a
was released June 1, 2008. It includes Windows and Linux versions. There is a small
version (tor-small.exe) which does not include some of the advanced options.
The advanced options were not tested. Option -12 caused disk thrashing (2 GB memory)
when enwik9 reached 80% compression, so -11 was used instead.
tornado 0.6,
Mar. 8, 2014, adds optimal parsing. It has 16 compression levels.
The default is -5. For
testing (note 48) it was compiled from source in Linux with g++ 4.8.1 using
the provided build.sh script. Windows and Linux 32 and 64 bit executables are also
provided.
LZPXj 1.2h, Mar. 6, 2007, uses LZP + PPM with a preprocessor for x86 executables.
It has just one option (1-9) which selects memory usage.
The default is 6. The maximum is 9. Each increment doubles usage.
scmppm 0.93.3 is
a GPL open source command line compressor for XML files by James Cheney and
Joaquín Adiego, Oct. 3, 2005, and using PPMd var. I
code by Dmitry Shkarin. It works by grouping XML data by tag, then compressing
with ppmd (similar to XMill). scmppm is distributed as UNIX source code only. For this test
it was compiled and run under WinXP using the latest version of Cygwin, g++, flex, and make as
of May 24, 2006. To compile I had to add the line extern "C" int fileno(FILE*);
to lex.yy.c.
The -l 9 option selects maximum compression.
crushm
is a free file compressor for Windows by Abhilash, July 12, 2013. It uses CM.
It takes no options.
fpaq0s2 is a
free, open source (GPL) file compressor by Nania Francesco Antonio, Sept. 29, 2006.
It is an order 2 model based on the order 0 compressor fpaq0s by David A. Scott,
which is based on fpaq0 by Matt Mahoney by modifying the arithmetic coder.
fpaq0x is the same order 2 model based directly on fpaq0.
fpaq0x1a is an order 3 model (hashed context) using fpaq0's arithmetic coder.
fpaq0s2b is a similar model based on fpaq0s. Both were released Oct. 1, 2006.
fpaq0x1b (Oct. 6, 2006) switches between different models up to order 3.
fpaq0s3 (Oct. 8, 2006) uses a simple order 0 model on groups of 3 bytes.
fpaq0s4 (Oct. 12, 2006) uses a combined order 0-1-2, PPM and LZ model.
fpaq0s5 (Oct. 15, 2006) improves on fpaq0s4. Memory usage is 200 MB when
run at normal priority and 160 MB when run at below normal priority (WinXP Home).
fpaq2 (Oct. 21, 2006) uses a combination context mixing and PPM algorithm.
fpaq0s6 (Oct. 30, 2006) improves on fpaq0s5.
fastari (Nov. 7, 2006) is an order 2 compressor with an all new arithmetic coder
and greater speed.
fpaq3 (Nov. 20, 2006) is an order 3 compressor.
fpaq3b (Dec. 2, 2006) is a bitwise order 28 compressor.
fpaq3c (Dec. 21, 2006) is an improved bitwise order 28 compressor.
fpaq3d (Dec. 28, 2006) adds an option to fpaq3c to select memory
usage from 16 MB to 2 GB. Option 6 selects 1 GB memory (the highest tested).
All programs are here.
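To illustrate what "order" means in the fpaq variants above, the sketch below computes the ideal code length of a bitwise adaptive model whose context is the last `order` whole bytes plus the bits already seen in the current byte. It is not the fpaq0 source: there is no arithmetic coder, contexts are kept in a dictionary rather than hashed, and the (n1 + 0.5) / (n + 1) probability estimate is an assumption chosen for simplicity.

```python
import math


def model_cost_bits(data, order):
    """Ideal compressed size in bits under an adaptive bitwise order-`order` model."""
    counts = {}  # (previous bytes, bit prefix of current byte) -> [zeros seen, ones seen]
    total = 0.0
    for i, byte in enumerate(data):
        ctx = data[max(0, i - order):i]
        prefix = 1  # leading 1 marks how many bits of this byte are known
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            n = counts.setdefault((ctx, prefix), [0, 0])
            p1 = (n[1] + 0.5) / (n[0] + n[1] + 1.0)
            total -= math.log2(p1 if bit else 1.0 - p1)
            n[bit] += 1
            prefix = (prefix << 1) | bit
    return total


sample = b"ab" * 200  # made-up input
cost0 = model_cost_bits(sample, 0)
cost1 = model_cost_bits(sample, 1)
```

On this alternating input the order 1 model costs far fewer bits than order 0, because the previous byte fully determines the next one.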
TinyCM 0.1 is a free, open source (GPL v3) file compressor by David Werecat,
Oct. 12, 2012. It uses an order 1-2-3-6 context mixing model. It takes one
option, a single digit "level" which apparently has no effect except to store
the value in the first byte of the archive. (I used "9"). Memory is the
same for compression and decompression. The supplied executables require
MSVCR110.dll, which I did not have, so I recompiled the source code
with g++ 4.6.1 using "gcc -O3 -march=native -s *.c -I." on a 2.0 GHz T3200
under 32 bit Vista.
lza 0.10 was released June 29, 2014. It improves compression and speed
and adds compression levels -mx1..-mx5 for higher compression.
A
64 bit version was released July 3, 2014 to support larger memory
options.
lza 0.51 was released Sept. 8, 2014. A
64 bit Windows version was released Sept. 9, 2014. The 64 bit version
allows the hash table option up to -h9 using 4 GB memory. It was tested using
-h8 (2 GB) and -b7 (1 GB buffer). -t1 selects 1 thread (default). -mx5 selects
maximum compression.
lza 0.61 was released Oct. 18, 2014. It is an update to store file dates
and empty directories. The -t option is removed so it is single threaded only.
-h and -b have a documented max value of 7 (1 GB memory each).
lza 0.62
is a bug fix release, Oct. 20, 2014. Additional options -r (recurse directories),
-s (solid mode), -v (verbose) used in testing have no effect on compression.
lza 0.70b
(discussion)
was released Nov. 19, 2014. It uses ANS coding rather than arithmetic coding,
based on the public domain ryg_rans
coder by Fabian Giesen.
ANS extends ABC (asymmetric binary coding) to larger alphabets. ANS coding
theory was developed by Jarek Duda.
Max compression level is increased to -mx9.
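The range variant of ANS (rANS) can be sketched with Python's arbitrary-precision integers. This is not the ryg_rans code and not how lza stores its output: a practical coder renormalizes the state into a byte stream, which is omitted here. Function names and the frequency table are hypothetical.

```python
def build_tables(freqs):
    """Return (cumulative start of each symbol, total frequency)."""
    starts, total = {}, 0
    for sym in sorted(freqs):
        starts[sym] = total
        total += freqs[sym]
    return starts, total


def rans_encode(symbols, freqs):
    """Fold the symbols into one integer state, last symbol first."""
    starts, total = build_tables(freqs)
    x = 1
    for s in reversed(symbols):
        f, c = freqs[s], starts[s]
        x = (x // f) * total + c + (x % f)
    return x


def rans_decode(x, count, freqs):
    """Recover `count` symbols from the state, in their original order."""
    starts, total = build_tables(freqs)
    out = []
    for _ in range(count):
        slot = x % total
        s = next(sym for sym in starts if starts[sym] <= slot < starts[sym] + freqs[sym])
        out.append(s)
        x = freqs[s] * (x // total) + slot - starts[s]
    return out


message = list("abracadabra")
freqs = {"a": 5, "b": 2, "r": 2, "c": 1, "d": 1}
state = rans_encode(message, freqs)
decoded = rans_decode(state, len(message), freqs)
```

Encoding runs backward so that decoding yields symbols in forward order. Frequent symbols grow the state by fewer bits.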
LZAwin080test was released Jan. 10, 2015.
lza 0.82b
(discussion) was released Mar. 9, 2015. It is not compatible with
v0.80. The 64 bit version was tested in Wine.
brotli is a free, open
source (Apache license) file compressor by Google. It uses LZ77. It was tested
by compiling from the Sept. 21, 2015 GitHub commit in the tools subdirectory using
the supplied Makefile in Ubuntu Linux with g++ 4.8.4. The -q option
selects the compression level. The default is -q 11.
The test was repeated on the release as of Feb. 18, 2016. -w 24
selects the window size. Default is -w 22.
szip 1.12a is a free, open source
file compressor by Michael Schindler, Mar. 3, 2000. It uses a modified BWT
(a Schindler transform) which sorts using a truncated string comparison to speed
the transform on highly redundant data. The algorithm is protected by
patent 6,199,064 in the U.S.
until Nov. 19, 2017. The first version of szip was released on June 2, 1997.
The option -b41o16 selects a block size of 4.1 MB (the maximum) and order 16, the maximum
length of string comparisons. Memory usage is 17 MB (4x block size) for compression
and 21 MB (5x block size) for decompression. o0 means unbounded order, which is the
same as a normal BWT. The default is -b16o6.
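The idea of a bounded order sort can be sketched as follows. This is a naive forward transform only, written for clarity; it is not szip's algorithm, and the function name and the choice to sort on the bytes that follow each position are assumptions of this sketch.

```python
def sort_transform(data: bytes, order: int) -> bytes:
    """Sort rotations of data by their first `order` bytes only.

    Ties are left in their original order (Python's sort is stable).
    With order >= len(data) this reduces to an ordinary BWT over rotations.
    """
    n = len(data)
    k = min(order, n)
    doubled = data + data  # lets us slice rotations without modular indexing
    idx = sorted(range(n), key=lambda i: doubled[i:i + k])
    # Emit the byte that precedes each sorted rotation.
    return bytes(doubled[i + n - 1] for i in idx)
```

Because comparisons stop after `order` bytes, long repeated strings cannot slow down the sort, which is the speed advantage described above.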
balz 1.02 is a free,
closed source file compressor by Ilia Muraviev, Mar. 8, 2008. It uses LZ77
with arithmetic coding and a 512K buffer with Storer and Szymanski parsing.
It takes no options. Memory usage is 346 MB for compression and 18 MB for
decompression.
balz 1.06, May 9, 2008, has two compression
options, e for normal and ex for better but slower compression. Both options use
67 MB for compression and 48 MB for decompression.
balz 1.07
was released May 14, 2008. It uses 132 MB for compression and 95 MB for decompression.
balz 1.08
was released May 20, 2008. It uses 200 MB for compression and 126 MB for decompression.
Only mode ex was tested.
balz 1.09
was released May 21, 2008. It uses 128 MB for decompression. Only mode ex was tested.
balz 1.12
was released June 3, 2008. It uses 123 MB for decompression.
balz 1.13
was released June 11, 2008. It uses 127 MB for decompression.
balz 1.15 was released as open source
on July 8, 2008. It uses 67 MB for compression and 49 MB for decompression.
balz 1.20
(discussion) was released Mar. 5, 2015. It is compatible with 1.15
but faster with less compression.
lzpm 0.02 is a free, closed source
file compressor by Ilia Muraviev, Apr. 19, 2007. It uses LZ77. It takes no options.
lzpm 0.03, Apr. 28, 2007,
uses more memory for compression (181 MB), but still uses 20 MB for decompression.
lzpm 0.04, May 4, 2007,
uses ROLZ. Memory usage is 83 MB for compression and 20 MB for decompression.
The new design uses circular hash chains for better speed on binary files,
but a little slower for text.
lzpm 0.06, May 19, 2007,
improves compression over 0.04 with the same memory usage.
lzpm 0.07, Aug. 6, 2007,
and later versions use 280 MB for compression and 20 MB for decompression.
lzpm 0.08, Aug. 8, 2007.
lzpm 0.09, Aug. 15, 2007.
lzpm 0.10, Aug. 23, 2007.
lzpm
0.11, Sept. 5, 2007,
takes the command 1..9 to choose the compression level (fastest...maximum).
1 uses greedy parsing. 2..8 use 1..7 byte lookahead. 9 uses unbounded lookahead.
All modes use 723 MB for compression and 77 MB for decompression.
lzpmlite 0.11, Sept. 13, 2007,
is a "lite" version of lzpm that uses about half as much memory and runs twice as fast.
Options range from 1..9
with 1 being fastest and 9 for best compression. (3 is a good compromise).
All modes use 362 MB for compression and 39 MB for decompression.
lzpm 0.13 was released
Dec. 1, 2007.
lzpm 0.14 was released
Jan. 1, 2008. It uses 40 MB for decompression.
lzpm 0.15 was released
Jan. 16, 2008. It uses 40 MB for decompression.
The -d9 option selects maximum dictionary size. -x7 selects
maximum hash level (most memory). -l7 selects maximum search level
(slowest).
KuaiZip 2.3.2 is a free GUI archiver
for Windows, Sept. 9, 2011. It uses a proprietary compression algorithm, probably
LZMA. It takes no compression options. On the test machine
(dual core T3200), compression uses 1.5
threads (75% CPU). Decompression uses one thread. Times are reported by the application.
See ppmonstr above.
dzo is a commercial GUI deduplicator and archiver
for Windows by Essenso Labs. A beta version (32 day free trial)
dated Sept. 15, 2011 was tested. The trial version will compress either a single file
or a folder. It first finds duplicate files or regions within files and produces an
intermediate temporary file (file.dp) that removes the duplicates. Then it compresses
the temporary file using LZMA (7zip) to file.dzo and removes it.
The original files are not removed. Decompression restores a single file
to (dzo)file or folder(dzo), again through a temporary .dp file. Both
commands are activated by right-clicking on the file or folder to compress or the .dzo
file to decompress and selecting the command from the context menu. Times are as reported
by the application. LZMA compression is multi-threaded.
comprox_ba 20110927
(discussion) is a free,
experimental, open source file compressor by Zhang Li, Sept. 27, 2011. It uses
BWTS
(BWT Scottified) with 4 MB blocks, followed by MTF (move to front), RLEZ (run length encoding
of zeros) and arithmetic coding. BWTS is a bijective variant of BWT developed by David
A. Scott in which the starting index is not stored. In BWTS, the input is factored into
a sequence of lexicographically non-increasing Lyndon words, which are then context-sorted
separately. The starting indexes for the inverse BWTS are the beginnings of each word.
The program takes no arguments. It uses 103 MB (24x block size) for compression and
25 MB (6x block size) for decompression. There is a Windows and a Linux version.
Only the Windows version was tested.
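The factoring step of BWTS can be illustrated with Duval's algorithm, the usual linear time method for splitting a string into Lyndon words. This is a sketch of that step only, not code from comprox_ba, and the function name is an assumption.

```python
def lyndon_factors(s: bytes) -> list:
    """Split s into its Lyndon factorization using Duval's algorithm."""
    factors = []
    n = len(s)
    i = 0
    while i < n:
        j, k = i + 1, i
        # Extend the current run while it stays a prefix of a power of a Lyndon word.
        while j < n and s[k] <= s[j]:
            if s[k] < s[j]:
                k = i      # the run is itself a longer Lyndon word
            else:
                k += 1     # the run keeps repeating the current word
            j += 1
        # Emit each repetition of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors
```

For example, `lyndon_factors(b"banana")` splits the input into `b`, `an`, `an`, `a`. The start of each factor is a position the inverse transform can recover without a stored index.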
comprox_ba 20110928 was released Sept. 28, 2011. Compression runs in 2 threads. Both the Windows
and Linux versions were tested (on different computers).
comprox_ba 20110929 was released Sept. 29, 2011. Compression is slightly improved.
Both compression and decompression are now multi-threaded.
turtle 0.01
is a free, experimental, closed source file compressor by
Nania Francesco Antonio, June 1, 2007. It uses PPM. It takes no options.
turtle 0.02
was released June 2, 2007. Compression is identical.
turtle 0.03
was released June 5, 2007. It is faster and improves compression slightly.
The file name is stored in the compressed file.
turtle 0.04
was released June 8, 2007. It recognizes several different file types.
turtle 0.05
was released June 12, 2007. It improves compression at the cost of time and memory.
turtle 0.07
was released June 23, 2007. It includes a model for audio files.
WinTurtle 1.2 is a Windows GUI
version of turtle, released Aug. 16, 2007. It uses PPM with LZP preprocessing.
It detects .tar, .iso, .nrg, .wav, .aiff, .bmp, .exe, .pdf, .log and text files.
Compression times are wall times. Note: the user interface is not fully functional.
To compress a file, click "Drive", click on "Buffer" until it is set to 512 MB (it does not
work until you click "Drive" first; also, 1 GB caused the program to crash on enwik8),
select "File/compress single file" from the upper menu,
then select the input file and output archive from the two file dialogs.
The program adds a .tur extension to the output archive. To decompress,
select File/open archive, click on the file name, click Select, click Extract,
and select an output folder from the file dialog.
WinTurtle 1.21, Aug. 16, 2007,
fixes an unrelated bug but is otherwise the same as 1.2.
WinTurtle 1.30 was released Aug. 30, 2007.
WinTurtle 1.60 was
released Jan. 1, 2008.
diz
is a free, experimental, open source (GPL) file compressor by
Roger Flores, Aug. 3, 2012. It is a PPMC based compressor written in Python.
It is distributed as source code only.
The program was tested as recommended by running in pypy
version 1.9.
Compression is as follows.
A 20-bit hashed order-4 context is mapped into the last 3 bytes seen
in that context in a move-to-front queue, plus a consecutive hit count.
Queue positions (hits) or literals (misses) are arithmetic coded using
the count and an order-1 context (order-0 if the count is more than 3)
as secondary context. After a byte is coded, it is moved to the front of the queue.
The hit count is updated as follows: incremented (max 63) if the first byte
is matched, set to 1 if any other byte is matched, or set to 0 in case of a miss.
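The queue and hit count update described above can be sketched as follows. The context hashing and the arithmetic coder are left out; the class and function names are assumptions made for this example.

```python
class Slot:
    """State for one hashed context: a 3 byte move-to-front queue and a hit count."""
    def __init__(self):
        self.queue = []  # most recently seen byte first, at most 3 entries
        self.hits = 0    # consecutive hit count, saturates at 63

def update(slot, byte):
    """Return the queue position (0..2) on a hit or None on a miss, then update."""
    if byte in slot.queue:
        pos = slot.queue.index(byte)
        slot.queue.pop(pos)
        # First byte matched: count up. Any other byte matched: reset to 1.
        slot.hits = min(slot.hits + 1, 63) if pos == 0 else 1
    else:
        pos = None
        del slot.queue[2:]  # drop the oldest byte to make room
        slot.hits = 0
    slot.queue.insert(0, byte)  # move (or add) the coded byte to the front
    return pos
```

The returned position is what would be coded on a hit; on a miss the literal byte would be coded instead.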
sr3
(mirror)
is a modification by
Nania Francesco Antonio, Oct. 28, 2007. The context table size is increased
from 4 MB to 64 MB, which effectively increases the context from order-4 to
order-5. This helps compression on larger files, but makes it worse for some
smaller files. The program also detects file type. For .bmp files, the order is
decreased. For .wav files, the input is split into separate 1 byte wide streams
for each audio sample. There is no separate compressor and decompresser program.
sr3.exe was recompiled on July 23, 2009
without upack to remove antivirus false alarms, resulting
in a larger executable. The new size is shown using source code.
bzip2 1.0.2 is an open source command line
single file compressor by Julian Seward, released Dec. 30, 2001.
It uses BWT. The -9 option selects maximum compression.
bzip2 1.0.3
(May 22, 2005) produces very slightly larger output but is faster, as shown by
the following table. The decompresser
size is based on zipped bunzip2.exe. This is smaller than the source
(724,919 bytes as a zip download).
RH is a free, experimental
file compressor by Nauful, Feb. 17, 2014. There are two versions,
RH and RH2. RH uses order 3 ROLZ and Huffman coding, using 8 MB
memory. RH2 has 3 compression levels using 64 MB memory. Level c1
uses LZP. c2 uses order 1 ROLZ with limited search. c3 uses full search.
A literal is coded with 1 bit plus the value. A match is coded with
1 bit to signal a match, 8 bits for the length, and 12 bits for the
index into the ROLZ table.
The 32 and 64 bit Windows .exe versions produce
incompatible archives. The 32 bit version was tested in Windows.
The 64 bit version was tested in Ubuntu under Wine 1.6.
RH2 20Feb2014, released Feb. 27, 2014, has 5 compression levels
c1..c5.
RH4_x64, Mar. 22, 2014 is an archiver with file-level deduplication
and compression improvements. It has 6 compression levels. There are
several earlier versions without version numbers that were not tested.
RH5 was released Nov. 11, 2014. The 64 bit Windows version was tested in
Ubuntu/Wine. It has options c1..c6 to select the compression level
(default c2) and -window to select the window size (default -window:23 for
2^23 bytes). Larger windows compress better at the cost of more memory up to
27; values above that have no effect. Options -hash:13 and -table:12 select the default hash
table sizes and index table sizes. Higher or lower values compress worse.
-skip-checksums is not used because it has no effect on compression.
However it skips a check for duplicate files when creating an archive
from a directory. It would make compression worse in that case.
RangeCoderC v1.3,
Nov. 25, 2011, has 3 versions. The standard version is compatible with v1.2 but
uses half as much memory. The "double" version uses a main model to select
among several sub-models to improve compression at a cost in speed and memory.
There is also an "indirect" version that was not tested because there was no
32 bit Windows version.
RangeCoderC v1.4 was released Nov. 28, 2011. It has 4 versions: standard,
double, indirect, and a new version, hashed, which computes a hashed context
and gives the best compression.
RangeCoderC v1.5 was released Nov. 29, 2011. It combines the 4
models from v1.4 into one program and includes the model type in the archive header.
Option c3 selects the hashed model. It gives the same size as v1.4. The other
models were not tested.
RangeCoderC v1.6 was released Dec. 1, 2011. It has 6 compression modes
selected by options c0 through c5 as follows:
RangeCoderC v1.7 alpha, Dec. 5, 2011, fixes the bug in c1 mode in v1.6.
The other 5 modes are presumably the same and were not tested.
It is a pre-release of version 1.7, released without source code.
RangeCoderC v1.7, Dec. 9, 2011, adds two new compression modes:
RangeCoderC v1.8, Dec. 13, 2011, removes two obsolete modes and adds
one mode: "The Bitwise Adaptive Model uses probabilities instead of counts,
which are adjusted nonlinearly for better compression on changing data.
The learning speed of the model is derived from the model order." The
modes are:
quad is a free file compressor by
Ilia Muraviev. Only the latest version (now open source) is supported, so only that version
appears in the main table.
As described by the author:
QUAD uses ROLZ compression (Reduced Offset LZ). It makes use of an order-2 context to
reduce the offset set that is matched to. This can be regarded as a fast large
dictionary LZ. Literals and Match Lengths fits in a single alphabet which is coded
using an order-2-0 PPM with Full Exclusion. Match indexes are coded using an order-0
model. QUAD uses a 16 MB dictionary. For selectable compression speed and ratio, QUAD
uses different parsing schemes: with Normal mode (Default) QUAD uses a Lazy Matching;
with Max mode (-x option) QUAD uses a variant of Flexible Parsing. In addition, QUAD
has an E8/E9 transformer for better executable compression which is always enabled.
quad 1.01a (Dec. 24, 2006) used LZ77. It was closed source and took no options.
quad 1.04a (Feb. 8, 2007) used LZP. Memory was expanded for this version
only, however it is no longer supported.
quad 1.07beta (Feb. 22, 2007)
included the "x" option for better compression.
quad 1.08 was released Mar. 12, 2007. Quad became open source.
quad 1.10 was released Mar. 19, 2007. -x selects maximum compression.
quad 1.11 (Apr. 4, 2007) uses ROLZ.
quad 1.11HASH2
(Apr. 5, 2007, experimental, no source code) produces the same size archives, but uses
a hash table for faster compression.
quad 1.12 was released Apr. 7, 2007.
WinACE 2.61 is a shareware GUI/command line archiver,
Mar. 8, 2006. It compresses in ACE and ZIP formats and decompresses
many others. ACE decompresses much faster than it compresses,
suggesting it is based on LZ77. The option -m5 selects maximum compression.
-d4096 selects the maximum dictionary size of 4MB (default is -1024 = 1MB).
-sfx creates a self extracting archive, which adds less space than the
program itself.
lzsr 0.01 is a free file compressor
for Windows by Nania Francesco Antonio, Oct. 1, 2011.
It is described as using a "fusion of LZ77-LZP and SR"
and arithmetic coding. It takes no options.
zling
(discussion) was
updated Dec. 25, 2013. It was tested in Ubuntu with gcc 4.8.1 and
Boost_1_55_0 using the supplied Makefile.
zling 20140121
(discussion),
Jan. 21, 2014,
has some optimizations, and removes Boost. It was tested by compiling
with g++ 4.8.1 -O3 in Windows and with the supplied Makefile in Linux.
libzling 20140219, Feb. 19, 2014,
separates the program into a compression API and
a simple demo program. It was tested by building the demo using cmake
under Linux as recommended in the readme file.
libzling 20140324 was released Mar. 24, 2014. The demo
program has 5 compression levels.
libzling 20140414 was released Apr. 14, 2014. It is
faster with better compression.
libzling 20140430-bugfix
(discussion) was released May 4, 2014.
libzling 20160107 was released Jan. 5, 2016 and updated Jan. 7, 2016.
xpv5
is a free Windows command line file compressor
by Abhilash Anand, Oct. 20, 2011. It is described as using
ROLZ with an order 1 back end. It has 3 compression levels:
c0, c1, c2. All levels use 9 MB memory for compression or
decompression. It is single threaded.
lzc 0.03 was released May 11, 2007.
lzc 0.04 was released May 16, 2007. All versions up to 0.04
use 107 MB memory for decompression.
lzc 0.05b was released May 26, 2007. It has options from 1 (fastest)
to 16 (best compression). It uses 771 MB to compress and 390 MB to decompress.
All versions through 0.05b are linked in the above archive.
lzc 0.06b was released Aug. 27, 2007.
It uses 790 MB (peak) for compression and 409 MB (peak) for decompression.
lzc 0.07 was released Oct. 24, 2007.
Options range from 1 (fastest) to 10 (slowest).
lzc 0.08 was released Nov. 15, 2007.
It improves BMP and WAV compression.
Nakamichi 2019-Jul-01 is a free, open source file compressor by Georgi Marinov,
July 1, 2019. It uses LZSS. On the test machine it takes 95 days and 302 GB
of memory to compress and 1.3 seconds and 2 GB to decompress (memory to memory).
Source code
(public domain) was released on June 26, 2013. The file format consists of
64 MiB blocks with a 4 byte header in machine dependent (LSB first for x86) order giving
the block size. Literal and match codes are packed LSB first and padded with trailing
0 bits in the last byte. Codes are as follows:
The compressor maintains an index for finding matches consisting of two
hash tables of size 2^21 for strings of length 3 and 2^24
for strings of length 4. The second table is maintained as a linked list.
The two rolling context hashes are computed by shifting the current hash 7 or 6
bits left, respectively, adding the next byte, and chopping off the high bits.
It tests the length 3 hash first, then follows the linked list of length 4 hashes
to find the best match
for up to 4, 256, or 4096 locations in the input buffer for compression options
cf, c, and cx respectively. In addition for option cx, the compressor looks ahead
one byte and codes the current byte as a literal if starting at the next byte
produces a better match. A match is better if it is longer with a penalty of
log16 offset plus one for the literal in case of looking ahead.
The minimum match length is 3 for offsets less than 64 KiB, otherwise 4.
To save memory, only the last 2^20
linked list pointers are saved in a rotating queue.
As a speed optimization for testing matches, the first and last byte
at the current best match length are tested first, then the rest of the string.
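The two rolling hashes can be sketched as below. The shift and table sizes are taken from the description above; the function names are assumptions, and the linked list and match search are omitted.

```python
HASH3_SHIFT, HASH3_BITS = 7, 21  # length 3 strings, table of 2^21 entries
HASH4_SHIFT, HASH4_BITS = 6, 24  # length 4 strings, table of 2^24 entries

def roll(h, byte, shift, bits):
    """Shift the hash left, add the next byte, and chop off the high bits."""
    return ((h << shift) + byte) & ((1 << bits) - 1)

def hash_of(data, shift, bits):
    """Hash value after feeding every byte of data through roll()."""
    h = 0
    for b in data:
        h = roll(h, b, shift, bits)
    return h
```

Since 3 x 7 = 21 and 4 x 6 = 24, a byte is shifted completely out of the hash after 3 or 4 updates respectively, so each hash depends only on the last 3 or 4 bytes and never has to be recomputed from scratch.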
crush 1.00
(discussion)
was released July 1, 2013. It increases the window size from
2^20 to 2^21, thus increasing the minimum and maximum
length of an offset code by 1 bit, i.e. if L is 0 then P is 6 bits (1..64)
and if L is in 1..15 then P is L + 5 bits (65..2^21). Also, the
penalty for coding a match offset is changed to log8(offset/16).
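The split of an offset into the 4 bit length field L and the payload P can be sketched as follows. The bit widths and ranges come from the description above; storing offset minus one and the exact payload value are assumptions of this sketch, as is the function name.

```python
def offset_code(offset):
    """Return (L, payload_bits, P) for a match offset in 1..2^21."""
    if not 1 <= offset <= 1 << 21:
        raise ValueError("offset out of range")
    v = offset - 1                # assumed: offsets are stored zero based
    if v < 64:
        return 0, 6, v            # L = 0: 6 bit payload covers offsets 1..64
    nbits = v.bit_length() - 1    # position of the leading 1 bit
    # L in 1..15: payload is the L + 5 bits below the leading 1.
    return nbits - 5, nbits, v - (1 << nbits)
```

The 16 values of L cover 64 + 64 + 128 + ... + 2^20 = 2^21 offsets, which is the whole window.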
xeloz 0.3.5.3 is a free, open source (MIT license) file compressor
by xezz, Sept. 7, 2014. It uses LZ77 with the following possible code lengths:
xeloz 0.3.5.3a, Sept. 12, 2014,
fixed a bug that caused version 0.3.5.3 to crash when decompressing files
compressed with uppercase option C. The option selects a fixed rather than a
sliding window for faster compression.
lzwhc is a single program that both compresses and decompresses. It was released
Aug. 9, 2023. It uses a hash table for better speed but produces the same output as lzwg.
The option -c28 selects 2^28 hash table entries of 9 bytes each for 2.4 GB.
ulz 0.01
(discussion)
is a free, experimental file compressor by Ilia Muraviev, Feb. 1, 2010. It uses
LZ77 with bytewise encoding. The options c1 through c5 select the compression
level from fastest to best. The option does not affect
memory usage. All levels use 43 MB for compression and 33 MB for decompression.
ulz 0.02
adds a new faster mode (c1). Options c2 through c6 are the same as c1 through c5 in ulz 0.01.
ulz 0.03 was released June 26, 2016.
It is byte aligned LZ77 similar to LZ4 but with 16 MB blocks and 256 KB window.
It has 3 compression levels: cf, c, cu (fast, normal, ultra). Level cu
uses optimal parsing.
ulz 0.06 was released July 13, 2017. It has 9 compression levels, c1 to c9.
Only source code is available. For this test, the program irolz.cpp was
compiled using g++ 4.5.0 on a 2 GHz T3200 under 32 bit Vista with
options -O2 -march=pentiumpro -fomit-frame-pointer -s.
lcssr 0.2 (Dec. 3, 2007, same website)
(mirror with .exe)
is derived from symbra. It drops the secondary symbol queue
and instead uses a variable length context based on the length of the
longest match as with LZ77/LZP. The option -b7 selects a 1152 MB buffer
for finding context matches.
zlite is an open source file compressor by
Zhang Li, Aug. 20, 2013. It uses ROLZ. It was released as C source code only. To test, it
was compiled with MinGW gcc 4.8.0 with option -O3. zlite takes no
options.
The LZ77 format codes literals uncompressed after a length code.
Matches can have an offset in the range 1 to 2^24-1 and length
4 to 2^24-1. Literals are coded as 00,N,L[N], where N is the number
of literals to follow coded in marked binary. A marked binary number
discards the leading 1, then precedes each bit by a 1 and marks the
end with a 0 bit. For example, 5=101 would be coded as 1,0,1,1,0.
Matches are coded as 5 bits to indicate the number of offset bits
(where the first 2 bits are not 00) in the range 0..23, then the match
length as a marked binary number except for the last 2 bits, then the
low 2 bits of the match length are coded directly,
and then 0 to 23 bits of the offset without the leading 1 bit.
Compression is achieved in a 16 MB sliding window implemented as a
pair of buffers. A hash table of 2^19 buckets of 2^level (2..32) pointers each,
indexed by an order 4 context hash, maintains pointers for finding matches.
The longest match of length at least 4 is coded, except that if the offset
is over 64K and the last symbol is a match, then the minimum length is 5.
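The marked binary code described above can be sketched as a pair of functions working on lists of bits. The function names are assumptions; packing the bits into bytes is omitted.

```python
def marked_binary(n):
    """Encode n >= 1: drop the leading 1, prefix each remaining bit with 1, end with 0."""
    if n < 1:
        raise ValueError("n must be at least 1")
    bits = []
    for ch in bin(n)[3:]:  # binary digits after the leading 1
        bits += [1, int(ch)]
    bits.append(0)         # end marker
    return bits

def read_marked_binary(bits):
    """Decode one marked binary number from an iterable of bits."""
    it = iter(bits)
    n = 1                  # restore the implied leading 1
    while next(it) == 1:   # a 1 marker means another data bit follows
        n = (n << 1) | next(it)
    return n
```

Here `marked_binary(5)` gives `[1, 0, 1, 1, 0]`, matching the example in the text. A number with k significant bits costs 2k - 1 bits, so small counts and lengths stay short.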
zhuff 0.7 zhuff 0.8
(discussion)
has 3 compression levels, from -c0 (fastest) to -c2 (best). All are multithreaded, but
decompression at all levels and compression with -c0 is I/O bounded (about 40 seconds).
Times are process times for these cases, and real times for -c1 and -c2 compression.
zhuff 0.95b was
released Jan. 27, 2014. zhuff 0.97 beta was released Feb. 2, 2014. Both
programs were tested using the 64 bit Windows version under Ubuntu Wine.
There are also 32 bit Windows versions that produce identical compressed files.
lzhhf is a free, experimental compressor
by Gerald R. Tamayo, Sept. 2, 2022.
It uses an LZ77 algorithm (essentially lzuf62 plus Golomb coding of the match_len, after which the
literals are dynamic Huffman coded).
Compression memory used ~13.75MB. 12MB hash table plus 1 MB sliding
window plus 0.5MB look-ahead pattern buffer.
slug 1.27,
May 7, 2007, uses a ROLZ variant with an 8MB non-sliding window and semi-dynamic
Huffman coding trees rebuilt every 4KB (more frequently near the beginning of a file).
lzuf is a free, experimental open source
file compressor by Gerald R. Tamayo, Apr. 15, 2009. It uses LZ77 with folded unary encoding
of match lengths. It takes no arguments. It has a separate decompression program, lzufd.exe.
lzuf62 is an update, Sept. 2, 2022.
Improved lzuf with optional "sliding window" history buffer bit sizes (WBITS = 12..20).
Compression memory used ~13.75MB. 12MB hash table plus 1 MB sliding window plus 0.5MB
look-ahead pattern buffer.
pigz 2.2.3 is a free command-line file compressor for Linux,
Jan. 15, 2012. It uses the deflate (LZ77) format for compatibility with gzip, but is multi-threaded
for better speed at a small cost in compression ratio. -9 selects best compression.
Decompression is single-threaded and I/O bound.
pigz is distributed as source code only. It requires linking with zlib
version 1.2.3 or higher. For this test, pigz was compiled using the supplied Makefile under Ubuntu
Linux with g++ 4.6.1 and linked to zlib 1.2.5. Decompression was tested with unpigz, compiled
similarly. It was tested on a 2.66 GHz Core i7 M620 (2 cores x 2 hyperthreads per core) as in note 48. Virtual
memory usage was measured with top at 115 MB for compression and 33 MB for decompression. Resident memory
usage was 2 MB. Compression time is real time at about 350% CPU usage. Decompression is I/O bound
(less than 100% CPU), so CPU time is reported. gzip is shown for comparison.
pigz 2.3, Mar. 4, 2013, adds option -11 implementing Google's
zopfli algorithm, a very highly
optimized and slow implementation of deflate. Decompression speed is not affected
and is compatible with gzip. The test program was built from source code in Ubuntu
using the supplied Makefile with g++ 4.6.3.
.1857 bwtsdc
.1859 fbc
          Compression  Compressed size          Decompresser  Total size   Time (ns/byte)
Program   Options      enwik8      enwik9       size (zip)    enwik9+prog  Comp   Decomp  Mem   Alg  Note
--------  -----------  ----------  -----------  ------------  -----------  -----  ------  ----  ---  ----
fbc v1.0  250000000    22,554,133  188,976,445  21,244 x      188,997,689  541    480     1225  BWT  26
fbc v1.1  333333334    22,554,133  185,975,548  23,576 x      185,999,124  451    415     1647  BWT  55
.1862 ppmvc
.1869 chile
              Compression  Compressed size          Decompresser  Total size   Time (ns/byte)
Program       Options      enwik8      enwik9       size (zip)    enwik9+prog  Comp   Decomp  Mem   Alg
------------  -----------  ----------  -----------  ------------  -----------  -----  ------  ----  ---
chile 0.3d-1  -b40000      23,408,335  203,451,387  11,298 s      203,462,685  4957   435     785   BWT
chile 0.4     -b=244141    22,218,917  186,979,614  11,530 s      186,991,144  2513   512     1426  BWT
.1901 bwtdisk
bwtdisk 0.9.0 is a free, experimental,
open source (GPL v3) file compressor by Giovanni Manzini, July 7, 2010.
It uses BWT. Its purpose is to test the techniques for low memory BWT
described in the paper
Lightweight Data Indexing and Compression in External Memory by
Ferragina, Gagie, and Manzini, Proc. LATIN 2010. The forward BWT computes the suffix
array in small segments, then makes multiple passes over the BWT output to merge
the result. The external disk usage can be further reduced by compressing the input first
with zlib or lzma and decompressing the input on each pass.
The program is single threaded.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
bwtdisk 0.9.0 -b 1 -m 3500 27,173,252 245 234 500 BWT 48
-b 1 -m 3500 27,173,252 214,137,751 169,579 s 214,342,831 1124 3500 BWT 48
-b 2 -m 3500 24,725,277 186 255 500 BWT 48
-b 2 -m 3500 24,725,277 190,004,306 169,579 s 190,173,885 1124 3500 BWT 48
-b 4 -m 3500 26,975,980 270 247 500 BWT 48
.1910 CTXf
.1912 M03exp
Block size  enwik8      Comp  Decomp (ns/byte approx)
----------  ----------  ----  ------
8 MB        23,461,984  3860  1840
32 MB       21,948,192  4800  2100
.1930 Stuffit
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Notes
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- -----
Stuffit 9.0.0.21 Method 4 (best text) 24,310,583 210,801,103 1,015,808 x 211,816,911 542 503 36 12
Method 6 (auto-pick best) 24,419,299 212,392,465 1,015,808 x 213,408,273 2149 68 12
Stuffit 12.0.0.17 -m=1 -l=16 -x=30 25,926,107 2540 420 298 LZ77
-m=2 -l=16 -x=27 24,874,987 3080 90 881 LZ77
-m=8 -l=16 -x=30 25,574,676 560 230 229 BWT
-m=4 -l=16 -x=28 23,482,855 730 694 274 PPM
-m=4 -l=16 -x=29 22,744,155 770 720 537 PPM
-m=4 -l=16 -x=30 22,105,654 190,372,707 2,658,122 xd 193,030,829 628 658 1062 PPM
Stuffit 13.0.0.19 -m=4 -l=16 -x=30 22,105,658 190,372,711 21,611,401 x 211,984,112 567 604 1060 PPM 26
.1933 plzma
Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp CMem Dmem Alg Notes
------- --------------- ---------- ----------- ----------- ----------- ----- ----- --- ---- --- -----
plzma_v3b c2 1000000000 999999999 273 8 0 0 6000 1 1 1 7 24,206,571 193,240,160 101,221 x 193,341,381 8889 55 10110 975 LZMA 58
c2 24,778,033 2110 167 394 54 LZMA 26
plzma_v3c e 25,182,314 2050 39 394 54 LZMA 26
c 24,866,192 2060 164 394 54 LZMA 26
c2 24,778,037 213,154,428 55,974 x 213,210,402 2086 149 394 54 LZMA 26
.1933 crook
            Compression  Compressed size          Decompresser  Total size   Time (ns/byte)
Program     Options      enwik8      enwik9       size (zip)    enwik9+prog  Comp   Decomp  Mem   Note
----------  -----------  ----------  -----------  ------------  -----------  -----  ------  ----  ----
crook v0.1  -m1600 -O4   25,693,515  229,770,948                             379    393     781   26
            -m1600 -O5   23,664,987  207,093,726                             423    442     1641  26
            -m1600 -O6   22,793,009  197,202,156                             446    475     1641  26
            -m1600 -O7   22,505,951  193,896,089                             462    496     1641  26
            -m1600 -O8   22,503,627  193,333,159  8,539 s       193,341,698  483    513     1641  26
            -m1600 -O9   22,620,471  193,912,162                             479    519     1641  26
            -m1600 -O10  22,752,285  194,794,021                             488    511     1641  26
            -m1600 -O12  22,957,581  196,397,188                             492    505     1641  26
            -m1600 -O16  23,105,056  197,631,364                             477    503     1641  26
.1936 ppmx
Compressed size Decompresser Total size Time (ns/byte)
Program enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
ppmx 0.01 24,369,312 213,206,926 51,454 x 213,258,380 557 515 550 PPM 26
ppmx 0.02 22,580,291 194,298,469 53,511 x 194,351,980 874 888 609 PPM 26
ppmxcore2duo 0.02 22,580,291 55,824 x 871 949 609 PPM 26
ppmx 0.03 22,572,808 193,643,464 54,964 x 193,698,428 777 784 609 PPM 26
ppmx 0.04 23,150,510 201,384,355 52,406 x 201,436,761 791 801 280 PPM 26
ppmx 0.05 22,905,422 196,548,444 53,476 x 196,601,920 863 882 576 PPM 26
ppmx 0.06 26,131,726 235,257,572 54,596 x 235,312,168 276 317 71 PPM 26
ppmx 0.07 23,941,730 211,671,802 44,104 x 211,715,906 314 352 302 PPM 26
ppmx 0.08 23,204,040 202,868,559 54,098 x 202,922,657 397 420 355 PPM 26
23,204,040 202,868,559 54,098 x 202,922,657 107 127 355 PPM 53
ppmx 0.09 25,952,954 232,581,333 50,873 x 232,632,206 122 150 279 PPM 48
25,952,954 232,581,333 50,873 x 232,632,206 57 69 279 PPM 63
.1947 lzturbo
lzturbo 0.01 is
a free, experimental, closed source file compressor by Hamid Bouzidi, Aug. 15, 2007.
There is some controversy over the origin of the source code.
Discussion.
Discussion.
Prog Opt enwik8 enwik9 prog Total Comp Deco Mem Alg Note
------------ --- ---------- ----------- ------ ----------- ---- ---- --- ---- ----
lzturbo 0.01 -49 26,678,709 233,322,999 68,561 x 233,391,560 1412 50 654 LZ77
lzturbo 0.1 -59 26,616,816 232,708,136 129,344 x 232,837,480 1385 49 248 LZ77
lzturbo 0.9 -59 26,616,278 232,701,587 116,508 x 232,818,095 1420 52 248 LZ77
lzturbo 0.94 -59 -b100 -p0 24,763,542 217,342,694 152,254 x 217,494,948 5196 20 1450 LZ77 26
-10 51,426,368 10 8 78 LZ77 26
-14 38,325,178 74 10 171 LZ77 26
-39 -b50 26,123,933 1290 16 1450 LZ77 26
-41 36,615,397 325,577,604 152,254 x 325,729,858 29 23 203 LZ77 26
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- ----
lzturbo 1.1 -10 -b24 53,199,932 3 2 48
-10 -b1000 53,194,540 6 2 48
-11 -b24 47,619,485 6 2 48
-11 -b1000 47,611,974 12 2 48
-12 -b24 44,421,925 18 2 48
-12 -b1000 44,413,087 36 2 48
-19 -b24 41,929,879 493 2 48
-19 -b1000 41,920,122 1610 2 48
-20 -b24 49,736,192 3 2 48
-20 -b1000 49,725,239 6 3 48
-21 -b24 42,628,330 6 2 48
-21 -b1000 42,538,087 12 3 48
-22 -b24 39,541,490 18 2 48
-22 -b1000 39,210,560 35 3 48
-29 -b24 32,919,788 543 2 48
-29 -b1000 31,370,930 1760 4 48
-30 -b24 39,036,288 5 3 48
-30 -b1000 39,023,229 10 6 48
-31 -b24 35,632,652 7 3 48
-31 -b1000 35,572,973 13 6 48
-32 -b24 31,266,016 18 3 48
-32 -b1000 30,753,365 38 6 48
-39 -b24 26,892,107 573 3 48
-39 -b1000 25,298,784 1838 6 48
-49 -b24 25,870,196 225,397,956 110,565 x 225,508,521 792 13 1702 48
-p1 -49 -b200 24,416,777 207,335,845 110,565 x 207,446,410 2566 17 3200 48
-49 -b1000 24,416,777 2110 20 48
-p0 -49 -b1000 24,416,777 194,681,713 110,670 x 194,792,383 1920 9 14700 59
lzturbo 1.2 -10 -b24 52,703,759 3.2 1.6 48
-10 -b1000 52,698,226 7.0 2.2 48
-11 -b24 47,619,370 6.0 1.5 48
-11 -b1000 47,611,859 11 1.9 48
-12 -b24 44,421,812 17 1.4 48
-12 -b1000 44,412,974 31 1.8 48
-19 -b24 41,933,864 515 1.4 48
-19 -b1000 41,924,186 1577 1.9 48
-20 -b24 48,387,089 3.2 1.8 48
-20 -b1000 48,374,729 6.8 5.8 48
-21 -b24 42,628,216 5.9 1.7 48
-21 -b1000 42,537,971 11 2.9 48
-22 -b24 39,394,820 18 2.0 48
-22 -b1000 39,022,094 30 7.1 48
-29 -b24 32,922,201 545 2.4 48
-29 -b1000 31,372,980 1755 4.7 48
-30 -b24 39,147,401 5.3 2.5 48
-30 -b1000 39,138,118 11 5.1 48
-31 -b24 35,618,016 7.3 2.3 48
-31 -b1000 35,563,249 17 4.2 48
-32 -b24 30,979,376 19 2.7 48
-32 -b1000 30,258,461 41 5.3 48
-39 -b24 26,915,461 582 2.8 48
-39 -b1000 25,330,833 1873 5.1 48
-49 -b24 25,812,200 656 8 48
-p1 -49 -b200 24,416,777 206,359,193 125,174 x 206,484,367 2319 14 3200 48
.1956 enc
enc 0.15 is an experimental,
closed source command line archiver by Serge Osnach, Feb. 14, 2003. It uses PPM and CM (in PaQ mode).
It tries up to 5 different compression
methods (depending on options) and chooses the best one. The methods are ("a" means "add to archive"):
Methods ae and ab with options -o8 -d256 were found to give the best compression on enwik7 (first 10^7
bytes). These methods discard the model when the memory limit is reached, and this was observed to happen
(in task manager), so these options should hold for larger files. However with -d127 (necessary to decompress),
method aq gives the best compression.
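The "try several methods and keep the smallest" strategy described above can be sketched as follows. This is only an illustration, not enc's implementation: enc is closed source, and the three standard-library codecs used here (zlib, bz2, lzma) merely stand in for its PPM and CM methods. The name best_of is hypothetical.

```python
import bz2
import lzma
import zlib

# Stand-in codecs (an assumption); enc's real methods are PPM and CM variants.
METHODS = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": lambda data: bz2.compress(data, 9),
    "lzma": lambda data: lzma.compress(data, preset=9),
}


def best_of(data):
    """Compress data with every method and return (name, output) of the smallest."""
    results = {name: fn(data) for name, fn in METHODS.items()}
    name = min(results, key=lambda n: len(results[n]))
    return name, results[name]


if __name__ == "__main__":
    sample = b"recognize speech, reckon eyes peach. " * 500
    winner, packed = best_of(sample)
    print(winner, len(packed))
```

The cost of this approach is that compression time is the sum of the times of all methods tried, while decompression only needs the one method that was chosen.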
.1966 comprolz
Compressor         Opt       enwik8      enwik9       Prog      Total        Comp  Decomp  Mem  Alg   Note
-----------------  --------  ----------  -----------  --------  -----------  ----  ------  ---  ----  ----
comprolz 0.1.0     -b256     24,835,082  215,770,703  41,170 s  215,811,873  595   262     602  ROLZ  26
comprolz 0.2.0     -b250 -f  24,280,609  210,255,761  43,899 s  210,299,660  1415  319     666  ROLZ  26
comprolz 0.10.0    -b250 -f  23,050,103  198,635,448  82,824 x  198,718,272  1086  333     595  ROLZ  26
comprolz 0.11.0    -b250 -f  23,687,477  213,585,466  29,509 x  213,614,975  1608  324     866  ROLZ  26
comprolz 0.11.0b1  -b250 -f  22,813,215  196,651,379  29,453 x  196,680,832  984   308     688  ROLZ  26
.1971 sbc
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp
------- ------- ---------- ----------- ----------- ----------- ----- -----
sbc 0.970r2 -ad -m3 -b63 22,470,539 197,066,203 99,094 xd 197,165,297 1733 313
sbc 0.970r2 -ad -m1 -b31 23,288,217 99,094 xd 620 230
sbc 0.970r2 -ad -m1 -b1 27,087,118 99,094 xd 300 180
.1973 xz
Version Options enwik8 enwik9 size (zip) enwik9+prog Ctime Dtime Cmem Dmem Note
-------- ------- ---------- ----------- --------- ----------- ----- ---- ---- ---- ----
xz 5.0.1 -9 -e 24,831,648 2310 40 690 66 26
-9 24,865,244 2600 40 690 66 26
26,375,764 2020 45 95 8 26
xz 5.2.1 --lzma2=preset=9e,dict=1GiB,lc=4,pb=0 24,703,772 197,331,816 36,752 xd 197,368,568 5876 20 6000 1025 73
.1984 WinRAR
WinRAR 3.60 beta 3 is a commercial (free trial)
Windows GUI and command line archiver by Eugene Roshal, May 8, 2006.
It produces rar and zip archives
and decompresses many other formats. It also encrypts and performs other functions.
The best compression mode uses PPM (actually ppmd var. I, an earlier version of ppmd J)
with optimizations for text and other
formats (exe, wav, bmp). The -mc7:128t+ option says to use PPM order 7,
128 MB memory (maximum) and force text preprocessing. The -sfxWinCon.sfx
option says to produce a self extracting console executable
(adding 79,360 bytes).
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- -------------------------- ---------- ----------- ----------- ----------- ----- ----- ---- ---- ----
WinRAR 3.60b3 -mc7:128t+ -sfxWinCon.sfx 22,713,569 198,454,545 0 xd 198,454,545 506 415
-mc10:128t+ -sfxWinCon.sfx 23,233,523 0 xd 770
-m5 -sfxWinCon.sfx 24,832,649 0 xd 680 520
-sfxWinCon.sfx 29,828,890 0 xd 780 40
29,749,530 98,888 xd 780 40
WinRAR 4.20 -m1 40,234,511 36 32 99 LZ77 26
-m2 30,564,700 180 29 99 LZ77 26
-m3 (default) 29,671,175 325 30 99 LZ77 26
-m4 29,329,237 484 30 99 LZ77 26
-m5 29,225,016 590 30 99 LZ77 26
-mc5:128t+ 23,440,773 358 229 PPM 26
-mc6:128t+ 22,701,033 418 229 PPM 26
-mc7:128t+ 22,635,718 198,372,701 141,019 xd 198,513,720 440 373 229 PPM 26
-mc8:128t+ 22,769,557 518 456 229 PPM 26
-mc10:128t+ 23,153,065 582 229 PPM 26
-mc12:128t+ 23,401,290 609 229 PPM 26
WinRAR 5.00b2 -mc7:128t+ 22,635,718 198,372,701 153,763 x 198,526,464 433 368 226 PPM 26
-ma5 -m1 40,565,268 54 31 406 LZ77 26
-ma5 -m2 29,758,785 228 30 435 LZ77 26
-ma5 -m3 28,662,794 439 32 435 LZ77 26
-ma5 -m4 28,072,832 751 31 435 LZ77 26
-ma5 -m5 27,835,431 1004 31 435 LZ77 26
.1986 quark
quark v0.95r beta is a free,
closed source command line file compressor by Frederic Bautista, Mar. 10, 2006.
It uses LZ. It is characterized by high compression and fast decompression.
The -m1 option selects relative mode compression, which is normally best, but slowest. The
-d25 option selects a dictionary size of 2^25, which is the largest that will
run without thrashing with 1 GB RAM. The -l8 option selects the search depth.
Higher values normally improve compression (up to -l13, default -l4), but -l8 was the highest
practical value for reasonable compression speed (7.5 hours). Also, larger values were
found to hurt compression on enwik5.
Compression time increases approximately exponentially with the -l value.
The compression speed with -l13 is 6,100,000 ns/byte.
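Because the tables report speed in ns/byte while the text above quotes hours, a small conversion helps compare the two. The sketch below assumes the 7.5 hour figure refers to the 10^9 byte enwik9 file, which the text does not state explicitly; the function names are illustrative.

```python
ENWIK9_BYTES = 10**9  # assumed input size for the 7.5 hour figure


def hours_to_ns_per_byte(hours, size=ENWIK9_BYTES):
    """Convert a wall-clock time in hours to ns/byte for an input of the given size."""
    return hours * 3600 * 1e9 / size


def ns_per_byte_to_hours(ns_per_byte, size=ENWIK9_BYTES):
    """Convert a speed in ns/byte to hours for an input of the given size."""
    return ns_per_byte * size / 1e9 / 3600


if __name__ == "__main__":
    print(hours_to_ns_per_byte(7.5))        # -l8, if the 7.5 hours is for enwik9
    print(ns_per_byte_to_hours(6_100_000))  # -l13 rate applied to a 10^9 byte file
```

Under that assumption, -l8 corresponds to about 27,000 ns/byte, and the -l13 rate would amount to roughly 1,700 hours (about 70 days) on a file of the same size.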
.1994 lzip
Compressor     Opt          enwik8      enwik9       Prog       Total        Comp  Decomp  Mem   Alg   Note
-------------  -----------  ----------  -----------  ---------  -----------  ----  ------  ----  ----  ----
plzip          -9           25,578,352  221,845,216  56,614 x   221,901,830  1308  37      1028  LZ77  26
lzip 1.14-rc3  -9 -s512MiB  24,756,063  199,410,543  21,682 s   199,432,225  2409  21      5632  LZ77  57
plzip 1.5      -9           25,518,871  221,179,984  336,294 x  221,213,608  425   13      2048  LZ77  48
.1995 comprox
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Version Opt enwik8 enwik9 size (zip) enwik9+prog Comp Decomp CMem Dmem Alg Note
------- -------- --- ---------- ----------- ----------- ----------- ----- ----- ---- ---- --- ----
comprox_sa 20110927 (Win32) 32,654,393 287,588,097 3,791 s 287,591,888 398 101 60 6 LZSS 26
comprox_sa 20110928 (Win32) 32,654,718 287,590,343 3,790 s 287,594,133 205 101 122 10 LZSS 26
comprox_sa 20110928 (Linux) 32,654,718 287,590,343 3,790 s 287,594,133 126 59 141 10 LZSS 48
comprox_sa 20110929 (Win32) 32,652,597 287,575,768 3,774 s 287,579,542 209 71 122 12 LZSS 48
comprox_sa 20110929 (Linux) 32,652,597 287,575,768 3,774 s 287,579,542 116 37 145 36 LZSS 48
comprox 0.1.1 (Win32) 0 29,463,135 146 65 219 44 LZ77 26
(Win32) 5 28,836,139 290 65 218 43 LZ77 26
(Win32) 9 28,586,545 250,565,797 5,430 s 250,571,227 768 57 218 43 LZ77 26
(Linux) 9 28,586,545 250,565,797 5,430 s 250,571,227 496 29 218 44 LZ77 48
comprox 0.6.0 (Win32) e200 25,504,328 221,405,873 23,367 s 221,429,240 484 92 1567 590 LZ77 26
e16 26,816,904 395 132 169 68 LZ77 26
comprox 0.7.0 (Linux) e200 25,068,368 217,403,007 36,702 s 217,439,709 225 52 1000 410 LZ77 48
(Win32) e200 25,068,368 217,403,007 36,702 s 217,439,709 390 126 1107 472 LZ77 26
(Linux) e500 25,068,368 212,824,614 36,702 s 212,861,316 260 57 2500 1100 LZ77 48
(Linux) e700 25,068,368 212,348,904 36,702 s 212,385,606 309 49 3400 1500 LZ77 48
comprox 0.8.0 (Win32) e200 24,537,383 212,651,678 42,764 s 212,694,442 460 128 1143 279 LZ77 26
(Linux) e500 24,537,383 208,328,173 42,764 s 208,370,937 296 49 2500 558 LZ77 48
comprox 0.8.0-bugfix1 (Win) e200 24,537,453 212,652,159 42,804 s 212,694,963 480 145 1108 281 LZ77 26
comprox 0.9.0 (Win32) -b250 -f -m100 24,243,078 208,369,181 46,387 s 208,415,568 1657 130 1405 326 LZ77 26
-b250 -f 24,281,529 748 161 733 164 LZ77 26
-b250 24,486,987 398 160 733 164 LZ77 26
25,494,243 317 167 151 86 LZ77 26
comprox 0.10.0 (Win32) -b250 -f -m100 23,332,113 201,288,183 86,687 x 201,374,870 1209 151 1271 LZ77 26
comprox 0.11.0 (Win32) -b200 -f -m100 23,990,134 217,340,709 34,176 x 217,374,885 2115 144 1211 LZ77 26
(Win32) 25,003,709 234,265,741 34,176 x 234,299,917 436 145 269 LZ77 26
comprox 0.11.0-bugfix1(Win) -b250 -f -m100 23,064,386 199,515,912 34,176 x 199,550,088 917 153 688 LZ77 26
23,861,257 209,481,309 34,176 x 209,515,485 307 162 196 LZ77 26
.2018 bssc
bssc 0.95a is a free command line file compressor
by Sergeo Sizikov, 2005.
It uses BWT. The -m16383 option selects the maximum block size of 16383 KB (uses 140 MB memory).
.2024 lzham
For example, if the symbols and their frequencies are A=3, B=2, C=1, then the sum (6) is
rounded up to 8 and the individual frequencies are rounded down to A=2, B=2, C=1, which
sums to 5. We then double A=4, which sums to 7. We cannot double B=4 because the sum
would exceed 8, so we continue to C. At this point we have A=4, B=2, C=2, which
sums to 8, and we may assign codes of appropriate lengths such as A=0, B=10, C=11.
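The rounding procedure in the example above can be sketched in code. This is a reading of the worked example, not lzham's source: it assumes all frequencies are positive, that symbols are visited in order of decreasing frequency, and that passes repeat until the scaled frequencies sum to the rounded-up total. The names polar_code_lengths and canonical_codes are illustrative, and the canonical assignment is just one conventional way to turn the lengths into codes.

```python
def polar_code_lengths(freqs):
    """Map each symbol to a code length using the round-and-double procedure."""
    total = 1
    while total < sum(freqs.values()):
        total *= 2  # round the sum up to a power of 2
    # Visit symbols from most to least frequent (ties keep insertion order).
    order = sorted(freqs, key=lambda s: -freqs[s])
    # Round each frequency down to a power of 2.
    scaled = {s: 1 << (freqs[s].bit_length() - 1) for s in order}
    current = sum(scaled.values())
    while current < total:
        for s in order:
            if current + scaled[s] <= total:  # doubling adds scaled[s] to the sum
                current += scaled[s]
                scaled[s] *= 2
    # A symbol with scaled frequency total / 2^n gets an n-bit code.
    return {s: (total // scaled[s]).bit_length() - 1 for s in order}


def canonical_codes(lengths):
    """Assign prefix codes in order of increasing length, then symbol."""
    codes, code, prev = {}, 0, 0
    for sym, n in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= n - prev
        codes[sym] = format(code, "0{}b".format(n)) if n else ""
        code += 1
        prev = n
    return codes


if __name__ == "__main__":
    lengths = polar_code_lengths({"A": 3, "B": 2, "C": 1})
    print(lengths)                   # {'A': 1, 'B': 2, 'C': 2}
    print(canonical_codes(lengths))  # {'A': '0', 'B': '10', 'C': '11'}
```

Because the scaled frequencies are powers of 2 that sum exactly to a power of 2, the resulting lengths satisfy the Kraft equality, so a prefix code with those lengths always exists.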
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
lzham alpha 2 x86 25,907,665 224,554,163 95,922 x 224,650,085 2485 21 609 LZ77 26
lzham alpha 3 x86 -m4 -d26 -t2 24,991,681 213,868,601 139,694 x 214,008,295 2970 22 611 LZ77 26
lzham alpha 3 x64 -m4 -d29 24,954,329 206,393,809 155,282 x 206,549,091 595 9 4800 LZ77 45
lzhamtest v1.0 25,064,179 207,094,787 191,600 s 207,286,387 553 9 2392 LZ77 48
-d26 25,091,033 279 7.3 LZ77 70
-d26 -x 24,990,739 722 7.2 LZ77 70
-d29 204,325,043 191,600 s 204,516,643 339 6.6 LZ77 70
-d29 -x 202,237,199 191,600 s 202,428,799 1096 6.6 LZ77 70
-d29 -x 25,002,070 1761 9.5 911 LZ77 48
.2024 flashzip
flashzip 0.1 is a free, closed source file compressor by Nania Francesco Antonio, Jan. 10, 2008.
It uses LZP and arithmetic coding.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
flashzip 0.1 34,053,198 299,443,551 25,734 x 299,469,285 67 51 47 LZP
flashzip 0.2 34,053,198 299,443,551 25,257 x 299,468,808 62 52 47 LZP
flashzip 0.3 5 28,541,292 248,094,851 26,738 x 248,121,589 297 73 86 ROLZ
x 5 27,845,033 241,997,412 26,738 x 242,024,150 673 72 86 ROLZ
flashzip 0.9 (-m1 -s1 -b3) 31,856,012 141 124 83 ROLZ
-b1 32,088,940 148 125 70 ROLZ
-b5 31,764,213 143 119 132 ROLZ
-s4 29,235,064 269 99 83 ROLZ
-s7 28,370,670 928 87 83 ROLZ
-m2 31,641,305 188 121 83 ROLZ
-m2 -s7 27,665,526 2081 97 83 ROLZ
-m2 -s7 -b5 26,737,801 230,987,395 30,052 x 231,017,447 2476 75 132 ROLZ
flashzip 0.91 -m2 -s7 -b5 26,068,507 227,945,252 34,222 x 227,979,474 3560 112 198 ROLZ
-m1 -s7 -b5 26,851,582 1305 127 198 ROLZ
flashzip 0.93a -m2 -s7 -b5 26,243,745 227,048,196 36,367 x 227,084,563 1458 95 132 ROLZ
-m1 -s7 -b5 27,004,639 1030 140 198 ROLZ 26
flashzip 0.94 -m2 -s7 -b5 26,236,095 226,981,882 35,996 x 227,017,878 2451 87 132 ROLZ 26
-m1 -s7 -b5 26,662,405 230,985,291 35,996 x 231,021,287 1275 84 198 ROLZ 26
flashzip 0.99 -m2 -s7 -b5 26,027,791 224,648,225 37,361 x 224,685,586 2399 110 198 ROLZ 26
-m1 -s7 -b5 26,305,210 1230 160 132 ROLZ 26
flashzip 0.99b4 -m2 -c7 -b8 25,804,706 218,328,751 141,207 x 218,469,958 3037 86 609 ROLZ 26
-m1 -c7 -b8 26,255,893 1580 97 182 ROLZ 26
flashzip 0.99b8 -m0 -c7 -b8 29,191,973 200 110 214 ROLZ 26
-m1 -c7 -b8 27,752,588 510 110 231 ROLZ 26
-m2 -c7 -b8 26,351,718 1420 110 231 ROLZ 26
-m3 -c7 -b8 26,008,189 220,193,756 119,185 x 220,312,941 3281 84 658 ROLZ 26
-s1 -m3 -c7 -b9 26,008,189 218,405,144 119,185 x 218,524,329 3531 89 1111 ROLZ 26
flashzip 0.99c1 -m3 -c7 -b7 24,840,311 206,005,639 131,128 x 206,136,767 2139 117 1050 ROLZ 26
flashzip 0.99c3 -m3 -c7 -b7 24,840,025 205,992,947 246,816 x 206,239,763 1925 112 1050 ROLZ 26
flashzip 0.99d1 28,022,537 253 92 46 ROLZ 26
-b7 28,088,756 542 102 127 ROLZ 26
-m9 -b7 24,363,049 207,354,714 170,353 x 207,525,067 1180 94 1100 ROLZ 26
flashzip 1.00 26,788,895 168 127 37 ROLZ 26
-b7 26,761,559 174 123 91 ROLZ 26
-m7 -b7 26,761,559 762 130 136 ROLZ 26
-mx7 -b7 23,869,034 202,363,445 123,053 x 202,486,498 1296 122 802 ROLZ 26
-mx7 -e -b7 23,995,498 202,489,909 0 x 202,489,909 1123 123 840 ROLZ 26
flashzip 1.12 -mx3 -k7 -b1024 24,726,693 211,104,283 151,961 x 211,256,255 581 94 1152 ROLZ 26
.2081 uharc
uharc 0.6b is a free (for noncommercial
use) closed source command line archiver by Uwe Herklotz, Oct. 1, 2005.
In maximum compression mode (-mx) it uses PPM. In modes -m1 (fastest) to
-m3 (best) it uses ALZ: LZ77 with arithmetic coding. -mz uses LZP.
-md32768 selects maximum dictionary size (uses 50 MB memory, default is -md4096).
Additional results for enwik8:
Options enwik8 Comp Decomp (ns/byte)
------- ---------- ---- ------
-mx -md32768 23,911,123 1830 1510
-mx 23,952,039 1832 1546
-m3 27,957,245 1840 110
-m2 28,459,084 1726 110
-m1 29,660,279 1242 121
-mz 30,429,795 191 236
.2040 csarc
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ---- ----
csc2 34,119,354 298,385,256 9,092 x 298,394,348 141 201 49 LZP 26
csc3 2009.08.12 -m1 -d1 33,920,768 150 59 15 LZ77 26
-m1 -d7 33,510,724 320 59 511 LZ77 26
-m2 -d4 31,627,835 660 56 93 LZ77 26
-m2 -d7 31,460,838 730 55 511 LZ77 26
-m3 -d7 30,430,159 263,485,695 14,027 x 263,499,722 1514 43 675 LZ77 26
csc31 -m3 -d7 28,984,849 250,172,831 64,214 x 250,237,045 1045 33 791 LZ77 26
csc32 a2 -m3 -d9 30,304,020 262,999,383 111,571 x 263,110,954 340 35 528 LZ77 26
csc32 final -m1 -d128 28,973,600 178 49 166 LZ77 26
-m2 -d128 28,624,802 283 52 166 LZ77 26
-m3 -d4 27,776,206 416 52 24 LZ77 26
-m3 -d128 26,842,072 232,326,926 53,665 s 232,380,591 420 46 201 LZ77 26
-m3 -d512 26,842,072 229,929,654 53,665 s 229,983,319 423 47 660 LZ77 26
csarc 3.3 -m1 -p4 -t4 -d256m 29,160,344 250,618,458 69,848 s 250,688,306 32 12 1340 LZ77 48
-m3 -p4 -t4 -d256m 27,130,418 232,020,894 69,848 s 232,090,742 95 12 1581 LZ77 48
-m5 -d1024m 24,516,202 203,995,005 69,848 s 204,064,853 621 22 2463 LZ77 48
.2044 packet
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- ---- ----
packet 0.01 37,637,275 334,473,465 30,508 x 334,503,973 50 43 4 LZP
packet 0.02 37,637,276 334,473,466 27,900 x 334,501,366 58 42 4 LZP
packet 0.03b 1 35,576,495 140 20 3 LZ77
x 1 34,792,199 170 20 3 LZ77
6 34,563,297 450 20 3 LZ77
x 6 33,752,502 297,266,174 26,435 x 297,292,609 594 18 3 LZ77
packet 0.90b -m1 -s0 35,426,140 199 28 10 LZ77
-m1 -s9 32,780,039 2887 26 10 LZ77
-m2 -s0 34,281,503 274 24 10 LZ77
-m2 -s9 31,968,711 4527 25 10 LZ77
-m3 -s0 34,966,621 236 56 10 LZ77
-m3 -s9 32,199,212 2965 51 10 LZ77
-m4 -s0 33,612,046 307 61 10 LZ77
-m4 -s3 32,033,412 861 57 10 LZ77
-m4 -s6 31,367,386 2411 57 10 LZ77
-m4 -s9 31,208,752 273,176,127 32,305 x 273,208,432 3871 48 10 LZ77
packet 0.91b -m6 -s9 31,306,703 274,033,491 45,358 x 274,078,849 3669 36 10 LZ77 26
packet 1.0 -m4 28,349,717 487 37 416 LZ77 26
-m4 -t2 28,789,607 385 53 500 LZ77 26
-m9 27,439,216 4530 37 425 LZ77 26
-mx9 26,895,256 232,428,377 114,566 x 232,542,943 19749 34 429 LZ77 26
packet 1.1 26,848,041 233,803,751 265,102 x 234,068,853 295 26 335 LZ77 48
-m9 -b512 -h4 25,624,659 216,849,389 265,102 x 217,114,491 647 26 1500 LZ77 48
-mx -b512 -h4 25,348,872 213,722,850 265,102 x 213,987,952 767 26 1500 LZ77 48
packetx64 1.2 -mx -b512 -h4 24,664,592 204,646,570 314,885 x 204,961,455 601 21 1619 LZ77 48
packet_x64 1.9 -mx -b512 -h8 24,968,492 204,195,438 261,967 x 204,457,405 974 14 2824 LZ77 48
.2088 TarsaLZP
Compressed size Decompresser Total size Time (ns/byte)
Program enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
TarsaLZP Jul 4 2006 35,745,297 334,661,013 2,255 sd 334,663,268 149 163 54 LZP
TarsaLZP Jul 30 2006 34,321,697 320,160,237 1,455 xd 320,161,692 110 117 54 LZP
TarsaLZP Aug 5 2006 32,270,002 295,312,202 1,579 xd 295,313,781 110 127 70 LZP
TarsaLZP May 6 2007 32,461,606 297,130,840 1,580 xd 297,132,420 97 121 71 LZP
TarsaLZP Jun 17 2007 31,233,381 283,895,945 1,604 xd 283,897,549 100 122 71 LZP
TarsaLZP Jul 18 2007 31,363,533 285,248,058 2,365 xd 285,250,423 88 105 71 LZP
TarsaLZP Jul 30 2007 26,664,933 233,613,937 2,472 xd 233,616,409 247 255 42 LZP
TarsaLZP Aug 8 2007 25,134,862 215,301,412 2,843 xd 215,304,255 249 287 341 LZP
TarsaLZP Aug 10 2007 25,135,357 215,301,079 3,546 xd 215,304,626 269 322 341 LZP
TarsaLZP Jan 29 2012 24,751,389 208,867,187 13,081 s 208,880,268 203 ~2000 LZP 54
TarsaLZP Nov 18 2012 24,860,676 211,990,481 20,303 s 212,010,784 244 277 330 LZP 26
.2090 GRZipII
GRZipII 0.2.4 is a free, open source (LGPL)
command line file compressor by Grebnov Ilya, Feb. 12, 2004. It uses BWT.
The -b8m option selects the maximum block size of 8 MB.
.2091 4x4
4x4 0.2a is a free, open source file compressor by Bulat Ziganshin,
June 2, 2008. It is a wrapper around GRZipII, tornado, and LZMA (7zip),
and a subset of the FreeARC archiver.
Source code is included in the FreeARC distribution. The program
allows arguments to be passed to each compressor, plus 16 preset
options. Only the fastest and slowest preset option for each compressor
was tested. Options 1-7 are tornado, 8-12 are LZMA, and 1t-4t are GRZipII.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- -------------------------- ---------- ----------- ----------- ----------- ----- ----- --- ---
4x4 0.2a 1 (tor:1:4m) 59,711,544 17 13 54 LZ77
7 (tor:7:64m) 32,433,532 197 24 230 LZ77
8 (lzma:fast:128m:ht4:mc8) 32,698,603 292 43 230 LZ77
12 (lzma:128m:ht4:mc128) 27,307,504 4354 43 230 LZ77
1t (grzip:m4) 26,576,294 167 232 128 BWT
4t (grzip:m1:h18) 23,833,244 208,787,642 317,097 x 209,104,739 386 240 269 BWT
.2101 rzm
rzm 0.06c (mirror) is a free file compressor by Christian Martelock, Mar. 4, 2008.
It uses order-1 ROLZ as discussed here. It takes no options.
Memory usage is advertised as 258 MB for compression and 130 MB for decompression.
Measured values (shown) are 180 MB for compression and 104 MB for decompression.
Compression         Compressed size          Decompresser  Total size   Time (ns/byte)
Program    Options  enwik8      enwik9       size (zip)    enwik9+prog  Comp   Decomp   Mem  Alg
---------  -------  ----------  -----------  ------------  -----------  -----  -------  ---  ----
rzm 0.06c           24,429,597  210,719,085  12,903 x      210,731,988  2216   92       180  ROLZ
rzm 0.07h           24,361,070  210,126,103  17,667 x      210,143,770  2336   81       160  ROLZ
.2104 pim
Compression                        Compressed size          Decompresser  Total size   Time (ns/byte)
Program    Options                 enwik8      enwik9       size (zip)    enwik9+prog  Comp   Decomp   Mem  Alg
---------  ----------------------  ----------  -----------  ------------  -----------  -----  -------  ---  ---
pim 2.01   PPMd, no exe, no color  24,303,638  210,124,895  340,951 x     210,465,846  ~600   639      92   PPM
pim 2.04b  PPMd                    24,303,638  210,124,895  335,004 x     210,459,899  900    780      84   PPM
pim 2.10   PPMd                    24,303,638  210,124,895  335,374 x     210,460,269  895    ~900     84   PPM
pim 2.50   best                    24,303,638  210,124,895  330,901 x     210,455,796  764    ~764     88   PPM
.2120 CTW
Option enwik7 enwik8 enwik9 Comp (ns/byte)
------ --------- ---------- ----------- -----
-d5 2,490,460 24,174,511 11340
-d6 2,438,708 23,670,293 211,995,206 19221
-d7 2,455,765 23,689,423 24680
-d9 2,494,767
-d12 2,531,284
.2139 boa
.2144 yzx
Compressor Opt enwik8 enwik9 Prog Total Comp Decomp Mem Alg Note
--------- --- --------- ----------- ------- ----------- ---- ---- --- ---- ----
yzx 0.01 -b5 28,984,962 249,903,552 116,793 x 250,020,345 395 73 732 LZ 26
yzx 0.02 -m2 -c8 -b5 27,293,259 229,890,264 116,795 x 230,007,059 10927 67 732 LZ 26
yzx 0.03 -m2 -c5 -b6 28,132,853 241,790,934 116,141 x 241,907,075 911 71 404 LZ 26
yzx 0.04 -m2 -c5 -b6 27,670,096 235,198,449 116,507 x 235,314,956 833 69 535 LZ 26
yzx 0.11 27,694,742 258 85 293 LZ 26
-m9 -b8 -h5 25,768,724 518 81 636 LZ 26
-m9 -b7 -h6 25,754,856 214,317,684 131,062 x 214,448,746 642 77 1590 LZ 26
.2157 zstd
zstd is a free, open
source (BSD) file compressor by Yann Collet, Jan. 25, 2015.
It uses LZ77 and finite state entropy encoding.
It takes no compression options. To test, it
was compiled using the supplied Makefile with gcc 4.8.2 in Linux (note 48)
and "make -CC=gcc" 4.8.1 in Windows (note 26).
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
zstd 40,024,854 354,602,693 91,253 s 354,693,946 7.7 3.8 1.6 LZ77 48
zstd 40,024,854 354,602,694 91,253 s 354,693,947 23.3 13.9 1.1 LZ77 26
zstd 0.4.0 -f20 27,195,437 233,505,508 301,514 s 233,807,022 432 1.6 LZ77 76
zstd 0.4.2 -1 40,799,603 358,186,203 289,735 s 358,475,938 7.1 3.6 2 LZ77 48
-9 31,789,761 278,571,002 289,735 s 278,860,737 79 3.7 11 LZ77 48
-20 27,195,437 233,505,508 289,735 s 233,795,243 699 6.5 721 LZ77 48
-20 27,195,437 233,505,508 289,735 s 233,795,243 423 1.7 722 LZ77 76
zstd NL 0.4.2 -20 27,195,437 233,505,508 59,431 s 233,564,939 423 1.7 722 LZ77 76
zstd 0.5.1 -21 25,571,637 219,432,125 67,144 s 219,499,269 608 1.8 LZ77 76
-21 25,571,637 998 6.5 722 LZ77 48
zstd 0.6.0 -22 236,376,273 69,687 s 236,445,960 473 1.6 LZ77 76
-22 --ultra 25,405,601 215,674,670 69,687 s 215,744,357 701 2.2 792 LZ77 76
.2178 tornado
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ---- ----
tornado 0.1 -9 34,491,218 303,034,530 20,336 s 303,054,866 204 25 210 LZ77
tornado 0.3 -1 59,790,826 18 LZ77
-2 44,570,662 22 LZ77
-3 40,173,986 28 LZ77
-4 37,849,654 60 LZ77
-5 34,206,892 81 LZ77
-6 33,319,753 130 LZ77
-7 32,346,652 195 96 LZ77
-8 31,659,225 304 192 LZ77
-9 30,967,871 506 384 LZ77
-10 30,614,648 802 768 LZ77
-11 30,274,896 259,412,590 45,833 s 259,458,423 1646 25 1510 LZ77
-12 30,057,549 3700 28 1768 LZ77
tornado 0.4a -11 30,157,610 258,761,459 42,516 s 258,803,975 783 25 1513 LZ77
-12 30,026,843 3200 29 >1800 LZ77
tornado 0.6 -1 59,790,838 531,349,003 8 5 2 LZ77 48
-2 49,093,116 8 6 3 LZ77 48
-3 39,510,585 14 9 5 LZ77 48
-4 38,018,770 18 9 11 LZ77 48
-5 34,175,257 300,482,758 41 9 25 LZ77 48
34,175,257 300,482,758 93 24 29 LZ77 26
-6 32,921,124 57 10 97 LZ77 48
-7 30,131,376 134 10 229 LZ77 48
-8 29,507,281 290 11 613 LZ77 48
-9 29,327,427 392 11 613 LZ77 48
-10 29,048,467 371 11 628 LZ77 48
-11 30,108,427 270 10 356 LZ77 48
-12 28,596,548 397 9 356 LZ77 48
-13 28,042,448 503 9 484 LZ77 48
-14 27,129,826 672 9 614 LZ77 48
-15 26,762,749 985 10 614 LZ77 48
-16 25,768,105 217,749,028 83,694 s 217,832,722 1482 9 1290 LZ77 48
.2178 LZPXj
LZPXj 1.1d is an experimental open source (GPL) command line file compressor by
Ilia Muraviev and Jan Ondrus, May 21, 2006. The -m3 option selects maximum compression.
The -e0 option turns off the exe filter (has no effect on text). The -r3 and -a0 options
were tuned experimentally on enwik7. -r sets the rescale rate (range 1-5, default 3).
-a0 turns off the alternate one byte matcher (default -a1 = on).
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Notes
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- -----
LZPXj 1.1b -s (best, = -r4 in 1.1d) 28,387,611 674 LZP
LZPXj 1.1b (default) 28,440,958 677 LZP
LZPXj 1.1d -m3 -r4 -a0 -e0 28,386,512 246,468,866 6,534 s 246,475,400 362 402 216 LZP
LZPXj 1.2h 9 25,205,783 217,880,584 4,853 s 217,885,437 783 717 1316 PPM
.2179 scmppm
.2185 acb
acb (discussion) is a shareware archiver for DOS by George Buyanovsky. It achieved some popularity
in Russia in 1997 after being described in a popular magazine there.
acb uses a complex variant
of LZ77 called "associative coding". (ACB means "associative coding by Buyanovsky").
History is collected in a context sorted ring (like BWT) called a "funnel of
analogies". A string match is coded by the position of the longest (nearest) match in this
data structure. The length is coded dependent on the length of neighboring matches.
The result is arithmetic coded. There are 4 versions:
All versions limit file size to 64 MB but do not limit archive size. To
test enwik8, it was divided into 2 equal parts of 50 MB and compressed into
one archive. Archives are compressed in "solid" mode.
enwik9 was divided into 16 equal parts of 62.5 MB each
(named 01 through 16)
and compressed to 16 separate archives. When attempting to compress enwik9 into a single
archive, the compressor crashed with an illegal interrupt after 12 hours, having produced
1474 MB of output in 3 files.
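The splitting step described above (enwik9 into 16 equal parts named 01 through 16) can be reproduced with a short script. The text does not say which tool was actually used, so this is only one possible way to do it; the function name split_equal and the zero-padded naming rule are assumptions chosen to match the part names given.

```python
import os


def split_equal(path, parts, out_dir):
    """Split the file at path into equal-sized pieces named 01, 02, ... in out_dir."""
    size = os.path.getsize(path)
    if size % parts:
        raise ValueError("file size is not divisible by the number of parts")
    chunk = size // parts
    width = len(str(parts))  # 16 parts -> two-digit names
    names = []
    with open(path, "rb") as src:
        for i in range(1, parts + 1):
            name = os.path.join(out_dir, str(i).zfill(width))
            with open(name, "wb") as dst:
                dst.write(src.read(chunk))
            names.append(name)
    return names


if __name__ == "__main__":
    # For the 10^9 byte enwik9 this would yield 16 parts of 62,500,000 bytes each.
    if os.path.exists("enwik9"):
        print(split_equal("enwik9", 16, "."))
```

Each part stays under the 64 MB file size limit mentioned above, and concatenating the decompressed parts in name order restores the original file.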
.2186 crushm
.2190 PX
PX v1.0 is a free command line
file compressor by Ilia Muraviev, Feb. 17, 2006. It is a context mixing
compressor based on PAQ1 with fixed weight models.
.2196 DGCA
DGCA v1.10 is a free, closed source
GUI archiver, Aug. 8, 2006. The installer is in Japanese but the program runs
in several languages including English. It was tested with default settings
except for producing a self extracting archive. This adds 189,936 bytes
to enwik8.
.2200 Squeez
Squeez 5.20.4600 is a commercial
(60 day trial) GUI archiver by SpeedProject, Apr. 11, 2006.
It supports 13 different formats, but only
the native .sqx (possibly LZ77) format was tested. The options used were 2.0 format (newest),
32 MB dictionary (largest, actually uses 365 MB memory), Ultra compression (best),
and all checkboxes off (including no exe or multimedia compression). There is a SFX
option but using UnSqueez to decompress instead gives a smaller size.
.2212 fpaq2
Program Opt enwik8 enwik9 prog (zip) enwik9+prog Comp Decomp Mem Alg
------- --- ---------- ----------- ----------- ----------- ----- ----- --- --
fpaq2 25,287,775 221,242,386 3,429 s 221,245,815 20183 20186 131 CM
fpaq3d 6 26,656,082 233,750,402 3,309 s 233,753,711 1922 1938 1050 o28b
fpaq3c 27,978,995 248,253,886 2,535 s 248,256,421 1446 1456 268 o28b
fpaq0s6 30,012,650 263,438,012 4,150 s 263,442,162 547 505 174 PPM
fpaq0s5 30,374,122 266,244,843 4,027 s 266,248,870 480 419 200 PPM
fpaq3b 29,992,583 270,804,549 2,926 s 270,807,475 1526 1517 256 o28b
fpaq3 31,176,104 282,922,749 8,820 x 282,931,569 1770 1807 250 o3
fpaq0x1b 30,860,828 283,001,299 2,727 s 283,004,026 1178 1180 1094 PPM
fpaq0s4 33,327,611 311,104,858 3,528 s 311,108,386 477 473 147 PPM
fpaq0x1a 36,186,433 339,131,763 2,561 s 339,134,324 621 623 1052 o3
fpaq0s2b 35,934,548 343,603,459 3,029 s 343,606,488 599 605 1052 o3
fastari 39,392,220 371,909,475 2,287 s 371,911,762 224 261 133 o2
fpaq0s2 38,812,873 375,050,952 2,982 s 375,053,934 591 595 131 o2
fpaq0x 38,845,305 375,276,899 2,482 s 375,279,381 631 631 263 o2
fpaq0s3 49,728,923 490,781,136 3,000 s 490,784,136 525 475 32 o2
.2217 TinyCM
Compressor  Opt  enwik8      enwik9       Prog      Total        Comp  Decomp  Mem   Note
----------  ---  ----------  -----------  --------  -----------  ----  ------  ----  ----
TinyCM 0.1  9    25,913,605  221,773,542  12,553 x  221,786,095  1342  1330    1083  26
.2226 dmc
dmc is the original DMC
compressor written by Gordon V. Cormack in 1987 and described in
"Data Compression using Dynamic Markov Modelling",
by Gordon Cormack and Nigel Horspool in Computer Journal 30:6 (December 1987).
The algorithm is the same as described in hook with the
last 2 arguments fixed at "2 2". The dmc argument "c 1800000000" means to
compress with 1.8 GB memory. The memory size must also be given for decompression.
Thus, 10 bytes (the size of the argument) was added to the decompresser size
(source zipped with Info-Zip 2.31 -9).
Because dmc compresses and decompresses
from stdin to stdout, it was tested in Linux (Ubuntu
2.6.15.27-amd64-generic), compiled in gcc 4.0.3 x86-64 as follows:
gcc -O -s -Dexp=expand dmc.c
and tested on a 2.2 GHz Athlon-64 with 2 GB memory. The compiler argument
"-Dexp=expand" removes a compiler error due to a K&R style redefinition of exp().
.2230 lza
lza 0.01 is a free
archiver for 32 bit Windows by Nania Francesco Antonio, May 29, 2014.
It uses LZ77 (based apparently on zcm).
Option -t selects number of threads. Default is -t1. Using a greater
number of threads makes compression worse by splitting the input
among threads. -h0..-h7 selects
hash buffer memory 8 MB to 1 GB. Default is -h2 (32 MB). -b0..-b7
selects LZ buffer memory 8 MB to 1 GB. Default is -b3 (64 MB).
Option combinations -b6 -h7 or -b7 -h6 or higher run out of memory.
-m1..-m5 selects compression level (faster..better). Default is -m3.
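The buffer sizes quoted above are consistent with each level doubling the previous one, starting from 8 MB: -h2 gives 32 MB, -b3 gives 64 MB, and level 7 gives 1 GB. The sketch below makes that rule explicit. The doubling rule is inferred from the endpoints and defaults in the text rather than taken from lza documentation, and the function name is illustrative.

```python
def lza_buffer_mb(level):
    """Buffer size in MB for -hN or -bN, assuming it doubles per level from 8 MB."""
    if not 0 <= level <= 7:
        raise ValueError("level must be in 0..7")
    return 8 << level


if __name__ == "__main__":
    for b, h in [(3, 2), (6, 6), (6, 7), (7, 6)]:
        total = lza_buffer_mb(b) + lza_buffer_mb(h)
        print("-b%d -h%d: %d MB" % (b, h, total))
```

Under this rule -b6 -h6 needs 1024 MB for the two buffers, while -b6 -h7 and -b7 -h6 each need 1536 MB, which fits the observation that those combinations run out of memory in a 32 bit process.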
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
lza 0.01 39,644,188 302,602,114 142 9 111 LZ77 48
-m5 -b6 -h6 -t1 32,766,063 275,376,918 159,693 x 275,536,611 345 11 1024 LZ77 48
-m5 -b6 -h6 -t2 33,496,841 277,860,891 237 20 2048 LZ77 48
lza 0.10 -mx5 -b6 -h6 29,052,976 250,653,981 159,953 x 250,813,934 238 11 1012 LZ77 48
lza_x64 0.10 -mx5 -b7 -h7 28,835,165 246,671,312 259,425 x 246,930,737 265 12 1800 LZ77 48
lza 0.51 -mx5 -b6 -h6 -t1 28,365,587 242,852,984 179,415 x 243,032,399 243 10 1065 LZ77 48
lza_x64 0.51 -mx5 -b7 -h8 -t1 27,992,585 234,652,984 218,944 x 234,871,928 261 14 2998 LZ77 48
lza_x64 0.61 -mx5 -b7 -h7 28,019,802 236,604,708 218,090 x 236,822,798 279 10 1999 LZ77 48
lza_x64 0.62 -mx5 -b7 -h9 27,870,452 231,801,036 219,447 x 232,020,483 409 9.5 5000 LZ77 69
lza_x64 0.70b -mx9 -b7 -h7 27,111,239 229,073,644 260,686 x 229,334,330 378 10 2000 LZ77 48
lza 0.80 -mx9 -b7 -h7 27,148,092 229,483,126 284,285 x 229,764,411 456 12 2152 LZ77 48
lza 0.82b -mx9 -b7 -h7 26,396,613 222,808,457 285,766 x 223,094,223 449 9.7 2000 LZ77 48
.2241 brotli
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp CMem Dmem Alg Note
------- -------- ---------- ----------- ----------- ----------- ----- ----- ---- ---- --- ----
bro 21 Sep 2015 -q 1 36,893,038 326,282,447 514,344 s 326,796,791 18 5.2 10 5 LZ77 48
-q 5 33,414,623 292,394,323 514,344 s 292,908,667 57 4.8 23 5 LZ77 48
-q 9 30,227,230 264,047,624 514,344 s 264,561,968 361 5.0 77 6 LZ77 48
-q 11 27,721,194 240,891,082 514,344 s 241,405,426 4386 5.0 294 6 LZ77 48
bro 18 Feb 2016 -q 1 38,802,994 343,293,825 542,345 s 343,836,170 12.4 6.3 8 5 LZ77 48
-q 5 33,414,209 59 4.5 38 6 LZ77 48
-q 9 30,227,246 407 4.7 68 6 LZ77 48
-q 11 27,076,871 235,560,131 542,345 s 236,102,476 3171 5.1 292 6 LZ77 48
-q 11 -w 24 25,764,698 223,597,884 542,385 s 224,140,269 3400 5.9 437 18 LZ77 48
.2276 szip
Compressor Opt enwik8 enwik9 Prog Total Comp Decomp Mem Note
--------- --- --------- ----------- ------- ----------- ---- ---- --- ----
szip 1.12a -b41o16 26,120,472 227,586,463 31,708 x 227,618,171 1191 289 21 26
-b41o4 27,561,829 70 210 21 26
-b16o6 27,666,448 270 220 8 26
-b41o6 26,365,058 360 240 21 26
-b41o8 26,185,222 530 250 21 26
-b41o32 26,128,020 2550 400 21 26
-b41o64 26,130,850 5210 600 21 26
-b41o0 26,130,985 750 200 21 26
.2282 balz
Compressor Opt enwik8 enwik9 Prog Total Comp Decomp Mem Alg Note
--------- --- --------- ----------- ------- ----------- ---- ---- --- ---- ----
balz 1.02 30,634,726 268,552,062 48,030 x 268,600,092 21804 58 346 LZ77
balz 1.06 e 28,674,640 1580 79 67 ROLZ
balz 1.06 ex 28,234,913 245,288,229 48,937 x 245,337,166 2440 75 67 ROLZ
balz 1.07 e 28,271,200 1060 96 132 ROLZ
balz 1.07 ex 27,416,245 237,492,151 49,082 x 237,541,233 2106 77 132 ROLZ
balz 1.08 ex 26,534,890 229,477,116 49,351 x 229,526,467 4431 126 200 ROLZ
balz 1.09 ex 26,534,257 229,476,459 49,928 x 229,526,387 4049 128 201 ROLZ
balz 1.12 e 27,522,348 1800 177 201 ROLZ
balz 1.12 ex 26,522,258 229,347,434 48,400 x 229,395,834 3989 148 201 ROLZ
balz 1.13 e 27,405,650 1670 221 206 ROLZ
balz 1.13 ex 26,421,416 228,337,644 49,024 x 228,286,668 3700 190 206 ROLZ
balz 1.15 ex 28,232,824 245,218,274 4,045 s 245,222,319 1064 95 67 ROLZ
balz 1.20 c 30,056,097 261,416,611 3,499 s 261,420,110 53 ROLZ 68
balz 1.20 cx 28,232,824 245,218,274 3,499 s 245,221,773 193 22 ROLZ 68
.2291 lzpm
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Opt enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- --- ---------- ----------- ----------- ----------- ----- ----- --- ----
lzpm 0.02 29,274,461 254,596,796 26,078 x 254,622,874 612 59 83 LZ77
lzpm 0.03 29,248,641 254,378,973 26,089 x 254,405,062 749 59 181 LZ77
lzpm 0.04 29,297,905 254,793,933 25,333 x 254,819,266 665 60 83 ROLZ
lzpm 0.06 28,896,680 251,111,835 25,369 x 251,137,204 852 58 83 ROLZ
lzpm 0.07 28,385,939 246,426,198 46,692 x 246,472,890 2185 56 280 ROLZ
lzpm 0.08 28,259,984 245,221,254 48,122 x 245,269,376 2754 59 280 ROLZ
lzpm 0.09 27,986,111 242,929,442 46,933 x 242,976,375 2451 56 280 ROLZ
lzpm 0.10 27,849,915 241,719,857 46,871 x 241,766,728 2598 57 280 ROLZ
lzpm 0.11 1 29,728,112 1162 76 723 ROLZ
2 27,967,747 3746 66 723 ROLZ
3 27,424,937 5204 68 723 ROLZ
4 27,239,304 6488 66 723 ROLZ
5 27,134,495 7446 63 723 ROLZ
6 27,038,405 8143 64 723 ROLZ
7 26,962,337 8761 63 723 ROLZ
8 26,890,422 9330 62 723 ROLZ
lzpm 0.11 9 26,501,542 229,083,971 46,824 x 229,130,795 15395 57 723 ROLZ
lzpmlite 0.11 1 30,136,214 627 69 362 ROLZ
3 27,918,695 2620 64 362 ROLZ
lzpmlite 0.11 9 27,096,516 235,135,224 48,144 x 235,183,368 6235 59 362 ROLZ
lzpm 0.12 9 27,391,197 237,915,048 47,030 x 237,962,078 4501 57 280 ROLZ
lzpm 0.13 9 27,318,013 237,241,658 47,129 x 237,288,787 4543 59 280 ROLZ
lzpm 0.14 9 27,091,358 235,074,141 48,790 x 235,122,931 6467 73 428 ROLZ
lzpm 0.15 9 27,145,224 235,567,823 48,401 x 235,616,224 6557 62 427 ROLZ
.2299 qazar
qazar 0.0pre5 is a free, closed source
command line file compressor by
Denis Kyznetsov, Jan. 31, 2006. It uses LZP, an LZ77 variant where
the decompresser dynamically computes the same sequence of context
matches as the compressor. The compressor uses a single bit flag
to indicate if the pointer computed by the decompresser should be
followed. In qazar, the output symbols are arithmetic coded.
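The LZP idea described above can be sketched in a few lines of Python. This is only an illustration: the order-3 hashed context, the table size, and the token layout are assumptions, not qazar's actual parameters, and the arithmetic coder is omitted. A "match" token stands for a flag bit of 1 plus a length, and a "lit" token stands for a flag bit of 0 (or no prediction available) plus a literal byte.

```python
ORDER = 3          # assumed context length in bytes
TABLE_BITS = 16    # assumed hash table size (2^16 slots)

def _ctx_hash(buf, i):
    """Hash the ORDER bytes preceding position i."""
    h = 0
    for b in buf[i - ORDER:i]:
        h = (h * 0x2F0B + b + 1) & ((1 << TABLE_BITS) - 1)
    return h

def lzp_encode(data):
    """Turn bytes into a list of ("lit", byte) / ("match", length) tokens."""
    table = {}                      # context hash -> last position seen
    tokens = []
    i, n = 0, len(data)
    while i < n:
        p = None
        if i >= ORDER:
            h = _ctx_hash(data, i)
            p = table.get(h)        # the one pointer both sides can compute
            table[h] = i
        length = 0
        if p is not None:
            while i + length < n and data[p + length] == data[i + length]:
                length += 1
        if length > 0:
            tokens.append(("match", length))   # flag = 1: follow the pointer
            i += length
        else:
            tokens.append(("lit", data[i]))    # flag = 0: send a literal
            i += 1
    return tokens

def lzp_decode(tokens):
    """Rebuild the bytes by recomputing the same predicted pointers."""
    table = {}
    out = bytearray()
    for kind, value in tokens:
        p = None
        if len(out) >= ORDER:
            h = _ctx_hash(out, len(out))
            p = table.get(h)
            table[h] = len(out)
        if kind == "match":
            for k in range(value):
                out.append(out[p + k])         # byte-wise so overlaps work
        else:
            out.append(value)
    return bytes(out)
```

Note that no offset is ever transmitted: the decoder derives the pointer from its own output, which is what distinguishes LZP from plain LZ77.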
.2317 KuaiZip
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Version enwik8 enwik9 size (zip) enwik9+prog Comp Decomp CMem Dmem Alg Note
------- -------- ---------- ----------- ----------- ----------- ----- ----- ---- ---- --- ----
KuaiZip 2.3.2 x86 25,895,915 227,905,650 3,857,649 x 231,763,299 1061 47 197 19 LZMA 26
.2328 qc
qc 0.050 is a free, closed source,
command line file compressor by Denis Kyznetsov, Sept. 17, 2006.
The -8 option selects maximum compression (slowest and most memory).
.2334 ppms
.2356 dzo
.2428 comprox_ba
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Version enwik8 enwik9 size (zip) enwik9+prog Comp Decomp CMem Dmem Alg Note
------- -------- ---------- ----------- ----------- ----------- ----- ----- ---- ---- --- ----
comprox_ba 20110927 27,831,722 242,858,769 4,165 s 242,862,934 1500 227 103 25 BWTS 26
comprox_ba 20110928 (Win32) 27,831,722 242,858,769 4,151 s 242,862,920 957 227 206 25 BWTS 26
comprox_ba 20110928 (Linux) 27,831,722 242,858,769 4,151 s 242,862,920 363 168 226 30 BWTS 48
comprox_ba 20110929 (Win32) 27,828,189 242,846,243 4,134 s 242,850,377 984 152 206 50 BWTS 26
comprox_ba 20110929 (Linux) 27,828,189 242,846,243 4,134 s 242,850,377 397 101 226 76 BWTS 48
.2453 turtle
Compressor Opt enwik8 enwik9 Prog Total Comp Decomp Mem Alg
--------- --- --------- ----------- ------- ----------- ---- ---- --- ----
turtle v0.01 31,314,961 274,696,820 5,079 x 274,701,899 187 178 122 PPM
turtle v0.02 31,314,961 274,696,820 4,637 x 274,701,457 196 175 122 PPM
turtle v0.03 31,287,161 274,649,069 7,111 x 274,656,180 142 129 122 PPM
turtle v0.04 31,137,531 273,100,225 7,808 x 273,108,033 141 128 122 PPM
turtle v0.05 28,860,689 251,626,176 9,779 x 251,635,955 242 203 174 PPM
turtle v0.07 28,669,320 250,600,644 10,625 x 250,611,269 217 175 206 PPM
WinTurtle 1.2 8MB 29,601,717 258,927,402 238,080 x 259,164,482 248 242 31 PPM
512MB 28,814,475 250,364,644 238,080 x 250,598,724 264 240 548 PPM
WinTurtle 1.21 512MB 28,814,475 250,364,644 225,123 x 250,589,767 255 219 548 PPM
WinTurtle 1.30 512MB 28,814,478 250,364,647 239,247 x 250,603,594 243 240 597 PPM
WinTurtle 1.60 512MB 28,379,612 245,217,944 160,090 x 245,378,034 273 237 583 PPM
.2466 diz
.2508 cabarc
cabarc 1.00.0601
is a command line archiver available for free download by Microsoft, Mar. 18, 1997
(SDK released Jan. 8, 2002). It produces .cab files, which are often used to distribute Microsoft software.
It is designed for very fast decompression.
It uses LZX, a variant of LZ77 with fixed Huffman coding, but with shorter symbols reserved for the
three most recent matches. The option -m lzx:21 selects a window size of 2^21
(2 MB) for maximum compression.
There is a separate extraction program, "extract". The actual (global) decompression time of 32 sec. includes
15 sec. of CPU (process) time and the rest for disk I/O.
.2530 sr3
sr2 is a free,
open source (GPL) file compressor by Matt Mahoney, Aug. 3, 2007. It uses
symbol ranking. It takes no options. There are separate programs for
compression and decompression.
Program enwik8 enwik9 prog Total Comp Deco Mem Alg
------- ---------- ----------- ---- ------------ ---- ---- --- ---
sr2 30,432,506 273,906,319 2,831 sd 273,909,150 99 111 6 SR
sr3 28,926,691 253,031,980 5,611 x 253,037,591 130 146 68 SR
sr3 28,926,691 253,031,980 9,399 s 253,054,625 148 160 68 SR 26
.2540 bzip2
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp
------- ------- ---------- ----------- ----------- ----------- ----- -----
bzip2 1.0.2 -9 29,008,736 253,977,839 30,036 x 254,007,875 379 129
bzip2 1.0.3 -9 29,008,758 253,977,891 56,082 xd 254,033,973 334 120
.2542 RH5
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- ---- ----
RH_x86 35,675,086 91,772 x 78 47 8 ROLZ 26
RH2_x86 c1 34,857,781 67 27 64 LZP 26
c2 31,957,388 149 28 64 ROLZ 26
c3 31,937,059 279,524,710 93,364 x 279,618,074 152 28 64 ROLZ 26
RH2_x64 c3 31,937,063 279,524,714 97,016 x 279,621,730 72 20 64 ROLZ 48
RH2_x64 20Feb2014 c1 34,816,471 306,646,293 32 15 64 LZP 48
c2 32,215,361 282,209,254 48 14 64 ROLZ 48
c3 30,960,001 271,181,799 67 17 64 ROLZ 48
c4 30,787,281 269,670,002 76 15 64 ROLZ 48
c5 30,543,306 267,344,532 53,408 x 267,397,940 447 18 64 ROLZ 48
RH4_x64 22Mar2014 c1 32,664,118 44 13
c2 31,309,650 47 12
c3 30,906,206 61 12
c4 30,872,697 64 12
c5 30,030,867 128 11
c6 29,553,289 258,411,625 79,155 x 258,490,780 301 12 27 ROLZ 48
RH4_x64 24Apr2014 c2 31,309,670 274,101,406 90,071 x 274,191,477 44 9 31 ROLZ 48
c6 29,553,309 258,411,645 90,071 x 258,501,716 287 9 31 ROLZ 48
RH5_x64 c2 31,798,141 278,822,435 36,744 x 278,859,179 28 11 22 ROLZ 48
c6 29,878,256 261,791,548 36,744 x 261,828,292 153 11 22 ROLZ 48
-window:27 c6 29,078,552 254,220,469 36,744 x 254,257,213 196 9.4 145 ROLZ 48
.2545 RangeCoderC
RangeCoderC v1.2
(discussion)
is a free, experimental open source file compressor by David Catt,
Nov. 23, 2011. The option 26 selects a simple bitwise order 26 model.
An order n model requires 16*2^n bytes of memory.
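As a quick check of the memory figure, reading the formula as 16 times 2 to the power n, the arithmetic can be written out in Python. The function name is hypothetical and exists only for this illustration.

```python
def order_n_model_bytes(n):
    """Memory for an order-n bitwise model, taken as 16 * 2^n bytes."""
    return 16 * 2 ** n

for n in (24, 25, 26):
    size = order_n_model_bytes(n)
    print(f"order {n}: {size:,} bytes = {size // 2**20} MiB")
```

For order 26 this gives 2^30 bytes, which is in line with the roughly 1050 MB reported for option 26 in the table.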
0 - Simple Bitwise Model (default)
1 - Indirect Bitwise Model
2 - Indexed Bitwise Model Array
3 - Hashed Bitwise Model
4 - Bitwise Linear CM
5 - Bitwise Linear CM With SSE
c1 failed on enwik8. It produced a "compressed" file about 2.5 GB which decompressed
incorrectly. The other modes were tested at the highest order allowed by the
2 GB memory space available in the 32 bit version.
6 - Bytewise Hashed Model
7 - Combined Model
The Bytewise Hashed model uses the hash and cache structure from ZPAQ to achieve high speeds,
even at higher orders. The Combined Model uses the same structure as the Double Model but has
a hashed context and outputs its predictions into a SSE model for better compression.
0 - Bytewise Hashed Model
1 - Simple Bitwise Model (default)
2 - Adaptive Bitwise Model
3 - Indexed Bitwise Model Array
4 - Hashed Bitwise Model
5 - Combined Model
Only the new mode (c2) was tested.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ----
RangeCoderC v1.2 0 99,801,301
1 99,660,153
2 97,987,717
3 96,963,829
4 95,670,157
5 94,154,825
6 87,831,925
7 80,009,581
8 73,016,189
16 46,805,877
24 35,625,897
25 34,635,889
26 33,761,533 320,897,805 4,120 x 320,901,925 1324 1348 1050 26
RangeCoderC v1.3 27 33,225,249 314,021,089 3,977 x 314,025,066 1210 1234 1050 26
26 (Double) 30,934,993 285,258,957 4,052 x 285,263,009 1501 1488 1100 26
RangeCoderC v1.4 27 (Hashed) 30,371,685 271,371,793 4,407 x 271,376,200 1809 1658 1050 26
26 (Double) 30,934,989 4,359 x 1560 1650 1116 26
27 (Indirect) 36,108,281 4,773 x 2700 3090 1182 26
27 (Standard) 33,225,245 4,288 x 1210 1270 1050 26
RangeCoderC v1.5 c3 27 30,371,685 5,747 x 1740 1810 1050 26
RangeCoderC v1.6 c0 26 33,761,529 7,028 x 1200 1230 525 26
c0 27 33,225,245 1280 1330 1050 26
c2 26 30,934,989 1610 1680 1116 26
c3 26 30,832,497 1610 1720 525 26
c3 27 30,371,685 1740 1790 1050 26
c4 26 29,269,185 5320 5880 1642 26
c5 26 28,461,477 260,009,661 7,028 x 260,016,689 5752 5833 1642 26
RangeCoderC v1.7a c1 27 36,108,281 7,060 x 2570 3000 1182 26
RangeCoderC v1.7 c0 27 33,225,245 1300 1330 1050 26
c1 27 36,108,281 2490 2420 1182 26
c2 26 30,934,989 1590 1660 1116 26
c3 27 30,371,685 1710 1980 1050 26
c4 26 29,269,185 5120 5130 1641 27
c5 26 28,461,477 260,009,661 7,858 x 260,017,519 5832 5779 1642 26
c6 27 35,265,593 990 1020 1050 26
c7 26 28,788,013 254,527,369 7,858 x 254,535,227 2460 2436 1116 26
RangeCoderC v1.8 c2 28 32,432,825 285,488,437 6,537 x 285,494,974 1338 1363 1050 26
.2561 quad
Compression Compressed size Decompresser Total size Time (ns/byte)
Program enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ---------- ----------- ----------- ----------- ----- ----- --- ----
quad v1.01a 29,930,547 263,137,995 26,927 x 263,164,922 1281 168 33 LZ77
quad v1.04a 27,712,832 239,596,416 38,552 x 239,634,968 933 748 165 LZP
quad v1.07b x 29,360,404 258,361,092 61,067 x 258,422,159 1282 146 33 LZP
quad v1.08 x 29,171,593 256,664,803 13,042 s 256,677,845 1206 164 33 LZP
quad v1.10 -x 29,152,166 256,486,470 13,288 s 256,499,758 1007 117 34 LZP
quad v1.11 -x 29,110,579 256,145,858 13,387 s 256,159,245 956 116 34 ROLZ
quad v1.11HASH2 -x 29,110,519 256,145,858 30,129 x 256,175,987 705 117 42 ROLZ
quad v1.12 -x 29,110,519 256,145,858 13,516 s 256,159,334 527 120 34 ROLZ
.2572 WinACE
Compression Compressed size Decompresser Total size Time (ns/byte)
Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp
------- ---------- ----------- ----------- ----------- ----- -----
-sfx -m5 -d4096 29,481,470 257,237,710 0 xd 257,237,710 1080 77
-sfx -m5 30,919,182 270,578,538 0 xd 270,578,538 738 79
-sfx 30,937,342 ~770 ~40
.2589 lzsr
.2595 zling
zling
(discussion)
is a free, open source (BSD
license) file compressor by Zhang Li, Nov. 1, 2013. It uses order 1 ROLZ,
based on the order 3 ROLZ compressor zlite. It takes no options.
The compressor is C source code only. To test, it was compiled with
gcc 4.8.0 -O3 for 32 bit Windows.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
zling Nov-01-2013 33,297,650 292,746,596 5,468 s 292,752,064 80 21 37 ROLZ 26
zling Dec-25-2013 32,222,737 282,435,374 12,807 s 282,448,181 33 8 27 ROLZ 48
zling Jan-21-2014 32,189,336 281,869,136 14,886 s 281,884,022 78 21 29 ROLZ 26
zling Jan-21-2014 32,189,336 281,869,136 14,886 s 281,884,022 29 7 29 ROLZ 48
zling_demo Feb-19-2014 31,310,257 274,180,830 32,046 s 274,212,876 56 14 27 ROLZ 48
zling_demo Mar-24-2014 e0 33,391,083 24 9 27 ROLZ 48
e1 32,613,829 29 9 27 ROLZ 48
e2 31,732,466 33 9 27 ROLZ 48
e3 31,310,257 40 9 27 ROLZ 48
e4 30,861,848 270,258,636 32,421 s 270,291,057 51 9 27 ROLZ 48
zling_demo 201401414 e0 32,456,306 284,804,449 23 9 27 ROLZ 48
e1 31,800,497 278,703,086 28 9 27 ROLZ 48
e2 31,419,861 275,231,487 32 9 27 ROLZ 48
e3 31,064,418 271,969,050 36 9 27 ROLZ 48
e4 30,782,340 269,496,300 31,644 s 269,527,944 42 9 27 ROLZ 48
zling_demo 20140430-bugfix e0 32,378,187 29 11 27 ROLZ 48
e1 31,720,214 30 11 27 ROLZ 48
e2 31,340,822 34 11 27 ROLZ 48
e3 30,979,872 39 11 27 ROLZ 48
e4 30,707,022 268,793,105 32,148 s 268,825,253 40 10 27 ROLZ 48
zling_demo 20160107 e0 31,455,205 93 29 22 ROLZ 48
e4 29,721,114 259,475,639 35,582 s 259,511,221 83 27 28 ROLZ 48
.2625 xpv5
Compression Compressed size Decompresser Total size Time (ns/byte)
Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
xpv5 c0 31,675,180 277,174,541 14,371 x 277,188,912 908 534 9 ROLZ 26
c1 30,297,863 265,643,665 14,371 x 265,658,036 1236 515 9 ROLZ 26
c2 29,963,217 262,525,246 14,371 x 262,539,617 2359 516 9 ROLZ 26
.2660 sr3c
sr3c 1.0 is a free,
open source (MIT license) file compressor and library by Kenneth Oksanen,
released Nov. 27, 2008. It uses symbol ranking, based on ideas from SR3, but
completely rewritten in C. The distribution contains a portable compression
engine and source code for drivers for UNIX/Linux. To test, I wrote a simple driver
for Windows (sr3cw) and compiled it using gcc 3.4.5 -O3 -fomit-frame-pointer -march=pentiumpro
-s and included sr3cw.exe in the distribution. The driver takes no options.
.2665 lzc
lzc v0.01
is a free, closed source file compressor by
Nania Francesco Antonio, May 8, 2007. It uses an LZ77 like algorithm.
The option 4 selects the maximum memory mode, 1 GB + 100 MB for compression and
16 + 100 MB for decompression. The actual memory usage indicated by Windows
Task Manager in this mode was 360 MB for compression and 107 MB for decompression.
Compressor Opt enwik8 enwik9 Prog Total Comp Decomp Mem Alg
--------- --- --------- ----------- ------- ----------- ---- ---- --- ----
lzc v0.01 4 40,312,925 363,504,638 7,656 x 363,512,294 238 61 360 LZ77
lzc v0.03 4 37,908,748 341,811,895 8,268 x 341,820,163 182 61 515 LZ77
lzc v0.04 4 37,779,426 340,628,765 8,869 x 340,637,634 142 59 540 LZ77
lzc v0.05b 1 44,893,624 117 54 LZ77
lzc v0.05b 16 30,611,315 267,784,591 9,158 x 267,793,749 365 82 771 LZ77
lzc v0.06b 16 30,611,315 267,784,590 12,170 x 267,796,760 347 68 790 LZ77
lzc v0.07 1 40,554,444 110 60 70 LZ77
lzc v0.07 10 30,611,315 266,565,255 28,997 x 266,594,252 309 67 584 LZ77
lzc v0.08 10 30,611,315 266,565,255 11,364 x 266,576,619 302 63 550 LZ77
.2774 nakamichi
Compressor Opt enwik8 enwik9 Prog Total Comp Decomp Mem Alg Note
--------- --- --------- ----------- ------- ----------- ---- ---- --- ---- ----
nakamichi 2019-Jul-01 32,917,888 277,293,058 112,899 s 277,405,957 8200000 1.3 302000 LZSS 85
.2794 crush
crush 0.01
is a free, experimental file compressor by Ilia Muraviev, May 17, 2011.
It uses LZ77. It has 3 compression modes: cf (fast), c (medium), and cx (best).
Compression in all modes uses 143 MB memory, and decompression uses 65 MB.
0,xxxxxxxx - literal byte x
1,1,xx - match length x+3 (3..6)
1,0,1,xx - match length x+7 (7..10)
1,0,0,1,xx - match length x+11 (11..14)
1,0,0,0,1,xxx - match length x+15 (15..22)
1,0,0,0,0,1,xxxxx - match length x+23 (23..54)
1,0,0,0,0,0,xxxxxxxxx - match length x+55 (55..566)
A match code is followed by 2 fields (call them L and P) giving the offset.
L is 4 bits, and gives the length of P. If L
is 0000, then P is 5 bits and the offset is P + 1 (1..32).
If L is in 1..15, then P is L + 4 bits long and the offset is
2^(L+4) + P + 1 (33..2^20). A match is decoded by
going back offset bytes in the output and copying the specified length to the output.
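The code layout above can be turned into a small decoder sketch in Python. It takes the offset for L in 1..15 to be 2^(L+4) + P + 1. Two things are assumptions made for illustration and are not stated in the text: the input is given as a plain sequence of 0/1 values rather than packed bytes, and multi-bit fields are read most significant bit first. The names `match_offset` and `crush_decode` are hypothetical.

```python
# Indexed by the number of 0 bits that follow the leading 1 flag (0..5).
EXTRA_BITS = [2, 2, 2, 3, 5, 9]      # width of the x field
BASE_LEN = [3, 7, 11, 15, 23, 55]    # length added to x

def match_offset(L, P):
    """Offset from the L and P fields."""
    if L == 0:
        return P + 1                     # 1..32
    return (1 << (L + 4)) + P + 1        # 33..2^20

def crush_decode(bits):
    """Decode a sequence of 0/1 values (assumed MSB-first fields)."""
    it = iter(bits)

    def take(n):
        value = 0
        for _ in range(n):
            value = (value << 1) | next(it)
        return value

    out = bytearray()
    while True:
        try:
            flag = next(it)
        except StopIteration:
            return bytes(out)            # clean end of input
        if flag == 0:
            out.append(take(8))          # 0,xxxxxxxx: literal byte
            continue
        zeros = 0                        # count 0 bits, at most 5
        while zeros < 5 and take(1) == 0:
            zeros += 1
        length = BASE_LEN[zeros] + take(EXTRA_BITS[zeros])
        L = take(4)
        P = take(5) if L == 0 else take(L + 4)
        offset = match_offset(L, P)
        for _ in range(length):
            out.append(out[-offset])     # copy byte by byte; may overlap
    
```

Copying one byte at a time is what lets a match overlap its own output, for example an offset of 3 with a length of 6.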
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- ---- ----
crush 0.01 cf 37,401,090 330,975,986 46,879 x 331,022,865 94 17.2 143 LZ77 26
crush 0.01 cf 37,401,090 330,975,986 46,879 x 331,022,865 21 4.2 143 LZ77 50
crush 0.01 c 33,618,865 1040 13 143 LZ77 26
crush 0.01 c 33,618,865 297,103,092 46,879 x 297,721,957 129 3.9 143 LZ77 50
crush 0.01 cx 32,577,338 4490 13 143 LZ77 26
crush 0.01 cx 32,577,338 287,333,602 46,879 x 287,380,481 532 3.8 143 LZ77 50
crush 0.01 cx 32,577,338 287,333,602 2,469 s 287,336,071 532 3.8 143 LZ77 50
crush 1.00 cf 37,308,893 132 15 148 LZ77 26
crush 1.00 c 32,878,537 1541 15 148 LZ77 26
crush 1.00 cx 31,731,537 7916 15 148 LZ77 26
crush 1.00 cx 31,731,711 279,491,430 2,489 s 279,493,919 948 2.9 148 LZ77 60
.2836 xeloz
Option c889 selects maximum compression. c indicates a sliding window.
The first digit 8 selects 2^(16+8) bytes
= 16 MB block size (default is 4 = 1 MB).
The second digit 8 selects the parsing method where 0..2 is greedy, 3..5
is lazy, and 6..8 is optimal and uses a suffix array (libdivsufsort)
to find matches, and higher numbers compress slower but better. Default is 6.
The third digit 0..9 (default 2) selects encoding level, where 9 is slowest
with best compression.
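To make the digit scheme concrete, here is a small Python sketch that decodes an option string such as c889 into the settings described above. The function name and the returned dictionary are hypothetical, and filling missing digits with the stated defaults (4, 6, 2) is an assumption about how a bare c behaves.

```python
DEFAULT_DIGITS = "462"   # block size 4, parsing 6, encoding level 2 (per the text)

def parse_xeloz_option(opt):
    """Interpret 'c' followed by up to three digits (hypothetical helper)."""
    if not opt or opt[0] != "c":
        raise ValueError("expected an option starting with 'c'")
    digits = opt[1:]
    if len(digits) > 3 or (digits and not digits.isdigit()):
        raise ValueError("expected at most three digits after 'c'")
    digits += DEFAULT_DIGITS[len(digits):]      # assumed: defaults fill the rest
    block, parsing, level = (int(d) for d in digits)
    if parsing <= 2:
        method = "greedy"
    elif parsing <= 5:
        method = "lazy"
    elif parsing <= 8:
        method = "optimal"
    else:
        raise ValueError("parsing digit 9 is not described in the text")
    return {
        "block_bytes": 1 << (16 + block),       # 2^(16+digit) bytes
        "parsing": method,
        "parsing_digit": parsing,
        "encoding_level": level,
    }

print(parse_xeloz_option("c889"))
```

With c889 the block size comes out as 2^24 bytes (16 MB), and with the default digit 4 it is 2^20 bytes (1 MB), matching the figures in the text.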
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- ---- ----
xeloz 0.3.5.3 c 35,504,888 312,908,049 18,771 s 312,926,820 238 6 8 LZ77 48
c889 32,441,272 283,621,211 18,771 s 283,639,982 1079 8 230 LZ77 48
xeloz 0.3.5.3a C 37,343,227 329,469,433 18,849 s 329,488,282 134 7 24 LZ77 48
.2839 bzp
bzp 0.2 is a free file
archiver by Nania Francesco Antonio, Sept. 16, 2008. It uses LZP
and arithmetic coding. It takes no options. Earlier versions (0.0, 0.1)
were not tested.
.2843 lzwg
lzwg
is a free, experimental file compressor by Gerald R. Tamayo, Sept. 15, 2022.
It uses LZW with a binary search tree. It resets the dictionary when TABLE_SIZE + 4K
codes are transmitted. TABLE_SIZE depends on the option used.
Option -27 uses 13 x 2^27 bytes (1.7 GB) memory.
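The reset rule can be illustrated with a toy LZW coder in Python. This is a sketch under stated assumptions, not lzwg's implementation: a Python dict stands in for the binary search tree, codes are returned as a list of integers rather than packed bits, and the hypothetical parameter `reset_after` plays the role of the TABLE_SIZE + 4K threshold.

```python
def _fresh_table():
    return {bytes([i]): i for i in range(256)}

def lzw_encode(data, reset_after=4096):
    """LZW that clears its dictionary after every reset_after codes sent."""
    table = _fresh_table()
    codes, sent, w = [], 0, b""
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc
            continue
        codes.append(table[w])
        sent += 1
        if sent == reset_after:
            table, sent = _fresh_table(), 0     # reset: do not add wc
        else:
            table[wc] = len(table)
        w = bytes([byte])
    if w:
        codes.append(table[w])
    return codes

def lzw_decode(codes, reset_after=4096):
    """Mirror of lzw_encode; resets at the same code counts."""
    table = [bytes([i]) for i in range(256)]
    out, prev, received = bytearray(), None, 0
    for code in codes:
        if prev is None:
            entry = table[code]                 # first code after a reset
        elif code < len(table):
            entry = table[code]
            table.append(prev + entry[:1])
        else:                                   # code not yet in the table
            entry = prev + prev[:1]
            table.append(entry)
        out += entry
        received += 1
        if received == reset_after:
            table = [bytes([i]) for i in range(256)]
            prev, received = None, 0
        else:
            prev = entry
    return bytes(out)
```

Because both sides count transmitted codes, no explicit reset marker is needed in the stream. The 1.7 GB figure is simply 13 x 2^27 = 1,744,830,464 bytes.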
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- ---- ----
lzwg -24 34,423,369 102 26 218 LZW 95
lzwg -27 284,356,322 19,828 xd 284,376,150 135 41 1744 LZW 95
lzwhc -c28 34,423,369 284,356,322 30,720 x 284,387,042 95 42 2400 LZW 95
.2857 ha
ha 0.98 is a free
command line archiver by Harry Hirvola, Jan. 7, 1993. A later version,
0.999b, is available for UNIX with source code and ports to DOS. It uses order-5 PPMC
(PPM with fixed escape probabilities for dropping to a lower order context.
Newer PPM compressors (PPMZ, PPMII) use adaptive escape probabilities given a small context.)
The command a2 selects compression method HSC (default is a1 = ASC). a21 automatically
chooses the best method. Time is ns/byte.
Version Options enwik8 Comp Decomp Notes
-------- ----- ---------- ---- ---- -----
ha 0.98 a1 36,379,137 873 257 ns/byte
ha 0.98 a2 31,250,524 2080 1850
ha 0.999b a21 31,250,523 2447 16 DOS compile, 1995
ha 0.9991a a21 31,250,524 1551 16 DOS (.com) compile, 1995
ha 0.999b a21 31,250,524 1290 16 Compiled for NT by Michael Markowsky at Apr 30 1997
lgha v1.1 a21 31,250,524 1110 16 ha v.0999c DOS compile by Lyapko George, 1999
lgha v1.1 31,250,524 1068 1114 16
.2910 ulz
Program Options enwik8 enwik9 prog size Total Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ---------- ----------- ---- ----- --- ---- ----
ulz 0.01 c1 45,751,335 411,826,108 47,809 x 411,873,917 50 11 43 LZ77 26
c2 41,677,764 77 10 43 LZ77 26
c3 39,368,127 145 9 43 LZ77 26
c4 37,861,566 581 9 43 LZ77 26
c5 37,652,826 332,626,591 332,674,400 1077 9 43 LZ77 26
ulz 0.02 c1 50,382,083 37 10 43 LZ77 26
c2 45,751,335 52 10 43 LZ77 26
c3 41,677,764 74 9 43 LZ77 26
c4 39,368,127 139 8 43 LZ77 26
c5 37,861,566 576 8 43 LZ77 26
c6 37,652,826 332,626,591 47,833 x 332,674,424 1056 8 43 LZ77 26
ulz 0.03 cf 45,613,380 402,610,627 48,583 x 402,659,210 13 3.2 29 LZ77 48
c 39,946,599 353,878,403 48,583 x 353,926,986 54 3.3 29 LZ77 48
cu 37,199,413 329,119,609 48,583 x 329,168,192 192 3.2 228 LZ77 48
cu 37,199,413 329,119,609 48,583 x 329,168,192 115 1.4 228 LZ77 68
ulz 0.06 c1 47,674,405 421,011,442 49,450 x 421,060,892 7.4 1.0 94 LZ77 82
c 41,660,387 365,851,618 49,450 x 365,901,068 30 1.1 94 LZ77 82
c9 32,945,292 291,028,084 49,450 x 291,077,534 325 1.1 490 LZ77 82
.2924 irolz
irolz
source code is a free,
open source (GPL), experimental file compressor by Andrew Polar, Sept. 26, 2010.
It uses ROLZ. The algorithm is like LZ77 except that match offsets are coded
by counting previous occurrences of the current context in the history buffer
rather than as pointers. In irolz, the context is order 2. Previous occurrences
are stored in a linked list with a maximum length of 31 (5 bit offset). Matches
less than 4 bytes are coded as literals. Symbols (match flags, 5 bit offsets,
8 bit lengths, and 8 bit literals) are binary arithmetic coded. Lengths and literals
are coded in an order 2 context model. Match flags and offset counts are modeled
without context.
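A rough Python sketch of the ROLZ parsing step described above follows. It keeps the order 2 context, the limit of 31 remembered positions per context, the minimum match length of 4, and the 8 bit length limit from the text. Everything else is an assumption for illustration: tokens are returned as tuples instead of being arithmetic coded, a deque replaces the linked list, and positions are recorded only at token boundaries. The function names are hypothetical.

```python
from collections import defaultdict, deque

MAX_SLOTS = 31     # 5 bit offset index
MIN_MATCH = 4      # shorter matches are coded as literals
MAX_LEN = 255      # 8 bit length

def _new_history():
    return defaultdict(lambda: deque(maxlen=MAX_SLOTS))

def rolz_encode(data):
    """Return ("lit", byte) and ("match", slot_index, length) tokens."""
    history = _new_history()           # 2-byte context -> recent positions
    tokens, i, n = [], 0, len(data)
    while i < n:
        best_len, best_idx = 0, 0
        if i >= 2:
            slots = history[data[i - 2:i]]
            for idx, p in enumerate(slots):
                length = 0
                while (i + length < n and length < MAX_LEN
                       and data[p + length] == data[i + length]):
                    length += 1
                if length > best_len:
                    best_len, best_idx = length, idx
            slots.appendleft(i)        # newest position gets index 0
        if best_len >= MIN_MATCH:
            tokens.append(("match", best_idx, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens

def rolz_decode(tokens):
    """Rebuild the data by maintaining the same per-context position lists."""
    history = _new_history()
    out = bytearray()
    for tok in tokens:
        candidates = []
        if len(out) >= 2:
            slots = history[bytes(out[-2:])]
            candidates = list(slots)   # snapshot before recording position
            slots.appendleft(len(out))
        if tok[0] == "match":
            p = candidates[tok[1]]
            for k in range(tok[2]):
                out.append(out[p + k])
        else:
            out.append(tok[1])
    return bytes(out)
```

The match token carries a small slot index (0..30) rather than a distance, which is the point of ROLZ: the index is cheap to code because the context has already narrowed the candidates.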
Each symbol and context to be predicted is mapped to 2 16-bit predictions, one
fast adapting (learning rate 1/8) and one slow adapting (rate 1/64). The prediction
is the average of the two.
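The two-rate predictor can be sketched in Python as follows. The learning rates (1/8 and 1/64), the 16-bit scale, and the averaging come from the text. The class name, the initial value of one half, and the use of shifts for the updates are assumptions made for this illustration.

```python
class DualRatePredictor:
    """Probability that the next bit is 1, on a 16-bit scale (0..65535)."""

    def __init__(self):
        self.fast = 1 << 15    # assumed starting point: p = 1/2
        self.slow = 1 << 15

    def p(self):
        # The prediction is the average of the two estimates.
        return (self.fast + self.slow) >> 1

    def update(self, bit):
        target = 65535 if bit else 0
        self.fast += (target - self.fast) >> 3   # rate 1/8
        self.slow += (target - self.slow) >> 6   # rate 1/64

pred = DualRatePredictor()
for bit in (1, 1, 1, 0, 1, 1):
    pred.update(bit)
print(pred.p())
```

In a full coder there would be one such pair per symbol position and context, selected by the order 2 context for lengths and literals, and a single pair for match flags and offset counts.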
.2961 lcssr
symbra 0.2 is a free, open source (GPL)
(mirror with .exe)
file compressor by Frank Schwellinger, Nov. 29, 2007. It uses symbol ranking.
Only source code (C++) is provided. For the test, the program was compiled
as indicated in the source comments and tested in Windows XP (32 bit).
The option -c4 or -c5 selects order 4 or 5 context. -m5 turns on suffix
matching with maximum buffer size, which greatly slows compression. -p2 selects
2 passes, which reorders the alphabet by descending frequency. The defaults
are -c4 -m0 -p1.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
symbra 0.2 -c4 -m0 -p1 38,308,164 352,524,859 11,299 s 352,536,158 245 282 68 SR 26
symbra 0.2 -c4 -m5 -p2 34,644,072 302,948,753 11,299 s 302,960,062 4669 4633 112 SR 26
symbra 0.2 -c5 -m5 -p2 34,683,661 302,656,095 11,299 s 302,667,394 4700 4622 112 SR 26
lcssr 0.2 -b7 -l9 34,549,048 296,160,661 8,802 x 296,169,463 8186 8281 1184 SR 26
.2984 zlite
.3062 lazy
lazy v1.00
is a free, open source file compressor by Matt Mahoney, Oct. 10, 2012.
It uses LZ77. It has 5 compression levels from 1 to 5. Higher levels
are slower and use more memory to compress. However decompression speed
does not change and always uses 16 MB.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ---
lazy 1.00 1 40,518,222 359,237,695 5,986 s 359,243,681 57 25 36 LZ77
2 38,580,043 340,152,648 5,986 s 340,158,634 75 25 40 LZ77
3 37,074,105 325,609,617 5,986 s 325,615,603 104 29 48 LZ77
4 35,908,430 314,545,955 5,986 s 314,551,941 166 25 64 LZ77
5 35,024,082 306,245,949 5,986 s 306,251,935 273 24 96 LZ77
.3085 zhuff
zhuff
0.1
is a free file compressor for Windows by Yann Collet, Dec. 13, 2009. It is described as
a combination of LZ4 and Huff0, a fast Huffman coder. LZ4 uses LZSS, an LZ77 variant
using flags to identify matches and literals. It requires the Microsoft runtime libraries,
which are not included in the program size shown.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ---- ----
zhuff 0.1 43,299,291 384,578,436 9,626 x 384,588,062 16 10 1.4 LZ77 26
zhuff 0.7 -t1 40,974,542 365,122,888 45,522 x 365,168,410 17 10 12 LZ77 26
-t2 40,974,542 365,122,888 45,522 x 365,168,410 17 10 19 LZ77 26
zhuff 0.8 -c0 40,990,942 365,277,964 50,939 x 365,328,903 18 13 19 LZ77 26
-c1 36,235,017 320,629,066 50,939 x 320,680,005 73 12 19 LZ77 26
-c2 35,078,148 309,881,876 50,939 x 309,932,815 111 11 19 LZ77 26
zhuff 0.95b -c0 40,615,710 362,653,616 61,684 x 362,715,300 6.5 4.2 32 LZ77 48
-c1 35,973,813 319,010,291 61,684 x 319,071,975 15 3.6 32 LZ77 48
-c2 35,022,597 309,639,139 61,684 x 309,700,823 24 3.6 32 LZ77 48
zhuff 0.97 beta -c0 37,076,873 328,438,763 63,209 x 328,501,972 10 4.0 32 LZ77 48
-c1 35,864,003 317,929,499 63,209 x 317,992,708 16 3.7 32 LZ77 48
-c2 34,907,478 308,530,122 63,209 x 308,593,331 24 3.5 32 LZ77 48
.3088 lzhhf
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- --- --
lzhhf 34,848,933 308,825,079 24,576 xd 308,849,655 392 12 14 LZ77 95
.3092 slug
slug v1.1b
(mirror)
is a free, closed source file compressor by Christian Martelock,
Apr. 26, 2007. It uses an LZ type algorithm with
a 128K non-sliding window and Huffman coding.
It is designed for high speed and low memory usage.
System (wall) times for enwik9: 18 (51) seconds for compression,
14 (30) for decompression.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ---
slug 1.1b 45,274,048 404,250,979 5,836 x 404,256,815 18 14 1 LZ77
slug 1.27 35,093,954 309,201,454 6,809 x 309,208,263 32 28 14 ROLZ
.3098 lzuf62
Program enwik8 enwik9 prog size Total Comp Decomp Mem Alg
------- ---------- ----------- ---------- ----------- ---- ----- --- ----
lzuf 38,036,810 338,488,945 4,070 xd 338,493,015 446 40 2 LZ77 26
lzuf62 34,960,889 309,837,920 24,576 xd 309,862,496 375 11 14 LZ77 95
.3098 pigz
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Notes
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- ---- --
gzip 1.3.5 -9 36,445,248 322,591,995 34,408 x 322,626,403 55 22 4.5 LZ77 48
pigz 2.2.3 -9 36,490,716 322,926,625 36,521 xd 322,963,146 31 10 115 LZ77 48
pigz 2.3 36,565,142 324,081,152 52,717 s 324,133,869 25 12 3 LZ77 48
-9 36,490,716 322,926,625 52,717 s 322,979,342 29 13 3 LZ77 48
-11 35,002,893 309,812,953 52,717 s 309,865,670 2237 13 25 LZ77 48
.3102 kzip
kzip is a free, closed source
command line compressor by Ken Silverman, compiled May 13, 2006,
released May 18, 2006. It is an optimizing compressor producing
zip-compatible archives but with better compression. The option /b512 sets the
block splitting threshold. The default is /b256, but /b512 was found optimal
on enwik8. /s0 (default) selects maximum compression and ranges from /s0
to /s3. No decompresser is included, but archives can be read with any
program that reads zip files (pkzip, unzip, 7zip, WinRAR, WinACE, etc).
Options enwik8 Comp (ns/B) enwik9
------- ---------- ----------- ----------
/s0 /b0 35,029,924 2490 (one large block)
/s0 /b256 35,025,767 5220 310,281,906 (default, s0 = extreme mode)
/s0 /b512 35,012,219 5410 310,248,404 (best enwik8)
/s0 /b1024 35,016,649 4440 310,188,783 (best enwik9)
/s1 35,028,473 5240 (s1 = intense mode)
/s2 42,370,689 860 (s2 = longest run)
/s3 63,191,700 820 (s3 = Huffman code only)
pkzip 204 36,934,712 123 (for comparison)
uc2 (UltraCompressor II revision 3 pro) is a commercial (free for noncommercial use) command line and GUI archiver for DOS by Nico de Vries, June 1, 1995. It uses LZ77 and Huffman coding. The -tst option selects maximum compression.
uc2 includes a program for converting archives to self extracting
programs (uc2sea) which produced smaller files (enwik8.exe = 35,397,343 bytes,
enwik9.exe = 312,759,499 bytes), but in this mode decompression failed for enwik9,
truncating the last 21 bytes of output. uc2sea works by first extracting the
archive and then recompressing it using a slightly different algorithm.
.3141 thor
thor 0.9a is an experimental,
closed source, command line file compressor by Oscar Garcia, Mar. 19, 2006.
It is the fastest compressor on the maximumcompression
benchmark. It has 3 modes: ef (fastest), e (normal) and ex (best). However in this test it
appears speed may be limited by disk I/O.
thor 0.94 alpha (mirror) (mirror) was released Apr. 22, 2007. exx is a new mode to select maximum compression. Times shown are process times excluding disk I/O. (Actual times are 96 sec. to compress, 75 sec. to decompress.)
thor 0.95 (mirror), May 8, 2007, has 5 compression options: e1 through e4 are LZP in order of increasing compression; e5 is LZ77. Note that e5 is best on enwik8 but e4 on enwik9.
thor 0.96a, Aug. 23, 2007, works like 0.95.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---
thor 0.9a ex 41,670,916 368,669,696 61,556 x 368,731,252 54 51 5.5
thor 0.9a e 45,842,692 412,096,696 61,556 x 412,157,852 44 50
thor 0.9a ef 55,063,944 490,400,720 61,556 x 490,461,876 45 53
thor 0.94a exx 35,696,028 315,611,168 68,922 x 315,680,090 82 32 2
thor 0.95 e1 55,138,792 21 27
thor 0.95 e2 45,714,740 21 23
thor 0.95 e3 41,528,948 29 29
thor 0.95 e4 35,795,184 314,092,324 49,925 x 314,142,249 64 34 16
thor 0.95 e5 35,696,032 315,611,172 49,925 x 315,661,097 80 22 2
thor 0.96a e1 54,915,456 488,397,982 50,071 x 488,448,053 17 20 1.6
thor 0.96a e2 45,714,724 411,416,252 50,071 x 411,466,323 23 19 1.5
thor 0.96a e3 41,531,628 367,671,220 50,071 x 367,721,291 27 24 6
thor 0.96a e4 35,795,184 314,092,324 50,071 x 314,142,395 62 30 16
thor 0.96a e5 35,696,032 315,611,172 50,071 x 315,661,243 80 18 2
etincelle
alpha 3
is a free file compressor by Yann Collet, Mar. 26, 2010. It uses ROLZ with
an order 1 context to reduce the offset length, followed by Huffman coding.
lz5 1.3.3 is a
free, open source file compressor by Przemyslaw Skibinski, Jan. 5, 2016.
It is a modification of lz4 by Yann Collet. It uses byte-aligned LZ77 codes
as follows:
lz5 was compiled using gcc 4.8.4 with the supplied Makefile for Ubuntu.
Option -0 through -18 selects the compression level (fastest..best).
Default is -0.
gzip
1.3.5 is an open source single file command line compressor
by Jean-loup Gailly and Mark Adler, Sept. 30, 2002.
It uses LZ77 (flate, but not compatible with zip).
The -9 option selects maximum compression although its effect is small (see below).
Info-ZIP 2.3.1 (Mar. 8, 2005)
is a free, open source
archiver for many operating systems. It uses the standard LZ77 "flate" format, like
gzip and many zip-compatible programs. (The sizes are exactly 125 bytes larger
than gzip). This test was under Linux
(Ubuntu 2.6.15.27-amd64-generic) on a 2.2 GHz Athlon-64.
Uncompression was with UnZip 5.52 (Feb. 28, 2005), both part of the normal
Ubuntu distribution. The -9 option selects maximum compression.
The Windows version 2.32 is dated June 19, 2006.
Info-ZIP 3.00 was released July 7, 2008. Decompression was tested with
UnZip 6.00, released Apr. 29, 2009.
pkzip 2.04e is a commercial
(free trial) command line archiver by PKWARE Inc.
written Jan 25, 1993. It uses LZ77 (flate format).
The option -ex selects maximum compression. The decompresser is pkunzip 2.04e.
Times are wall times. (Timer doesn't show process times for DOS programs).
There are many programs that produce zip files. I don't plan to test them all.
jar 0.98-gcc is an open
source command line archiver by Bryan Burns, 2002. It uses LZ77 (zip). It is included with Java (1.5.0_06) and
is normally used to create .jar files for compiled Java applications and applets, but it can
also be used as an archiver. It has no compression options.
The cvf options create an archive. The M option says to not add a manifest file.
Note: this is not the jar compressor from Arjsoft.
PeaZip 1.0 by Giorgio Tani (Nov. 6, 2006)
is a GPL open source GUI archiver
supporting several common formats. The format tested is the native format which uses zlib
(gzip algorithm). The "better" option chooses best compression (equivalent to gzip -9).
Integrity check (checksum) and encryption are turned off.
arj 3.10 is a free, open source
(GPL v2) archiver by ARJ Software Russia, June 23, 2005. It is compatible
with the original ARJ by Robert K. Jung, which was patented
(U.S. patent 5140321 A)
filed Sept. 4, 1991 and presumably expired. According to the patent,
it uses LZ77 with flags to indicate a repeat of the last match
(like LZX used in cabarc). Matches are found from a hash table of
FIFO queues.
The options -m0 through -m4 select compression level. The default,
-m1, gives maximum compression. -m0 stores with no compression.
-m1 through -m4 produce progressively larger output and compress faster,
but decompress more slowly.
lzgt1
(click on lzgt3a.zip) is one of a group
of free, open source, experimental file compressors by Gerald R. Tamayo, released
July 17, 2008. It uses LZT (Lempel-Ziv-Tamayo) compression, an LZ77 variant
in which the decompresser rebuilds a list of matches sorted by context match
length and the match length is implied or partially implied by the position
in the list. lzgt implements LZT using a 4K sliding window, 32 byte
look-ahead buffer and 3 bit code length. lzgt1 is like lzgt
but uses a 16K sliding window and 128 byte look-ahead buffer.
lzgt2 eliminates the code length entirely. lzgt3 is an improved version
of lzgt2. All programs have separate decompressers (lzgtd1, etc) and are
compiled for DOS (and Windows).
lzgt3a was added Oct. 25, 2008. It uses a 128K window size, 64K
lookahead buffer, and improved coding.
The most recent version was written in Visual C and ported to Windows as a
cross compressor intended to produce self extracting archives for the
Commodore. By default, pucrunch appends a 276 byte header containing 6510 code to
extract the file. There are also standalone decompressers written in 6510
assembler and in Z80 assembler. I could not test in these environments, so I
used the -d -c0 options to turn off the self extracting feature, which requires
the (larger) Win32 external compressor/decompresser.
There are two additional limitations. First, the decompresser appends a 2 byte
header to indicate the load address, which is required by the Commodore. To
make the decompressed file bitwise identical, this must be stripped off. Second,
the input file size is limited to 64,936 bytes. The author tested a modified
version without a file size limit on the Calgary corpus, but this modified version
was not posted, so I did not use it.
To overcome these limitations
I wrote the following Perl scripts to compress and decompress. The first script
compresses by splitting the input into blocks of 64,936 bytes, compressing them
separately, and appending the compressed files each with a 2 byte header to indicate
the block size. The second script decompresses each block one at a time, strips
off the 2 byte Commodore header, and appends them. Each script takes the input
and output files as command line arguments. The second script is included in
the decompresser size.
pucrunch suggests using -p1 and -m6 options to improve compression
but these do not help.
Run times are wall times. Using scripts, Timer 3.01 does not provide
useful process times, since it times Perl rather than pucrunch.
The decompression time (463 sec) is probably high because Windows Task Manager
shows that pucrunch is running only a small fraction of the time, perhaps 10%.
Most of the time is probably the overhead of file I/O and running pucrunch
15,400 times.
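The figure of 15,400 runs follows from the block size. A quick check of the arithmetic, assuming enwik9 is exactly 10^9 bytes and every block except the last is full:

```python
ENWIK9_SIZE = 10**9   # bytes in enwik9
BLOCK_SIZE = 64936    # pucrunch input limit per file

# Number of pucrunch invocations needed to cover the whole file (ceiling division).
blocks = (ENWIK9_SIZE + BLOCK_SIZE - 1) // BLOCK_SIZE

# Size of the final, partial block.
last_block = ENWIK9_SIZE - (blocks - 1) * BLOCK_SIZE

# The compress script writes a 2 byte size header in front of each block.
header_overhead = 2 * blocks

print(blocks, last_block, header_overhead)
```

The per-block headers therefore add only about 30 KB to the compressed size. The cost of splitting is mainly in run time and in the loss of matches across block boundaries.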
The pair of bit counts and the character count mod 3 (probably unnecessary)
are mapped to a second table of counts to compute
the next-bit probability. That table is updated by incrementing the appropriate
count and halving both if the sum exceeds 60000. The initial mapping
of this second table is (n0,n1) to (n0,n1) except if either of the input counts
is 0, in which case the mapping is (0,n1) to (1,1+2^n1) or (n0,0)
to (1+2^n0,1). The final bit prediction is n1/(n0+n1).
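The mapping above can be sketched as follows. This is an illustration, not the program's code. The function names are made up here. The (0,0) case is not covered by the description, so mapping it to (1,1) is an assumption, as is rounding down when halving.

```python
def initial_counts(n0, n1):
    """Initial second-table entry for the input count pair (n0, n1)."""
    if n0 == 0 and n1 == 0:
        return (1, 1)              # assumption: not specified in the text
    if n0 == 0:
        return (1, 1 + 2**n1)      # only 1 bits seen so far
    if n1 == 0:
        return (1 + 2**n0, 1)      # only 0 bits seen so far
    return (n0, n1)

def predict(m0, m1):
    """Probability that the next bit is 1."""
    return m1 / (m0 + m1)

def update(m0, m1, bit, limit=60000):
    """Increment the count for the coded bit, halving both if the sum exceeds limit."""
    if bit:
        m1 += 1
    else:
        m0 += 1
    if m0 + m1 > limit:
        m0 //= 2                   # assumption: halving rounds down
        m1 //= 2
    return m0, m1
```

For example, a context that has seen three 1 bits and no 0 bits starts at (1, 9), which predicts a 1 with probability 0.9.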
The program was a submission
to a data compression contest for Dr. Dobb's Journal. To test, the source
code was compiled using make and tested in Linux. It compresses and decompresses
from standard input to standard output. It takes no options.
lzop v1.01 is a free, open source (GPL) command line
file compressor by Markus F.X.J. Oberhumer, Apr. 27, 2003. A newer version, 1.02 rc1
was released July 25, 2005, but no Win32 executable was available for download
as of May 29, 2006. lzop uses LZ77. It is designed for high speed. -9 selects
maximum compression. lzop is I/O bound. timer 3.01 reports the decompression
process time as 12 seconds. The remaining 38 seconds is due to disk access.
lzw v0.2 was released with
public domain source code for the decompresser, which zips to 671 bytes. The file
format is as follows. There is no header or trailer.
Each 16 bit code word is in machine dependent order
(LSB first on x86). Codes 0-255 represent single bytes of the same value.
Codes 256-65535 are assigned in ascending order by concatenating the decoded
values of the previous two codes. After assigning code 65535, new codes are
assigned by replacing the oldest codes first, starting with 256.
Data is decoded into a rotating buffer of size 16 MiB (2^24 bytes)
by copying a string from elsewhere in the buffer. Neither the original nor
copied string crosses the buffer boundary, and they do not overlap each other.
No new symbol is added after decoding the first byte of the buffer.
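A simplified reading of this format can be sketched as a decoder. This is my interpretation of the description, not the released decompresser. It assumes little-endian code words, assumes the first new code is assigned after the second code word is decoded, and leaves out the rotating buffer and the replacement of old codes after 65535. The function name is hypothetical.

```python
import struct

def decode_lzw02(data):
    """Decode a stream of 16 bit codes (simplified: no buffer wrap, no code reuse)."""
    if len(data) % 2:
        raise ValueError("input must be a whole number of 16 bit code words")
    codes = struct.unpack("<%dH" % (len(data) // 2), data)  # LSB first, as on x86

    table = {}          # codes 256 and up -> decoded byte strings
    next_code = 256
    out = bytearray()
    prev = None         # decoded value of the previous code

    for code in codes:
        # Codes 0-255 stand for the byte of the same value.
        cur = bytes([code]) if code < 256 else table[code]
        out += cur
        # A new code is the concatenation of the previous two decoded values.
        if prev is not None and next_code <= 0xFFFF:
            table[next_code] = prev + cur
            next_code += 1
        prev = cur
    return bytes(out)
```

With this reading, the codes 97, 98, 256, 257 decode to "a", "b", "ab", "bab".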
MTCompressor v1.0
(discussion)
is a free, experimental command line compressor for Windows by David Catt,
Jan. 20, 2012. It uses an LZ77 variant similar to deflate. It is multi-threaded.
Reported time is real time running on 2 cores (note 26). Memory usage
fluctuates during use. The peak is reported.
lz4opt v1.00
is a free, closed source file compressor for 32 bit Windows by Ilia Muraviev,
Feb. 9, 2016.
It is compatible with LZ4, an LZ77 compressor. Options cf, c, cb
compress fast, normal, and best respectively.
lz4x v1.02 was released
Apr. 6, 2016. The options c1..c4 compress faster..better with LZ4 compatibility.
arbc2z is a free, experimental command line
file compressor with source code by David A. Scott, June 23, 2006.
It is a bijective order-2 (PPM) arithmetic coder. A bijective
coder has the property that all inputs to the decompresser are valid and produce distinct outputs.
The above archive also contains arbc2, which uses a different method of handling the zero frequency problem,
arbc1 (order 1), and arbc0 (order 0), all of which are bijective.
lz4 v0.2
(website)
is a free file compressor by Yann Collet, Oct. 16, 2009.
It uses LZSS (an LZ77 variant
with flags to mark literals and matches). It takes no options.
Run times are dominated by disk access.
lz4 0.6
was released Dec. 12, 2010. lz4hc 0.9 (Dec. 13, 2010, same link) is a compatible
version with better compression. In both cases, run times are dominated by
disk access. Times shown are process times.
Actual times were 80+37 sec. for lz4 and 137+39 sec. for lz4hc.
The programs take no compression options.
lz4 v1.2
was released Oct. 10, 2011. It has 3 compression levels (c0...c2).
The program automatically detects the number of cores (2, note 26) and uses the
same number of threads. However compression in mode c0 and all
decompression modes are I/O bound, using about 20% of available CPU.
For these modes, process time is reported. Compression modes c1 and
c2 are real times with both cores fully utilized.
lzss 0.01
(withdrawn) is a free,
experimental file compressor by Ilia Muravyov, Aug. 1, 2008. It uses
LZSS, a byte aligned LZ77 variant with matches encoded with an 18 bit pointer
and 6 bit length field, and 1 bit flags to distinguish matches
from literals. It is discussed here.
Compression options are e (fast) or ex (smaller). The program is designed
for fast decompression. The program uses 625 MB for compression and 33 MB
for decompression.
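An 18 bit pointer plus a 6 bit length field is exactly 3 bytes, which is what keeps the format byte aligned. A minimal sketch of packing such a match follows. The bit layout (pointer in the high bits, big-endian byte order) and the function names are assumptions, since the actual layout is not described here.

```python
POINTER_BITS = 18   # reaches back up to 2^18 = 256 KiB
LENGTH_BITS = 6     # 64 distinct length codes

def pack_match(pointer, length_field):
    """Pack a match into 3 bytes (assumed layout: pointer high, length low)."""
    if not 0 <= pointer < (1 << POINTER_BITS):
        raise ValueError("pointer does not fit in 18 bits")
    if not 0 <= length_field < (1 << LENGTH_BITS):
        raise ValueError("length field does not fit in 6 bits")
    word = (pointer << LENGTH_BITS) | length_field
    return word.to_bytes(3, "big")

def unpack_match(three_bytes):
    """Inverse of pack_match: returns (pointer, length_field)."""
    word = int.from_bytes(three_bytes, "big")
    return word >> LENGTH_BITS, word & ((1 << LENGTH_BITS) - 1)
```

The length field is stored raw. How it maps to an actual match length (for example, a minimum match length offset) is not stated, so the sketch does not guess.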
lzss
0.02
(discussion)
was released Feb. 7, 2014. Options cf, c, cx select fast, medium, and best
compression.
BriefLZ
1.05
is a free, open source (C and MASM) file compressor by Joergen Ibsen,
Jan. 15, 2005. It uses LZ77. It takes no options.
It uses about 2 MB memory for compression and about 900 KB for decompression.
brieflz 1.1.0 was last
updated Sept. 23, 2015. To test, it was compiled using the supplied
Makefile (as blzpack) in the example subdirectory of the GitHub distribution using
gcc 4.8.1 in Windows (note 26) and gcc 4.8.4 in Linux (note 48).
lzf 1.01, Oct. 29, 2013, is a performance
optimization with no change in compression.
lzf 1.02
(discussion)
was released Oct. 2, 2014.
The -C8 option selects the maximum number of contexts, 2^18.
For this test, the C source code was compiled with MinGW 3.4.5:
Version 0.9 (Oct. 22, 2006) is a faster version (quick.exe)
which handles large (64 bit) files.
Version 1.20 (Mar. 15, 2007) is an archiver rather than a file compressor.
Version 1.30 beta
(Apr. 16, 2007) has 4 modes (0-3) with 4 separate executables.
Only mode 3 (quick3.exe, max compression) was tested.
Version 1.30 (Aug. 14, 2007) modes 0, 1, and 2 are compatible with version 1.20,
but mode 3 (best compression) is new.
Version 1.40 (Nov. 13, 2007) is an experimental version designed for better speed.
It has only one mode.
.3196 lz5
1_OO_LL_MMM OOOOOOOO = 10 bit offset
00_LLL_MMM OOOOOOOO OOOOOOOO = 16 bit offset
010_LL_MMM OOOOOOOO OOOOOOOO OOOOOOOO = 24 bit offset
011_LL_MMM = repeat previous offset
MMM codes the match length from 3 to 9. If MMM = 111, then an additional
byte is used to code match lengths of 10 to 265. LL or LLL is the 2 or 3
bit literal length (0..3 or 0..7) following the match.
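The four token layouts can be told apart from the leading bits of the first byte. The sketch below decodes only that first byte. The order of the bytes that follow (offset bytes, the optional length byte, the literals) is not given above, so it is left out. The function and field names are made up for illustration.

```python
def parse_token(token):
    """Split the first byte of a sequence into its fields."""
    mmm = token & 0b111
    # MMM = 0..6 codes lengths 3..9; MMM = 7 means the length is in an extra byte.
    match_len = 3 + mmm if mmm < 7 else None

    if token & 0x80:                      # 1_OO_LL_MMM: 10 bit offset
        return {"offset_bits": 10, "offset_bytes": 1,
                "offset_high": (token >> 5) & 0b11,
                "literals": (token >> 3) & 0b11, "match_len": match_len}
    if token >> 6 == 0b00:                # 00_LLL_MMM: 16 bit offset
        return {"offset_bits": 16, "offset_bytes": 2, "offset_high": 0,
                "literals": (token >> 3) & 0b111, "match_len": match_len}
    if token >> 5 == 0b010:               # 010_LL_MMM: 24 bit offset
        return {"offset_bits": 24, "offset_bytes": 3, "offset_high": 0,
                "literals": (token >> 3) & 0b11, "match_len": match_len}
    # 011_LL_MMM: repeat previous offset, no offset bytes follow
    return {"offset_bits": 0, "offset_bytes": 0, "offset_high": 0,
            "literals": (token >> 3) & 0b11, "match_len": match_len}

def long_match_length(extra_byte):
    """Match length when MMM = 111: one extra byte covers 10..265."""
    return 10 + extra_byte
```

For instance, the byte 1_10_01_010 has a 10 bit offset whose top two bits are 10, one literal, and a match length of 5.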
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- --- ----
lz5 1.3.3 -0 49,358,209 433,092,957 138,210 s 433,231,167 8.7 3.9 9 LZ77 48
-18 36,514,408 319,510,433 138,210 s 319,648,643 10578 3.7 1139 LZ77 48
.3211 gzip124hack
gzip124hack
(mirror)
(discussion)
is a modified version of gzip 1.2.4 by Ilia Muraviev, Aug. 13, 2007.
It uses LZ77.
It is a file compressor like gzip, except that it does not delete the input file.
It improves compression by using LZ77 lazy matching with 2 byte lookahead.
The compressed format is compatible with gzip. -9 selects maximum compression.
.3224 doboz
doboz 0.1 is a free,
open source file compressor by Attila T. Áfra, Mar. 18, 2011. It uses LZ77.
It is both a compression library and a simple single-threaded
file compressor which takes no options. To test,
the supplied compressor for 32 and 64 bit Windows was tested. The 32
bit version crashed while compressing enwik9, possibly due to reading the
whole file into memory. The 64 bit version succeeded under Ubuntu/wine.
.3226 gzip
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- ----
gzip 1.3.5 -9 36,445,248 322,591,995 34,408 x 322,626,403 55 22 48 (Linux)
gzip 1.3.5 -9 36,445,248 322,591,995 38,801 x 322,630,796 101 17 (Windows)
gzip 1.3.5 36,518,329 323,742,882 38,801 x 323,781,683 85 19
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- ----
doboz 0.1 36,367,430 fail 76,471 x 940 10 26
36,367,430 322,415,409 83,591 x 322,499,000 533 3.4 1200 48
.3226 Info-ZIP
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Notes
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- ---- --
Info-ZIP 2.31 (Linux) -9 36,445,373 322,592,120 57,583 x 322,649,703 104 35 0.1 LZ77
Info-ZIP 2.32 (DOS) -9 (unset TZ) 36,445,333 178 101 LZ77 16
Info-ZIP 2.32 (DOS) -9 36,445,351 179 LZ77 16
Info-ZIP 2.32 (Win32) -9 36,445,474 183 LZ77 16
Info-ZIP 2.32 (Win32) -9 36,445,443 322,592,190 75,806 xd 322,667,996 96 13 1.2 LZ77
Info-ZIP 3.00 (Win32) -9 36,445,475 322,592,222 101,079 xd 322,693,301 114 18 1.3 LZ77 26
.3234 pkzip
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- ----
pkzip 2.0.4 36,934,712 327,607,376 29,184 xd 327,636,560 123 44 1.7 LZ77
pkzip 2.0.4 -ex 36,556,552 323,403,526 29,184 xd 323,432,710 171 50 2.5 LZ77
.3237 jar
.3244 PeaZip
.3286 arj
Program Options enwik8 enwik9 prog size Total Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ---------- ----------- ---- ----- --- ---- ----
arj 3.10 -m0 100,000,127 12 10 3 store 26
-m1 37,091,317 328,553,982 143,956 x 328,697,938 262 67 3 LZ77 26
-m2 37,381,391 224 68 3 LZ77 26
-m3 39,413,127 185 72 3 LZ77 26
-m4 44,157,478 116 91 3 LZ77 26
.3344 lzgt3a
Program enwik8 enwik9 prog size Total Comp Decomp Mem Alg
------- ---------- ----------- ---------- ----------- ---- ----- --- ----
lzgt 47,560,234 1,989 sd 634 234 2 LZ77
lzgt1 43,928,072 403,385,292 2,025 sd 403,387,317 3390 865 2 LZ77
lzgt2 57,268,099 1,935 sd 982 274 1 LZ77
lzgt3 54,253,334 1,963 sd 889 280 1 LZ77
lzgt3a 37,444,440 334,405,713 4,387 xd 334,410,100 1581 2886 2 LZ77
.3502 pucrunch
pucrunch is a free,
open source file compressor by Pasi Ojala, last updated Mar. 8, 2002.
It uses a combination of run length encoding (RLE) and LZ77 with Elias Gamma coding
of the offsets and run lengths.
The original version was written on Mar. 14, 1997 for the Commodore series
(Vic 20, Commodore 64, Commodore 128 and Commodore Plus 4/C16) in 6510
assembly language, with updates on Dec. 17, 1997 and Oct. 14, 1998.
The 6510 is a 1 MHz, 8 bit microprocessor with 3 registers,
16 bit (64K) address space, no cache, no pipelining, 8 bit ALU, no multiply or
floating point instructions, and no support for multitasking or virtual memory.
The decompresser was designed to execute quickly
in this environment with only a few hundred bytes of memory.
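For reference, the textbook form of Elias Gamma coding is sketched below, using strings of "0" and "1" for clarity. pucrunch's own bit conventions (and any limits it places on code length) may differ, so this is the general technique rather than its exact format.

```python
def gamma_encode(n):
    """Elias Gamma code of n >= 1: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias Gamma codes only positive integers")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def gamma_decode(bits, pos=0):
    """Decode one value starting at pos; returns (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2), end
```

Small values get short codes (1 is a single bit, 5 is 00101), which suits offsets and run lengths that are usually small.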
#!/usr/bin/perl
# compress with pucrunch: perl p input output
open(IN,"$ARGV[0]")||die "$!: $ARGV[0]";
open(OUT,">$ARGV[1]")||die "$!: $ARGV[1]";
binmode(IN);
binmode(OUT);
while ($n=read(IN, $s, 64936)) {
open(TMP1,">tmp1")||die "$!: tmp1";
binmode(TMP1);
syswrite(TMP1, $s, $n);
close(TMP1);
`pucrunch -d -c0 tmp1 tmp2`;
open(TMP2,"tmp2")||die "$!: tmp2";
binmode(TMP2);
$size=(stat(TMP2))[7];
print("$n -> $size\n");
$n=read(TMP2,$s,$size);
printf(OUT "%c%c%s", $size/256, $size%256, $s);
close(TMP2);
}
#!/usr/bin/perl
# unpack with pucrunch: perl up input output
open(IN,"$ARGV[0]")||die "$!: $ARGV[0]";
open(OUT,">$ARGV[1]")||die "$!: $ARGV[1]";
binmode(IN);
binmode(OUT);
while (($c1=getc(IN)) ne "") {
$c2=getc(IN);
$size=unpack("C",$c1)*256+unpack("C",$c2);
$n=read(IN, $s, $size);
if ($size!=$n) {die "size=$size n=$n\n";}
open(TMP1,">tmp1")||die "$!: tmp1";
binmode(TMP1);
syswrite(TMP1, $s, $n);
close(TMP1);
`pucrunch -u tmp1 tmp2`;
open(TMP2,"tmp2")||die "$!: tmp2";
binmode(TMP2);
read(TMP2,$s,2);
read(TMP2,$s,64936);
printf(OUT "%s", $s);
close(TMP2);
}
.3619 packARC
packARC v0.7RC11
(discussion)
is a free, open source (GPL v3) archiver by Matthias Stirner, Dec. 7, 2013.
It incorporates packJPG (JPEG compressor), packMP3 (MP3 compressor) and
packPNM (BMP, PPM, PGM, PBM image compressor). Other file types are compressed
with a simple context model and arithmetic coder. Option -sfx creates a
self extracting archive. Option -np tells the program not to pause when done.
For this test, the source was compiled with MinGW g++ 4.8.0 using the supplied
buil_packarc.bat for 32 bit Windows.
.3626 urban
urban is an open
source file compressor for Unix by Urban Koistinen, Apr. 30, 1991.
The program is an order-2 indirect context model with bitwise arithmetic coding.
A hash of the last two whole bytes plus the previously coded bits
of the current byte (MSB first) are mapped to a hash table of size 710123.
Each table element contains a count of 0s and 1s in the range 0 through 8,
and a hash verification consisting of a second hash. When a collision
is detected, the counts are reset to 0. Otherwise, the appropriate count is
incremented and both are halved if either exceeds 8.
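The update rule for one table element can be sketched as below. This follows the description literally: on a collision the counts are reset and nothing is incremented. The class and function names and the use of integer halving are assumptions. The hash functions themselves are not described, so they are left to the caller.

```python
TABLE_SIZE = 710123   # number of elements in the hash table

class Slot:
    """One table element: a verification hash and counts of 0s and 1s."""
    def __init__(self):
        self.check = None
        self.n0 = 0
        self.n1 = 0

def update_slot(slot, check, bit, limit=8):
    """Update a slot after coding `bit` in the context whose second hash is `check`."""
    if slot.check != check:
        # Collision: another context owned this slot, so start over.
        slot.check = check
        slot.n0 = slot.n1 = 0
        return
    if bit:
        slot.n1 += 1
    else:
        slot.n0 += 1
    if slot.n0 > limit or slot.n1 > limit:
        slot.n0 //= 2     # assumption: halving rounds down
        slot.n1 //= 2
```

Keeping the counts this small makes the model adapt quickly, since older statistics are discarded after only a few observations.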
.3663 lzop
.3676 lzw
lzw v0.1 is a free, experimental
file compressor by Ilia Muraviev, Jan. 30, 2008. It uses LZW with 16 bit
code words. It takes no options.
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ---
lzw 0.1 42,554,530 380,782,976 42,215 x 380,825,191 1917 27 17 LZW
lzw 0.2 41,960,994 367,633,910 671 s 367,634,581 3597 31 18 LZW
.3701 MTCompressor
.3721 lz4x
Compressor Opt enwik8 enwik9 prog Total Comp Deco Mem Alg Note
--------- --- ---------- ----------- ------- ----------- ---- ---- --- --- ----
lz4opt 1.00 cf 50,052,286 444,844,266 48,445 x 444,892,711 5.6 18 LZ77 68
c 44,815,112 397,492,322 48,445 x 397,540,767 11.4 22 LZ77 68
cb 41,950,671 372,074,748 48,445 x 372,123,193 206 1.5 122 LZ77 68
lz4x 1.02 c1 52,653,040 472,784,650 48,609 x 472,833,259 8.6 3.5 19 LZ77 48
c2 44,182,671 392,104,176 48,609 x 392,152,785 30 3.2 19 LZ77 48
c3 42,833,452 379,633,926 48,609 x 379,682,535 47 3.2 19 LZ77 48
c4 41,950,112 372,068,437 48,609 x 372,117,046 136 3.3 114 LZ77 48
c4 41,950,112 372,068,437 48,609 x 372,117,046 79 1.4 LZ77 68
.3790 arbc2z
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg
------- ------- ---------- ----------- ----------- ----------- ----- ----- --- ---
arbc2z 38,756,037 379,054,068 6,255 sd 379,060,323 2659 2674 68 PPM2
arbc2 38,780,256 379,093,120 6,070 sd 379,099,190 2528 2646 67 PPM2
arbc1 48,586,591 486,892,000 6,047 sd 486,898,047 2439 2611 1.8 PPM1
arbc0 63,501,994 644,561,590 5,988 sd 644,567,578 2459 2606 1.5 o0
.3800 lz4
Compressed size Decompresser Total size Time (ns/byte)
Program Opt enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- --- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
lz4 0.2 55,092,932 488,028,718 9,556 x 488,038,274 13 7 13 LZ77 26
lz4 0.6 55,062,753 487,772,940 42,139 x 487,815,079 14 7 13 LZ77 26
lz4hc 0.9 44,182,558 392,102,544 43,617 x 392,146,161 65 7 14 LZ77 26
lz4 1.2 -c0 54,303,743 481,142,522 49,128 x 481,191,650 15 6 20 LZ77 26
-c1 44,218,551 392,460,229 49,128 x 392,509,357 69 6 21 LZ77 26
-c2 42,870,164 379,999,522 49,128 x 380,048,650 91 6 20 LZ77 26
.3802 lzss
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---- ---- ----
lzss 0.01 e 48,615,051 426,009,994 44,555 x 426,054,549 193 15 625 LZSS
ex 38,254,303 337,565,308 44,555 x 337,609,863 9708 14 625 LZSS
lzss 0.02 cf 50,110,565 448,712,956 48,114 x 448,761,070 22 12 17 LZSS 26
cf 448,712,956 48,114 x 448,761,070 6.0 17 LZSS 63
c 45,093,733 399,850,630 48,114 x 399,898,744 40 11 17 LZSS 26
c 399,850,630 48,114 x 399,898,744 12.5 17 LZSS 63
cx 42,874,387 380,192,378 48,114 x 380,240,492 265 10 145 LZSS 26
cx 42,874,387 380,192,378 48,114 x 380,240,492 107 2.3 145 LZSS 63
.3894 xdelta
xdelta 3.0u is a free, open source command line
file compressor by Joshua McDonald, Oct. 12, 2008. It uses LZ77. The program is a delta
coder, meaning it will output the compressed difference between two files, and then
decompress the second file when given the first file uncompressed. It allows the first
file to be omitted, in which case it simply compresses. This is how the test was done.
-9 specifies maximum compression.
.3901 BriefLZ
Version enwik8 enwik9 prog size Total Comp Decomp Mem Alg Note
------- ---------- ----------- ---------- ----------- ---- ----- --- ---- ----
BriefLZ 1.05 46,638,341 425,384,313 5,298 x 425,389,611 66 18 2 LZ77
blzpack 1.1.0 43,300,800 390,122,722 14,907 s 390,137,629 29 15 4 LZ77 26
blzpack 1.1.0 43,300,800 390,122,722 14,907 s 390,137,629 21 7.5 3 LZ77 48
.3972 mtari
mtari 0.2 is a free, open source (GPL v3) file compressor by
David Werecat, Dec. 10, 2013. It is a multi-threaded bitwise order 17 context model
with arithmetic coding.
To test, it was compiled with MinGW gcc 4.8.0 with options -O2 -fopenmp.
.4068 lzf
lzf v1.00
(discussion)
is a free, experimental file compressor by Ilya Muravyov,
Oct. 29, 2013. It uses byte aligned LZ77 with an 8 KB window. Commands
c and cx give faster or better compression, respectively.
Version enwik8 enwik9 prog size Total Comp Decomp Mem Alg Note
------- ---------- ----------- ---------- ----------- ---- ----- --- ---- ----
lzf 1.00 c 48,947,532 440,862,551 47,737 x 440,910,288 39 12 18 LZ77 26
c 48,947,532 440,862,551 47,737 x 440,910,288 8 LZ77 60
cx 46,318,130 416,377,741 47,737 x 416,425,478 53 11 18 LZ77 26
cx 46,318,130 416,377,741 47,737 x 416,425,478 14 2.3 LZ77 60
lzf 1.01 c 48,947,532 440,862,551 47,728 x 440,910,279 39 12 18 LZ77 26
c 48,947,532 440,862,551 47,728 x 440,910,279 8 LZ77 60
cx 46,318,130 416,377,741 47,728 x 416,425,469 49 11 18 LZ77 26
cx 46,318,130 416,377,741 47,728 x 416,425,469 12 2.3 LZ77 60
lzf 1.02 c 47,827,133 430,634,000 48,359 x 430,682,359 16 3.9 22 LZ77 48
c 47,827,133 430,634,000 48,359 x 430,682,359 7 LZ77 68
cx 45,198,298 406,805,983 48,359 x 406,854,342 110 3.7 151 LZ77 48
cx 45,198,298 406,805,983 48,359 x 406,854,342 68 2.2 LZ77 68
.4092 srank
srank 1.1 is a free,
open source file compressor by P. M. Fenwick, originally written Sept. 5, 1996
and last updated Apr. 10, 1997. It uses symbol ranking, like MTF (move to front)
in BWT, but in order 3 contexts without a BWT transform. When a symbol is encountered
it is encoded with 1, 3, or 4 bits according to its position in a queue of length 3,
then moved to the front. Long runs of first place symbols are run length encoded
using 12 bits to encode the length of the run.
A miss is coded using pseudo-MTF in an order-0 context using 7 bits for
the first 32 symbols and 12 bits for the rest. It is pseudo-MTF because after a
symbol is found it is swapped with another symbol about half way to the front,
with some dithering. The algorithm is designed for speed rather than good compression.
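The ranking step for hits can be sketched as follows. It models only the per-context queue of 3 symbols and the move to front. Run length coding and the order-0 pseudo-MTF for misses are left out. Placing a missed symbol at the front of the queue is an assumption, and the names are made up for illustration.

```python
BITS_FOR_RANK = (1, 3, 4)   # code length for a hit at queue position 0, 1, 2

def rank_and_update(contexts, context, symbol):
    """Return the symbol's rank (0..2) in its context queue, or None on a miss.

    `contexts` maps an order-3 context (the last 3 bytes) to a list of up to
    3 symbols, most recent first. The queue is updated by move to front.
    """
    queue = contexts.setdefault(context, [])
    if symbol in queue:
        rank = queue.index(symbol)
        queue.remove(symbol)
    else:
        rank = None
        if len(queue) == 3:
            queue.pop()          # drop the oldest candidate
    queue.insert(0, symbol)      # assumption: misses also go to the front
    return rank
```

A hit at rank 0 costs 1 bit, so text that repeats earlier order-3 contexts compresses to a little over 1 bit per character before run length coding.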
gcc -O2 -march=pentium4 -fomit-frame-pointer -s srank.c -o srank.exe
.4106 QuickLZ
QuickLZ v0.1 is an open source (GPL)
compression library designed for high speed by Lasse Mikkel Reinhold,
Sept. 24, 2006. Tests were performed with demo.exe. Speed is I/O bound.
Times shown are process times, but wall times can be 2-4 times greater.
On enwik9 compression, the program reports "file too big".
Version enwik8 enwik9 prog size Total Comp Decomp Mem Alg
------- ---------- ----------- ---------- ----------- ---- ----- --- ----
QuickLZ 0.1 57,331,969 (fails) 45,361 x 19 21 154 LZ77
QuickLZ 0.9 56,900,177 507,806,141 45,086 x 507,851,227 11 11 10 LZ77
QuickLZ 1.20 57,147,067 510,018,447 43,501 x 510,061,948 17 12 2 LZ77
quick3 1.30b 46,378,438 410,633,262 44,202 x 410,677,464 48 12 3 LZ77
QuickLZ 1.30 -3 46,445,704 411,493,051 47,304 x 411,540,355 49 12 2 LZ77
-2 51,941,357 23 11
-1 57,153,015 12 11
-0 52,803,919 20 16
quickLZ 1.40 47,728,849 417,653,684 43,922 x 417,697,606 28 13 13 LZ77
.4165 stz
stz 0.7.2
is a free, experimental file compressor by Bruno Wyttenbach, Feb. 15, 2011.
It uses LZ77. It has 4 compression modes as shown in the table below. Times are process
times. Real times are closer to 40-45 seconds. Memory is 3.3 MB for all compression modes
and the same for decompression. Most of the memory is for I/O buffers (2MB each). The actual algorithm
uses 48 KB. Modes -c and -c3 compress to the same size but the archives
differ by 1 byte in the header. stz.exe zip size is 40,425.
stz 0.8, Mar. 4, 2011, improves compression and adds two new experimental modes. Compression and decompression process times in ns/byte are given below for both enwik8 and enwik9. Wall times are slower due to disk I/O. Modes -c, -c1, and -c2 select best compression speed, best uncompression speed, and best size respectively, but this appears only to hold for enwik8, probably because of disk I/O interference. Modes -c3, -c4, and -c5 produce identical archives. Additional changes are a Drag'n'drop interface, a CRC check (adds 2% to time), and more flexible command line interface. 5313_stz.zip size is 41,941.
Version    Option                                enwik8      C/D Time  enwik9       C/D Time  Mem  Note
-------    ------------------------------------  ----------  --------  -----------  --------  ---  ----
stz 0.7.2  -c  (LZBW2 best compression speed)    50,575,825            447,732,354  15  13    3    26
           -c1 (LZBW3 best uncompression speed)  56,100,810            510,600,276  16  10    3    26
           -c2 (LZBW2A best compression)         47,681,682            420,391,400  16  12    3    26
           -c3 (LZBW3A experimental)             50,575,825            447,732,354  15  11    3    26
stz 0.8    -c  (LZBW2 best compression speed)    50,143,263  11  11    444,061,128  16  13    3    26
           -c1 (LZBW3 best uncompression speed)  55,670,417  16   9    506,622,114  18  12    3    26
           -c2 (LZBW2A best compression)         47,192,312  16  11    416,524,596  14  13    3    26
           -c3 (LZBW3A)                          54,080,795  15  11    480,696,931  18  12    3    26
           -c4 (LZBW2B experimental)             54,080,795  13   9    480,696,931  20  13    3    26
           -c5 (LZBW3B experimental)             54,080,795  16  12    480,696,931  19  14    3    26
compress 4.3d is the Windows version of the UNIX compress
command, released Jan 18, 1990. It uses LZW and has no compression options.
.4382 lzrw3-a
lzrw3-a is one of a series
of public domain (open source) memory to memory compressors by
Ross Williams in 1991. The programs were
implemented
as file compressors by Matt Mahoney on Feb. 14, 2008. The programs
are as follows:
lzrw1 (Mar. 31, 1991) is byte-aligned LZ77 with a 12 bit offset and 4 bit length field allowing lengths 3-16. Each group of 16 phrases (pointers or literals) is preceded by 2 flag bytes to distinguish pointers from literals. Matches are found using a 4K hash table without confirmation which is updated after each phrase. It uses 16K of memory plus the input and output buffers.
lzrw1-a (June 25, 1991) is lzrw1 except that the length field represents values 3-18.
lzrw2 (June 29, 1991) replaces the offset with a 12 bit index into a rotating table of offsets, allowing the last 4K phrases (rather than 4K bytes) to be reached. The decompresser must reconstruct the phrase table (but not the hash table). It uses 24K memory plus buffers.
lzrw3 (June 30, 1991) replaces the 12 bit offset field with a 12 bit index into the hash table. The decompresser must reconstruct the hash table. It uses 16K memory plus buffers.
lzrw3-a (July 15, 1991) uses a deep hash table (8 offsets per hash) with LRU replacement. It uses 16K memory plus buffers.
lzrw5 (July 17, 1991) uses LZW. The dictionary is implemented as a tree. It uses up to 384K memory plus buffers.
There is an experimental lzrw4, but it was never fully implemented.
All of the compression algorithms were originally implemented as memory to memory compression functions in C, not as complete programs. I wrote a driver program which divides the input into 1 MB blocks (except lzrw5), compresses them independently by calling the provided functions, and writing the compressed size as a 4 byte number followed by the compressed data. However, compression could be improved by using larger blocks at the cost of more memory. For lzrw5 the block size is 64K because the program is not guaranteed to work correctly for larger blocks. It did work on this benchmark for a 192K block size, but not for 256K. The distribution linked above uses a 64K block size.
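The block framing used by the driver can be sketched as follows. This is not the driver itself: zlib stands in for the lzrw functions, and the byte order of the 4 byte size field (little-endian here) and the reading of 1 MB as 2^20 bytes are assumptions.

```python
import struct
import zlib

BLOCK_SIZE = 1 << 20   # assumption: 1 MB taken as 2^20 bytes

def pack_blocks(data, compress=zlib.compress, block_size=BLOCK_SIZE):
    """Compress each block independently, prefixing it with its compressed size."""
    out = bytearray()
    for start in range(0, len(data), block_size):
        packed = compress(data[start:start + block_size])
        out += struct.pack("<I", len(packed))   # 4 byte size (byte order assumed)
        out += packed
    return bytes(out)

def unpack_blocks(blob, decompress=zlib.decompress):
    """Read size-prefixed blocks and concatenate the decompressed data."""
    out = bytearray()
    pos = 0
    while pos < len(blob):
        (size,) = struct.unpack_from("<I", blob, pos)
        pos += 4
        out += decompress(blob[pos:pos + size])
        pos += size
    return bytes(out)
```

Because each block is compressed on its own, no match can reach back across a block boundary, which is why larger blocks would compress better at the cost of more memory.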
Compressor    enwik8      enwik9       prog     Total        Comp  Deco  Mem  Alg
----------    ----------  -----------  -------  -----------  ----  ----  ---  ---
lzrw1         59,692,493  564,053,011  3,142 s  564,056,153    24    17    2  LZ77
lzrw1-a       59,471,657  560,457,545  4,328 x  560,461,873    23    15    2  LZ77
lzrw2         55,360,907  511,142,568  4,420 x  511,146,988    22    16    2  LZ77
lzrw3         52,616,827  483,918,830  4,622 x  483,923,452    21    17    2  LZ77
lzrw3-a       48,009,194  438,253,704  4,750 x  438,258,454    38    17    2  LZ77
lzrw5 (64K)   59,375,192  570,387,858  4,544 x  570,392,402   146    14    1  LZW
lzrw5 (192K)  50,721,610  479,044,732                         174    14    1  LZW
Compressor  enwik8      enwik9       prog     Total        Comp  Deco  Mem  Alg
----------  ----------  -----------  -------  -----------  ----  ----  ---  ---
fcm1        45,402,225  447,305,681  1,116 s  447,306,797   228   261    1  CM1
runcoder1
is a free, open source (GPL) file compressor by Andrew Polar, Mar. 30, 2009.
It uses an order 1 model with arithmetic coding. It takes no options.
The program is available as source code (C++) only. For this test
it was compiled with MinGW g++ 3.4.2 with options -O2 -march=pentiumpro
-fomit-frame-pointer -s for 32-bit Vista as noted in note 26.
.4598 data-shrinker
data-shrinker is a free, open
source file compressor by Siyuan Fu, Mar. 23, 2012. It uses an LZ77 format
similar to LZ4 for high speed. It takes no options. No executable was provided.
To test, the source code was compiled with g++ 4.5.1 -O3 -s under 32 bit Windows
and process times measured with output to nul:
Compressor     Version    Opt  enwik8      enwik9       prog     Total        Comp  Deco  Mem  Alg   Note
----------     ---------  ---  ----------  -----------  -------  -----------  ----  ----  ---  ----  ----
data-shrinker  23Mar2012       51,658,517  459,825,318  3,706 s  459,829,024    14     4    2  LZ77  26
lzwc 0.3 is a free, open source (GPL) file compressor by David Catt, Jan. 15, 2013. It uses LZW with dictionary entries coded using 2 bytes. There is also a version 0.1 which produces identical compressed files but is not as fast. The program takes no options.
lzwc v0.7 fixes a bug in decompression of binary files, but does not change compressed size or speed. lzwc_bitwise is a version that uses less than 16 bits to encode symbols when the dictionary is small.
Compressor        enwik8      enwik9       prog     Total        Comp  Deco  Mem  Alg   Note
----------        ----------  -----------  -------  -----------  ----  ----  ---  ----  ----
lzwc 0.1          46,647,318               1,955 x                280   290   70  LZW   26
lzwc 0.3          46,647,318  463,892,454  3,017 x  463,895,471    85    90   71  LZW   26
lzwc_bitwise 0.7  46,639,414  463,884,550  4,183 x  463,888,733   123   134   71  LZW   26
exdupe v0.3.3 beta is a deduplicating archiver supporting full and incremental backups, under development by Lasse Reinhold, Oct. 20, 2011. When the beta phase ends, it will be a commercial program with source code available under restricted and non-permissive terms. Only 64 bit systems are supported. Partial source code is available for this version, although not for the compression and decompression code, which is derived from QuickLZ (LZ77). It was tested in Linux. A later version, 0.3.6 beta, was available only for 64 bit Windows on Oct. 30, 2012, and was not tested.
Compressor    Opt  enwik8      enwik9       prog         Total        Comp  Deco  Mem   Alg   Note
----------    ---  ----------  -----------  -----------  -----------  ----  ----  ----  ----  ----
exdupe 0.3.3       53,717,422  478,788,378  1,092,986 x  479,881,364    27     5  1000  LZ77  48
Compressor  Opt  enwik8      enwik9       prog      Total        Comp  Deco  Mem   Alg   Note
----------  ---  ----------  -----------  --------  -----------  ----  ----  ----  ----  ----
lzv 0.1.0        54,950,847  488,436,027  10,385 x  488,446,412     6     5     3  LZ77  62
lzv 0.1.0        54,950,847  488,436,027  10,385 x  488,446,412    15     6     3  LZ77  26
lzv 0.1.0        54,950,847  488,436,027  10,385 x  488,446,412     4   2.6     3  LZ77  48
FastLZ is a free, open source compression library and file compressor by Ariya Hidayat, announced June 12, 2007 with no date or version number, and downloaded and tested on June 16, 2007. It uses byte-aligned LZ77. The software was released as source code only (in C). For this test it was compiled with MinGW gcc 3.4.5 as suggested by README.TXT (plus -s to strip debugging info):
gcc -march=pentium -O3 -fomit-frame-pointer -mtune=pentium 6pack.c fastlz.c -o 6pack -s
gcc -march=pentium -O3 -fomit-frame-pointer -mtune=pentium 6unpack.c fastlz.c -o 6unpack -s
6pack and 6unpack are the compressor and decompresser, respectively. They take no options. The compressed file name is stored without a path in the archive.
sharc 0.9.10 was released Dec. 12, 2013.
sharc 0.9.11b, Dec. 14, 2013 has compression levels -c1 and -c2. -c0 selects no compression. -c1 selects dictionary encoding. -c2 selects LZP preprocessing followed by dictionary coding. The program uses the Density 0.9.12b compression library which is now a separate component.
Compressor     Opt   enwik8      enwik9       prog      Total        Comp Deco Mem Alg  Note
-------        ----  ----------  -----------  -------   -----------  ---- ---- --- ---  ----
sharc 0.9.6    -c0   63,290,900  625,090,400  25,822 s  625,116,222    14   11  14 Dict 26
sharc 0.9.6    -c1   58,612,834  554,587,996  25,822 s  554,613,818    19   15  14 Dict 26
sharc 0.9.10   -c0   61,798,570  610,691,896  11,765 s  610,703,661    13   11   4 Dict 26
sharc 0.9.10   -c1   57,031,766  538,757,716  11,765 s  538,769,481    14   15   5 Dict 26
sharc 0.9.11b  -c1   61,611,730  608,740,104  81,001 s  608,821,105    12    9   5 Dict 26
sharc 0.9.11b  -c2   53,175,042  494,421,068  81,001 s  494,502,069    15   14   6 LZP  26
flzp v1 is a free,
open source file compressor by Matt Mahoney, June 18, 2008. It uses byte-oriented LZP.
The input is divided into blocks such that at least 33 byte values never occur, or 64KB,
whichever is smaller, then uses those bytes to code an end of block symbol plus match
lengths from 2 up to the number of unused bytes - 1. A match length is decoded by
finding the most recent context hash match in a 4 MB rotating buffer and outputting
the bytes that follow. It uses a 1M hash table and an order 4 context hash.
Each block begins with a 32 byte bitmap to distinguish symbols for matches from literals.
flzp can be used as a preprocessor to a low order compressor like fpaq0 or ppmd -o3
to improve compression and speed.
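The block division rule described above can be sketched as follows. This is a minimal illustration in Python, not flzp's source code; the function name split_blocks and its parameters are hypothetical. A block is extended until one more distinct byte value would leave fewer than 33 values unused, or until it reaches 64 KB.

```python
def split_blocks(data: bytes, min_unused: int = 33, max_block: int = 65536):
    """Split data so that every block leaves at least min_unused byte values unused."""
    blocks = []
    start = 0
    while start < len(data):
        seen = set()          # distinct byte values in the current block
        end = start
        while end < len(data) and end - start < max_block:
            b = data[end]
            # Stop before a new value would leave too few unused codes.
            if b not in seen and len(seen) + 1 > 256 - min_unused:
                break
            seen.add(b)
            end += 1
        blocks.append(data[start:end])
        start = end
    return blocks
```

The unused values in each block are then free to serve as the end of block symbol and the match length codes.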
alba 0.1
is a free, open source, experimental file compressor by xezz, Feb. 4, 2014,
updated Feb. 5, 2014 to fix a bug in the "C" option.
It uses byte pair encoding. The option c32768 selects the maximum block size.
The default is 4096. It has an "optimal" compression mode "C".
It was tested in Linux by compiling with gcc 4.8.1 -O3.
alba 0.2, Feb. 6, 2014, adds extreme (e) mode. Modes c and C are unchanged.
alba 0.5.1, Feb. 18, 2014, adds dynamic block sizing (cd).
.5157 alba
Compressor Opt enwik8 enwik9 prog Total Comp Deco Mem Alg Note
------- ---- ---------- ----------- ------- ----------- ---- ---- --- --- ----
alba 0.1 c 53,643,211 526,932,392 2,950 s 526,935,342 219 10 1 BPE 48
c32768 57,419,643 548,461,196 2,950 s 548,464,146 171 8 1 BPE 48
C 53,618,232 526,577,702 2,880 s 526,580,582 227 14 1 BPE 48
C32768 57,395,415 547,792,821 2,880 s 547,795,701 179 12 1 BPE 48
alba 0.2 e 53,611,841 526,860,426 3,247 s 526,863,673 819 603 1 BPE 48
alba 0.5.1 cd 52,728,620 515,760,096 4,870 s 515,764,966 239 10 4 BPE 48
.5229 lzpgt6
lzpgt is a free, experimental compressor
by Gerald R. Tamayo, Aug. 23, 2022. It uses LZP. It outputs a block of bits to flag
whether the next byte was predicted correctly, followed
by a block of the missed literal bytes.
lzpgt6 was released Aug. 9, 2023. It increases the prediction/guess table from 20 to 21 bits and adds some speed optimizations.
Compressor  Opt   enwik8      enwik9       prog      Total        Comp Deco Mem Alg Note
-------     ----  ----------  -----------  -------   -----------  ---- ---- --- --- ----
lzpgt             56,590,342  529,409,256  24,576 x  529,433,832     7    9   2 LZP 95
lzpgt6            56,113,248  522,877,083  27,136 x  522,904,219     6    5   6 LZP 95
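The flag and literal scheme can be illustrated with a short Python sketch. It is not lzpgt's code: the context (the last 3 bytes), the hash function, and the names lzp_encode and lzp_decode are assumptions made for the example. Only the 20 bit table size and the split into flags and missed literals come from the description above.

```python
TABLE_BITS = 20  # 20 bits for lzpgt per the text (21 for lzpgt6)

def _hash(ctx: int) -> int:
    # Hypothetical hash of the last three bytes; the real function is not given.
    return ((ctx * 0x9E3779B1) & 0xFFFFFFFF) >> (32 - TABLE_BITS)

def lzp_encode(data: bytes):
    """Return (flags, literals): one flag per byte, literals only for misses."""
    table = bytearray(1 << TABLE_BITS)   # predicted next byte for each context hash
    flags, literals, ctx = [], bytearray(), 0
    for b in data:
        h = _hash(ctx)
        if table[h] == b:
            flags.append(1)              # prediction was correct
        else:
            flags.append(0)              # miss: store the literal and learn it
            literals.append(b)
            table[h] = b
        ctx = ((ctx << 8) | b) & 0xFFFFFF
    return flags, bytes(literals)

def lzp_decode(flags, literals: bytes) -> bytes:
    table = bytearray(1 << TABLE_BITS)
    out, ctx, lit = bytearray(), 0, iter(literals)
    for f in flags:
        h = _hash(ctx)
        if f:
            b = table[h]                 # take the predicted byte
        else:
            b = next(lit)                # take the next missed literal
            table[h] = b
        out.append(b)
        ctx = ((ctx << 8) | b) & 0xFFFFFF
    return bytes(out)
```

The decoder rebuilds the same table as the encoder, so no table needs to be transmitted.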
snappy 1.0.1 is a free, open source (Apache) compression library for Linux from Google, Mar. 25, 2011. It uses byte aligned LZ77, and is intended for high speed rather than good compression. Google uses snappy internally to compress its data structures for its search engine.
The compressed data contains tag bytes such that the low 2 bits indicate literals and matches as follows:
00 = literal
01 = 1 byte match
10 = 2 byte match
11 = 4 byte match (not used)
A literal of length 1 to 60 is encoded by storing the length - 1 in the upper 6 bits. Longer literals are coded by storing 60..63 in the upper 6 bits to indicate that the length is encoded in the next 1 to 4 bytes in little-endian (LSB first) format. This is followed by the uncompressed literals.
Matches of length 4 to 11 with offsets of 1 to 2047 are encoded using a 1 byte match. The match length - 4 is stored in the middle 3 bits of the tag byte. The most significant 3 bits of the offset are stored in the most significant 3 bits of the tag byte. The lower 8 bits of the offset are stored in the next byte. A match may overlap the area to be copied. Thus, the string "abababa" could be written using a literal "ab" and a match with an offset of 2 and length of 5. This would be encoded as:
000001 00   (literal of length 2)
01100001    (literal 'a')
01100010    (literal 'b')
000 001 01  (high bits of offset, match of length 5)
00000010    (low 8 bits of offset)
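To check the example, here is a minimal Python sketch of a decoder that handles only short literals (1 to 60 bytes) and 1 byte matches as described above. It is an illustration, not the Snappy library code; the function name is hypothetical, and the length preamble and the other tag types are left out.

```python
def snappy_decode_elements(buf: bytes) -> bytes:
    """Decode a sequence of short literals and 1 byte matches (no length preamble)."""
    out = bytearray()
    i = 0
    while i < len(buf):
        tag = buf[i]
        i += 1
        kind = tag & 3                        # low 2 bits select the element type
        if kind == 0:                         # literal
            v = tag >> 2                      # length - 1 for lengths 1..60
            if v >= 60:
                raise NotImplementedError("long literals are not handled here")
            out += buf[i:i + v + 1]
            i += v + 1
        elif kind == 1:                       # 1 byte match
            length = ((tag >> 2) & 7) + 4     # middle 3 bits hold length - 4
            offset = ((tag >> 5) << 8) | buf[i]
            i += 1
            if offset == 0 or offset > len(out):
                raise ValueError("bad offset")
            for _ in range(length):           # byte by byte, so overlap works
                out.append(out[-offset])
        else:
            raise NotImplementedError("2 and 4 byte matches are not handled here")
    return bytes(out)

encoded = bytes([0b00000100, ord("a"), ord("b"), 0b00000101, 0b00000010])
decoded = snappy_decode_elements(encoded)
```

Copying one byte at a time is what lets a match with offset 2 and length 5 reproduce bytes that the same match has just written.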
Matches of length 1 to 64 with offsets of 1 to 65535 are encoded using a 2 byte match. The length - 1 is encoded in the high 6 bits of the tag byte. The offset is stored in the next 2 bytes with the least significant byte first. Longer matches are encoded as a series of 64 byte matches with a final shorter match of 4 to 63. If the final part of the match is less than 4 then it is encoded as a 60 byte match plus a 4 to 7 byte match.
A 4 byte match allows offsets up to 2^32 - 1 to be encoded as with a 2 byte match. The decompresser will decode them but the compressor does not produce them because the input is compressed in 32K blocks such that a match does not span a block boundary.
The entire sequence of matches and literals is preceded by the uncompressed length up to 2^32 - 1 written in base 128, LSB first, using 1 to 5 digits in the low 7 bits. The high bit is 1 to indicate that more digits follow.
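The base 128 length preamble can be sketched in a few lines of Python. The function names are hypothetical and this is an illustration of the format as described, not code from the Snappy library.

```python
def encode_length(n: int) -> bytes:
    """Write n in base 128, least significant digit first, high bit = more digits."""
    if not 0 <= n < 2**32:
        raise ValueError("length must fit in 32 bits")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)   # low 7 bits, continuation bit set
        n >>= 7
    out.append(n)                        # last digit, high bit clear
    return bytes(out)

def decode_length(buf: bytes):
    """Return (length, number of bytes consumed)."""
    n = 0
    shift = 0
    for i, b in enumerate(buf):
        n |= (b & 0x7F) << shift
        if not b & 0x80:
            return n, i + 1
        shift += 7
    raise ValueError("truncated length")
```

For enwik9 (10^9 bytes) the preamble takes 5 bytes, because 10^9 is at least 2^28.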
Compression searches for matches by comparing a hash of the 4 current bytes with previous occurrences of the same hash earlier in the 32K block. The hash function interprets the 4 bytes as a 32 bit value, LSB first, multiplies by 0x1e35a7bd, and shifts out the low bits. The hash table size is the smallest power of 2 in the range 256 to 16384 that is at least as large as the input string. As an optimization for hard to compress data, after 32 failures to find a match, the compressor checks only every second location in the input for the next 32 tests, then every third for the next 32 tests, and so on. When it finds a match, it goes back to testing every location.
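A minimal Python sketch of the hash and the table sizing rule follows. The names snappy_hash and table_size are hypothetical, and the sketch assumes that "shifts out the low bits" means keeping the top bits of the 32 bit product as the table index.

```python
def table_size(input_len: int) -> int:
    """Smallest power of 2 in 256..16384 that is at least input_len (capped at 16384)."""
    size = 256
    while size < 16384 and size < input_len:
        size *= 2
    return size

def snappy_hash(four: bytes, table_bits: int) -> int:
    """Hash 4 bytes (read LSB first) to an index of table_bits bits."""
    x = int.from_bytes(four[:4], "little")
    product = (x * 0x1E35A7BD) & 0xFFFFFFFF   # 32 bit multiply with wraparound
    return product >> (32 - table_bits)        # keep the high bits as the index
```

For example, a 1000 byte input would use a 1024 entry table, so table_bits would be 10.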
As another optimization for the x86-64 architecture, copies of 16 bytes or less are done using two 64-bit assignments rather than memcpy(). To support this, if 15 or fewer bytes remain after a match then they are encoded as literals with no further search.
Snappy compresses from memory to memory rather than from file to file, so it was necessary to write a small test program (below), which was not included in the compressed size. The program loads the input into a string, compresses or decompresses it to a new string, and writes it to output. It gives the best possible compression but is not optimal for speed or memory. With this test, speed is 25 ns/byte for compression and 12 ns/byte for decompression (under 64 bit Linux). In a separate test (not shown), compressing in 32K chunks takes 9 ns/byte with very slightly larger size due to storing the size in each chunk. Decompression was not tested in this mode, but should be twice as fast. Memory usage for the test program is 2 GB to store the input and output, but actual memory usage by the library is at most 32K for the hash table.
The test program was compiled with g++ 4.4.5 -O3 in 64 bit Ubuntu Linux and linked to Snappy after running "./configure; make". Use -DMODE=Compress or -DMODE=Uncompress to create a compressor or decompresser respectively.
#define NDEBUG 1  // turn off debugging checks
#include "snappy.h"
#include <stdio.h>
int main() {
  std::string input, output;
  int c;
  while ((c=getchar())!=EOF) input+=char(c);  // read from stdin
  snappy::MODE(input.c_str(), input.size(), &output);  // MODE = Compress or Uncompress
  fwrite(output.c_str(), 1, output.size(), stdout);  // write to stdout
  return 0;
}
For testing, I compiled with gcc 4.4.0 -s -O2 -march=pentiumpro
-fomit-frame-pointer. I used the recommended compression options
"5000 4096 200 3" and did not try to find a better combination.
The options say to use a maximum block size of 5000, a hash table
size of 4096 (it is recommended to be 5% to 20% smaller than the block
size), a maximum of 200 different byte values per block, and do not
replace pairs that occur less than 3 times.
kwc
(discussion)
is a free GUI file compressor by sportman, Jan. 18, 2010. The input is divided into
strings of 6 bytes each, and each value is replaced with a dictionary code. The dictionary
size is not bounded, so memory usage increases with the size and randomness of the input.
enwik9 uses 668 MB for compression and 333 MB for decompression.
bpe2 v2, Jan. 15, 2010,
uses a faster algorithm to find the most
frequent byte pair during compression.
bpe2 v3, Feb. 12, 2010,
has some optimizations.
(discussion)
The programs were tested by compiling with g++ 4.4.0 -O2 -s -march=pentiumpro -fomit-frame-pointer
under Windows Vista on a 2.0 GHz T3200.
.5326 kwc
.5427 bpe2
bpe2 v1 is a free, experimental, open source (public domain)
file compressor by Will, Jan. 15, 2010. It uses byte pair encoding. It divides
the input into blocks of 8192 bytes which are compressed independently. A block
is compressed by finding the byte pair which occurs most frequently and a byte
value which never occurs in the block, and then substituting that byte value
for each occurrence of the pair. The byte pair and its replacement are appended
to the block as a 3 byte header. The process is repeated until either there
are no unused byte values left, or there is no pair that occurs at least 4 times.
The block is output with an additional 2 byte header to indicate its size.
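The loop described above can be sketched in Python as follows. This is an illustration, not the bpe2 source: the function names are hypothetical, the substitution rules are returned as a separate list rather than as 3 byte headers, and the 2 byte size header is omitted. Pair counts include overlapping occurrences, which is a simplification.

```python
from collections import Counter

def bpe_compress_block(block: bytes, min_count: int = 4):
    """Repeatedly replace the most frequent byte pair with an unused byte value."""
    data = bytearray(block)
    rules = []                                   # (code, first, second) per substitution
    while len(data) >= 2:
        unused = next((v for v in range(256) if v not in data), None)
        if unused is None:
            break                                # no unused byte values left
        (a, b), count = Counter(zip(data, data[1:])).most_common(1)[0]
        if count < min_count:
            break                                # no pair is frequent enough
        out, i = bytearray(), 0
        while i < len(data):                     # left to right, non-overlapping
            if i + 1 < len(data) and data[i] == a and data[i + 1] == b:
                out.append(unused)
                i += 2
            else:
                out.append(data[i])
                i += 1
        data = out
        rules.append((unused, a, b))
    return bytes(data), rules

def bpe_expand_block(data: bytes, rules) -> bytes:
    """Undo the substitutions in reverse order."""
    for code, a, b in reversed(rules):
        data = data.replace(bytes([code]), bytes([a, b]))
    return data
```

In bpe2 each input block is 8192 bytes, so a sketch like this would be applied to one such slice at a time.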
Compression Compressed size Decompresser Total size Time (ns/byte)
Program enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem Alg Note
------- ---------- ----------- ----------- ----------- ----- ----- --- --- ----
bpe2 v1 55,390,822 545,319,505 1,621 s 545,321,126 2785 228 0.5 Dict 26
bpe2 v2 55,389,832 545,268,425 1,635 s 545,270,060 1257 229 0.5 Dict 26
bpe2 v3 55,289,197 542,748,980 2,979 s 542,751,959 518 132 0.5 Dict 26
.5586 fpaq0f2
fpaq is
a free, experimental command line file compressor with source code
(in assembler) by Nikolay Petrov, Feb. 20, 2006. It is a faster
implementation of fpaq0 by Matt Mahoney (Sept. 3, 2004) maintaining
archive compatibility. fpaq is an order-0 arithmetic coder which
models independent, identically distributed (i.i.d.) characters, and is not
intended as a general purpose compressor. Its purpose is
to test the efficiency of different arithmetic coding algorithms.
There are several variants.
Compressor    enwik8      enwik9       Comp Decomp Author                   Date
----------    ----------  -----------  ---- ------ -----------------------  -----------
fpaq0         63,391,013  641,421,110   336    351 Matt Mahoney             Sep 03 2004
fpaq1         63,502,003                477    489 Matt Mahoney             Jan 10 2006
fpaq0b        63,375,460                457    437 Fabio Buffoni            Jan 10 2006
fpaq0s        63,375,457                427    417 David A. Scott           Jan 16 2006
fpaq          63,391,013  641,421,110   255    246 Nicolay Petrov           Feb 20 2006
fpaq0p        61,457,810  622,237,009   131    131 Ilia Muraviev            Apr 15 2007
fpaq02        63,501,997  644,561,596  1345   1325 David Anderson           May 27 2007
fpaqa         61,340,408  620,681,885   262    237 Matt Mahoney             Dec 15 2007
fpaqb         61,270,458  620,278,361   264    171 Matt Mahoney             Dec 20 2007
fpaq0m        61,389,879  621,285,504   153    135 Ilia Muraviev            Dec 20 2007
fpaq0mw       61,271,869  618,959,309   455    457 Eugene Shelwien          Dec 21 2007
fpaqc         61,270,455  620,278,358   252    177 Matt Mahoney             Dec 24 2007
fpaq0pv2      61,280,398  620,379,449   116    133 Ilia Muraviev            Dec 26 2007
fpaq0r        61,234,684  620,169,855   129    142 Alexander Rhatushnyak    Jan 09 2008
fpaq0rs       61,202,171  619,839,546   139    138 Alexander Rhatushnyak    Jan 09 2008
fpaq0f        58,088,230  581,053,251   265    251 Matt Mahoney             Jan 28 2008
fpaq0f2       56,916,872  558,645,708   222    207 Matt Mahoney             Jan 30 2008
fpaq0pv3      61,457,810  622,237,009   103    119 Nania Francesco Antonio  Apr 04 2008
fpaq0pv4      61,457,810  622,237,009    70     79 Eugene Shelwien          Apr 06 2008
fpaq0pv4nc    61,350,834  621,169,159    64     69 Eugene Shelwien          Apr 06 2008
fpaq0pv4nc0   61,287,662  620,506,072    68     74 Eugene Shelwien          Apr 06 2008
fpaq0pv5      61,457,810  622,237,009    81     87 Nania Francesco Antonio  Apr 06 2008
fpaq0pv4a     61,457,810  622,237,009    70     75 Eugene Shelwien          Apr 07 2008
fpaq0pv4anc   61,323,986  621,169,159    64     65 Eugene Shelwien          Apr 07 2008
fpaq0pv4anc0  61,287,662  620,506,072    66     66 Eugene Shelwien          Apr 07 2008
fpaq0pv4b1    61,287,234  620,488,244    56     60 Eugene Shelwien          Apr 18 2008
fpaq0 uses a 32-bit carryless arithmetic coder to code binary decisions and output one byte at a time. fpaq1 uses a 64 bit coder. fpaq0b uses a 32 bit coder but counts carries and outputs a bit at a time to achieve greater internal precision. fpaq0s improves on fpaq0b by using the compressed EOF to encode the uncompressed EOF, unlike the other models which code an extra bit for each byte to indicate the end. fpaq02 extends this idea to 64 bits. All programs except fpaq are C++ source code and compiled as follows with MinGW 3.4.2 (where %1 is the program name):
g++ -Wall %1.cpp -O2 -Os -march=pentiumpro -fomit-frame-pointer -s -o %1.exe
fpaq0p by Ilia Muraviev, Apr. 15, 2007, uses an adaptive order 0 model. Instead of keeping a 0,1 count for each context, it keeps a probability and updates it by adjusting by 1/32 of the error. This is faster because it avoids a division instruction.
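The update rule can be illustrated with a short Python sketch. It assumes a 12 bit probability of a 1 bit (0 to 4095, starting at 2048) and a shift by 5 to take 1/32 of the error. The function name and the exact rounding are assumptions for the example, not taken from the fpaq0p source.

```python
def update(p: int, bit: int) -> int:
    """Move the 12 bit probability of a 1 bit by 1/32 of the prediction error."""
    if bit:
        return p + ((4096 - p) >> 5)   # error is (1 - p), scaled by 4096
    return p - (p >> 5)                # error is (0 - p)

p = 2048                               # start at P(1) = 1/2
for bit in (1, 1, 0, 1):
    p = update(p, bit)
```

The shift replaces the division that a count based estimate such as n1/(n0+n1) would need for every coded bit.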
fpaqa by Matt Mahoney, Dec. 15, 2007, is the first implementation of Jarek Duda's asymmetric binary coder, described in section 3 of Optimal encoding on discrete lattice with translational invariant constrains using statistical algorithms, 2007.
The model is based on fpaq0p (adaptive order 0), but with probabilities modeled with 16 bits resolution (instead of 12) to improve compression. The source (GPL) can be compiled with -DARITH to substitute the arithmetic coder from fpaq0 and fpaq0p for the asymmetric coder.
An asymmetric coder has a single N-bit integer state variable x, as opposed to two variables (low and high) in an arithmetic coder, which allows a lookup table implementation. In fpaqa, N=10. A bit d (0 or 1) with probability q = P(d = 1) (0 < q < 1, a multiple of 2^-N) is coded:
if d = 0 then x := ceil((x+1)/(1-q)) - 1
if d = 1 then x := floor(x/q)
To decode, given x and q
d = ceil((x+1)*q) - ceil(x*q)  (1 if fract(x*q) >= 1-q, else 0)
if d = 0 then x := x - ceil(x*q)
if d = 1 then x := ceil(x*q)
x is maintained in the range 2^N to 2^(N+1)-1 by writing the low bits of x prior to encoding d and reading into the low bits of x after decoding. Because compression and decompression are reverse operations of each other, they must be performed in reverse order. The encoder divides the input into blocks of size B=500K bits, saves the predictions (q) in a stack, then encodes the bits in reverse order to a second stack. The block size and final state x are then written, followed by the compressed bits in the second stack in reverse order that they were coded. The decompresser runs everything in the forward direction, reading the saved x at the beginning of each block.
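The coding and decoding formulas can be checked with a small Python sketch that uses exact rational arithmetic. It is only an illustration of the state update: the names abs_encode and abs_decode are hypothetical, the state x is allowed to grow without bound (no renormalization to N bits), and there are no lookup tables or blocks as in fpaqa.

```python
from fractions import Fraction
from math import ceil, floor

def abs_encode(x: int, d: int, q: Fraction) -> int:
    """Push bit d with P(d = 1) = q onto state x."""
    if d == 0:
        return ceil((x + 1) / (1 - q)) - 1
    return floor(x / q)

def abs_decode(x: int, q: Fraction):
    """Pop the most recently coded bit; return (d, previous state)."""
    d = ceil((x + 1) * q) - ceil(x * q)
    if d == 0:
        return d, x - ceil(x * q)
    return d, ceil(x * q)

# Encode a few bits, then decode them. The bits come back in reverse order.
q = Fraction(1, 4)
bits = [1, 0, 0, 1, 0, 0, 0]
x = 1
for d in bits:
    x = abs_encode(x, d, q)
decoded = []
for _ in bits:
    d, x = abs_decode(x, q)
    decoded.append(d)
```

Because the last bit coded is the first one decoded, the state behaves like a stack, which is why fpaqa encodes each block in reverse.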
To reduce the size of the coding tables, q is quantized to R=7 bits on a nonlinear scale with closer spacing near 0 and 1. The quantization is such that ln(q/(1-q)) is a multiple of 1/8 between -8 and 8.
In the source, N, R, and B are adjustable parameters up to N=12, R=7. Larger values improve compression at the expense of speed and memory. fpaqa uses 2^(N+R+2) + 5*B/4 bytes for compression and 2^(N+R+1) bytes for decompression.
fpaqb (Matt Mahoney, Dec. 17, 2007, updated to ver 2 on Dec. 20, 2007) is a revision of fpaqa, using the same model, but using an asymmetric coder that uses direct calculations in place of lookup tables to update the state. This allows higher precision to improve compression (eliminating a 0.03% penalty), saving memory, and allowing bytewise I/O (x in range 2^N to 2^(N+8)-1 for N=12). Compression is about the same speed as fpaqa but decompression is 28% faster. Ver. 2 is faster but maintains archive compatibility with ver. 1.
fpaq0m by Ilia Muraviev, Dec. 20, 2007, uses arithmetic coding and 2 order 0 models averaged together, one with fast update (rate 1/16) and one slow (1/64).
fpaq0mw by Eugene Shelwien, Dec. 21, 2007, modifies fpaq0m by using a weighted mix of a fast (1/16) and slow (1/256) adapting order 0 model, where the weight is adjusted dynamically to favor the better model.
fpaqc (Matt Mahoney, Dec. 24, 2007) is fpaqb with some optimizations to the asymmetric coder.
fpaq0pv2 (Ilia Muraviev, Dec. 26, 2007) is a speed optimized version of fpaq0p with arithmetic coding.
fpaq0r by Alexander Rhatushnyak, Jan. 9, 2008, is an order 0 model with arithmetic coding. The model is tuned for better text compression. When compiled with -DSLOWER (fpaq0rs.exe), the arithmetic coder uses higher precision for better compression with a small speed penalty.
fpaq0f by Matt Mahoney, Jan. 28, 2008, uses an adaptive order 0 model which includes the bit history (as an 8 bit state) in each context. (It is controversial whether this is really "order 0"). It uses arithmetic coding with 16 bit probabilities (rather than 12 bits).
fpaq0f2 by Matt Mahoney, Jan. 30, 2008, uses a simplified bit history consisting of just the last 8 bits, plus some minor improvements.
fpaq0pv3 by Nania Francesco Antonio, Apr 04, 2008, is compatible with fpaq0p but 20-30% faster.
fpaq0pv4 including fpaq0pv4nc and fpaq0pv4nc0, are speed optimizations by Eugene Shelwien, Apr. 6, 2008, as discussed here. fpaq0pv4 is compatible with fpaq0p but faster. The nc and nc0 variants dispense with the extra EOF flags in each byte.
fpaq0pv5 by Nania Francesco Antonio, Apr 6, 2008, is a modification to fpaq0pv4.
fpaq0pv4a including fpaq0pv4anc and fpaq0pv4anc0 are bug fixes to fpaq0pv4 by Eugene Shelwien, Apr. 7, 2008, as discussed above.
fpaq0pv4b by Eugene Shelwien, Apr. 18, 2008, replaces the arithmetic coder with sh_v1m port (uses carries), Windows I/O, and other optimizations as discussed here. The Intel-compiled .exe only runs on Intel machines. I tested fpaq0pv4b1 which was patched on May 19, 2008 to run on AMD machines.
ghost
The program takes 2 arguments. The first is the number of iterations. The second is the maximum string size to encode.
results for enwik8:
command: python ghost-compress.py enwik8 750 6
compressed size: 55,357,196 bytes
compression time: 35 hours
max memory usage: 65 GB
decompression time: 42s
max memory usage: 168 MB

results for enwik9:
command: python ghost-compress.py enwik9 456 5
compressed size: 568,004,779 bytes
compression time: 48 hours
max memory usage: 88 GB
decompression time: 4m 05s
max memory usage: 2485 MB

ppp enwik9 > enwik9.ppp     (compress)
ppp -d enwik9.ppp > enwik9  (decompress)
The original code opens both files in text mode, which does not work in Windows. For testing, I modified 3 lines of code to open the input and output files in binary mode as follows:
#include <fcntl.h>                   // added
setmode(fileno(stdout), O_BINARY);   // added
FILE *f = fopen(*p, "rb");           // changed "r" to "rb"
I compiled using gcc 3.4.2 -O3 -fomit-frame-pointer -march=pentiumpro and packed with UPX (linked above, Feb. 11 2008). Times are wall times. I did not use timer 3.01 because its output would be redirected to the output file. Process times are about 50% of wall time based on watching Task Manager.
The program uses a Windows GUI when run with no arguments. It was tested with command line arguments under Wine 1.6 in Ubuntu.
Compressor Opt  enwik8      enwik9       prog      Total        Comp   Deco  Cmem Dmem Alg Note
-------    ---  ----------  -----------  -------   -----------  -----  ----  ---- ---- --- ----
ksc        1    79,706,130                                       3250  2790    40  265 SR  48
           2    67,676,824                                       3730  1480    40  227 SR  48
           3    62,570,897                                       8560  1800    59  273 SR  48
           4    59,511,259                                      32780  6670    62  220 SR  48
           4                580,557,413  13,507 x  580,570,920  40050  7917   155 1700 SR  48
lzp2 0.7c was released Oct. 10, 2009. Run times are dominated by disk access, not included below.
Compressor  enwik8      enwik9       prog       Total        Comp Deco Mem Alg Note
-------     ----------  -----------  -------    -----------  ---- ---- --- --- ----
lzp2 0.1    74,358,722  655,709,055  5,855 xd   655,714,910    11    9  15 LZP 26
lzp2 0.7c   67,909,076  598,076,882  40,819 x   598,117,701    11    8  15 LZP 26
NTFS disk compression is used in Microsoft Windows when the "compress files to save disk space" checkbox is checked in the folder properties dialog box. Disk compression was introduced in NTFS v1.2 in mid 1995 according to Wikipedia. The compression format is called LZNT1. The algorithm is proprietary. However, it was reverse engineered (in Russian, see also here). The algorithm is LZSS (similar to lzrw1). The format consists of groups of 8 symbols each preceded by 8 flag bits packed into a byte. A 0 bit indicates a literal symbol, which is decoded by copying it. A 1 bit indicates a 2 byte offset-length pair which is decoded by going back 'offset' bytes in the output and copying the next 'length'+3 bytes. An offset-length pair uses a variable number of bits allocated to the offset (from 4 to 12) depending on the position in the file, and any remaining bits allocated to the length of the match. A 12 bit offset would correspond to a 4 KB block on disk.
I tested by copying enwik9 between folders with the compression turned on in one folder, and compared with times to copy between two folders both with compression turned off. I tried each copy twice and took the second time, which was at most 1 second faster than the first copy. I used the test machine in note 26 running Windows Vista Home Premium SP1 32 bit with 3 GB memory and a 200 GB disk between folders on the same partition. Copying between two uncompressed folders takes 41 seconds. Copying to a compressed folder takes 51 seconds, or a difference of 10 seconds. Copying from a compressed folder takes 35 seconds. I estimated 9 seconds for decompression by assuming that copying the compressed file directly would take 26 seconds based on its size of 636 MB. (This is probably wrong because the file would be cached in memory uncompressed, but the alternative is a negative time for decompression. Copying either the compressed or uncompressed file to NUL: takes 2 seconds on the second try).
Times were recorded with a watch because timer 3.01 will not time built-in commands
like 'copy'. Task Manager does not show any processes consuming CPU time or memory
during copying. However, memory use should be insignificant (under 16 KB) for
LZSS with 4 KB blocks. Sizes are as reported by right clicking on the compressed
file in Explorer as 'size on disk'. The size of the decompression program is not known.
.6373 shindlet
shindlet
(mirror) is
a series of 3 free command line file compressors by Piotr Tarsa. All are
order-0 arithmetic coders with identical models written in assembler (included).
The three variants are fs (frequency sorting), bt (binary tree), and
sl (linear search). All three produce identical sized compressed files.
In addition, the compressed output of bt and sl are identical.
Results for all 3 variations are below. Comp and Decomp show global
times including disk I/O in ns/byte, with CPU (process) times in parenthesis.
Date is the latest program timestamp in the distribution, not the release date.
Compressor   Date          enwik8      enwik9       prog      Total size   Comp       Decomp
-----------  ------------  ----------  -----------  -------   -----------  ---------  ---------
shindlet_fs  May 7, 2006   62,890,267  637,390,277  1,275 xd  637,391,552  185 (113)  123 (103)
shindlet_bt  May 27, 2006  62,890,267  637,390,277  1,387 xd  637,391,664  163 (85)   118 (96)
shindlet_sl  Apr 12, 2006  62,890,267  637,390,277  2,415 xd  637,392,692  166 (94)   121 (102)
compact (man page) is a file compressor by Colin L. Mc Master, Feb. 28, 1979. It was written in K&R C for VAX/PDP11 and SUN under Berkeley UNIX. It uses adaptive order-0 Huffman coding. The (separate) decompression program rebuilds the Huffman tree, so it need not be transmitted.
Neither program takes options. compact deletes the input file and creates an output file with a .C extension. uncompact deletes the compressed file and restores the original. compact was later superseded by compress, which gives better compression.
For this test, compact was compiled using the provided Makefile and tested under
Ubuntu Linux. Minor source code corrections were needed to compile under gcc.
However, the decompresser size is based on the original code. A port to Windows would
be possible but would require more source code changes.
TinyLZP
is a free, open source (GPL v3) file compressor by David Werecat,
Oct. 12, 2012. It uses LZP and takes no options.
The first entry is compiled from source using "cl /O2 tinylzp.c /I."
using Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for 80x86
and tested on a 2.0 GHz T3200 under 32 bit Vista.
The second entry, TinyLZP-x86-SSE2.exe, is supplied and requires
MSVCR110.dll (Visual Studio 2012 C++ runtime) to run.
smile (Nov. 5, 2004)
and smile256 (Dec. 5, 2004)
(discussion)
are free, open source file compressors by Andrei Frolov.
These programs are unique for their small executable size.
smile consists of two programs: a 250 byte compressor, smile_e.com
and a 207 byte decompresser, smile_d.com. smile256 is both a compressor
and a decompresser in 256 bytes. This includes code to parse the command
line and open the input and output files.
Source code is in 16 bit assembler for DOS.
Program size is given for the uncompressed .com files because zip
makes them larger.
Both programs use a move-to-front algorithm with the queue position
encoded using an interleaved Elias Gamma code. The position of the
current byte in the queue (1..256) is encoded by dropping the leading 1 bit,
preceding each of the remaining bits with a 0 bit, then terminating with
a 1 bit. After encoding, the byte value is moved to the front of the queue.
smile256 also encodes EOF as 257, resulting in a file that
is usually 1 byte larger than smile_e.
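The coding scheme can be sketched in Python as below. This is an illustration of the description above, not a port of the assembler source: bits are kept as a string of '0' and '1' characters rather than packed into bytes, the initial queue order (0..255) is an assumption, and the EOF code used by smile256 is omitted.

```python
def gamma_bits(n: int) -> str:
    """Interleaved Elias Gamma code of n >= 1 as a string of '0' and '1'."""
    rest = bin(n)[3:]                       # binary digits after the leading 1
    return "".join("0" + bit for bit in rest) + "1"

def mtf_gamma_encode(data: bytes) -> str:
    queue = list(range(256))                # assumed initial order
    bits = []
    for b in data:
        i = queue.index(b)
        bits.append(gamma_bits(i + 1))      # positions are numbered 1..256
        queue.insert(0, queue.pop(i))       # move the byte to the front
    return "".join(bits)

def mtf_gamma_decode(bits: str) -> bytes:
    queue = list(range(256))
    out = bytearray()
    pos = 0
    while pos < len(bits):
        n = 1                               # the implied leading 1 bit
        while bits[pos] == "0":             # 0 means another data bit follows
            n = (n << 1) | int(bits[pos + 1])
            pos += 2
        pos += 1                            # skip the terminating 1
        b = queue.pop(n - 1)
        queue.insert(0, b)
        out.append(b)
    return bytes(out)
```

A byte already at the front of the queue costs a single bit, while position 256 costs 17 bits, so the scheme only saves space when recently used bytes tend to repeat.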
.6942 TinyLZP
Compressor enwik8 enwik9 prog Total size Comp Deco Mem Alg Note
----------- ----------- ---------- ------- ----------- ---- ---- --- --- ----
TinyLZP 0.1 79,220,546 694,274,932 2,811 s 694,277,743 58 46 10 LZP 26
TinyLZP-x86-SSE2 79,220,546 694,274,932 2,811 s 694,277,743 32 38 10 LZP 26
.6955 smile
Compressor enwik8 enwik9 prog Total size Comp Deco Mem Alg Note
----------- ----------- ---------- ------- ----------- ---- ---- --- --- ----
smile_e/smile_d 71,154,788 695,562,502 207 xd 695,562,709 10517 10414 0.6 MTF 26
smile256 71,154,789 256 x 11190 10840 0.6 MTF 26
.7594 barf
barf is a free,
open source file compressor by Matt Mahoney, Sept. 21, 2003. It was written
as a joke to debunk claims of recursive compression. The algorithm is as
follows:
Pass  enwik8      enwik9       size (zip)  enwik9+prog  Comp (wall)  Decomp   Mem  Alg   Filename
----  ----------  -----------  ----------  -----------  -----------  -------  ---  ----  --------
 1    76,450,126  763,918,762  983,782 s   764,902,544  315 (330)    30 (73)    4  LZ77  enwik9.x
 2    76,074,327  758,482,743  983,782 s   759,466,525  439 (462)    23 (60)    4  LZ77  enwik9.x.x
 3    76,074,326  758,482,742  983,782 s   759,466,524  488 (551)    18 (44)    4  copy  enwik9.x.x.x9v
A similar program, barfest.exe, compresses the million random digits file to
1 byte, rather than the Calgary corpus. The decompresser size is
455,755 bytes (zipped).
hipp v0.5819
is an experimental command line file compressor with source code available by
Bogatov Roman, Aug. 19, 2005. It uses context mixing with ordinary and optionally sparse
(fixed gap) contexts, using a suffix tree with path compression to store statistics.
The options are /m to specify the memory limit in MB (default /m2048),
/o to specify primary context order, i.e. the depth of the suffix tree
with path compression (default /o256), /do to set max
deterministic order (actual order with path decompression) (default /do256, do >= o),
/so to set the number of sparse contexts (default /so0). Sparse contexts
are useful for binary data but generally not text. Memory usage increases
with the size of the file and with /o and /so (but not /do). Also, if the
memory limit is exceeded then an error occurs. Unfortunately enwik9 cannot
be compressed at all because initialization requires more than 800 MB.
Some results for enwik8:
ppmz2 v0.81 is a free,
experimental, open source file compressor by Charles Bloom, May 9, 2004.
It uses PPM. It takes several compression options but only the defaults
were tested. Memory usage grows as the program runs.
On enwik9 it runs out of memory.
Unfortunately, the compressor will not accept truncated XML files such as this benchmark.
It can be made to work by appending the following 38 bytes to enwik8 or enwik9
to create a properly formed XML file (a trailing newline is optional but was not used):
In theory, using no compression (-N) would allow XMill to be used as a preprocessor to other
compressors. However, the decompresser will not accept either enwik8 or enwik9 (with closing
tags appended) if processed with -N (reports "corrupt file").
xmill 0.9.1
(Mar. 15, 2004) also fails to decompress enwik9 and fails to decompress either file with -N.
All programs report "malloc failed" on enwik9. The LZP algorithms
use very little memory themselves, but these implementations allocate
input and output buffers all at once. This fails for enwik9 because of
the 2 GB process limit in Windows.
lzp1 is both a compressor and decompresser. To decompress, use -d as
the third argument. lzp2 is a compressor only. There is a source code
decompresser "lzp2d" but I was unsuccessful in compiling it.
It allows an unexplained option "HuffType" which I did not experiment with.
lzp3o2 has a separate decompresser "lzp3o2d.exe" included in the distribution.
This page is maintained by Matt Mahoney, mattmahoneyfl (at) gmail (dot) com.
.9956 arb2x
arb2x v20060602 is a
free, experimental command line file compressor with source code available
by David A. Scott, updated June 2, 2006.
It is a bitwise bijective order-0 arithmetic coder, best suited
for i.i.d. bits. It takes no arguments
except the input and output filenames. The decompresser is unarb2x.exe.
Failed and Pending Tests
hipp
hipp5819 enwik8 MB Mem Comp (ns/byte)
------- ---------- ------ ----
/o5 22,390,366 248.5 ~3710
/o8 20,555,951 719.5 ~4300
Zipped size: C++ source (commented in Russian) = 98,765, exe = 36,724.
ppmz2
XMill
XMill 0.8 is an open source
command line XML preprocessor/compressor by AT&T, written by Dan Suciu,
Hartmut Liefke, and Hedzer Westra in March, 2003.
It works by sorting by XML tags to bring similar content together, then
compressing with gzip, bzip2, or ppmd. Optionally it can (in theory) output the
preprocessed data as input to another compressor.
"</text></revision></page></mediawiki>
However, decompression succeeds for enwik8 but fails for enwik9. (Failed
values in parenthesis, timed for enwik8). The decompresser (xdemill) reports "corrupt file".
Compression Compressed size Decompresser Total size Time (ns/byte)
Program Options enwik8 enwik9 size (zip) enwik9+prog Comp Decomp Mem
------- ------- ---------- ----------- ----------- ----------- ----- ----- ---
xcmill 0.8 -w -P -9 -m800 26,579,004 (230,934,622) 114,764 xd (231,049,386) 616 (530) 800
xcmill 0.9.1 -w -P -9 -m1700 26,579,004 (230,914,289) 108,845 xd (231,023,134) 711 984
The -w option preserves whitespace. Otherwise compression is lossy. -P selects ppmdi compression
(bzip2, gzip and no compression are also available). -9 selects maximum compression. -m800 allows
800 MB of memory.
lzp3o2
lzp3o2 (LZP 3 with order 2 literal
coding) is one of a family of open source file compressors by
Charles Bloom, originally written in 1995. The algorithm is described in
a paper submitted to DCC'96.
lzp3o2 uses LZP compression with order 2 modeling of literals and arithmetic
coding. The tested version of the source code
is dated Aug. 25, 1996 and compiled for Windows Oct. 10, 1998. The compiled
distribution from here was tested.
Program enwik8 Comp Deco Mem Alg
------- ---------- ---- ---- --- ---
lzp1 56,013,656 23 20 153 LZP
lzp2 40,350,594 80 280 LZP
lzp3o2 33,041,439 230 270 151 LZP
History
May 10 2006 - benchmark began with 1 month of testing about 2 compressors per day.
Jun 10 2006 - began test data analysis.
Jun 14 2006 - updated xml-wrt 2.0 14.06.06 | ppmonstr.
Jun 17 2006 - reorganized website from 1 big page to 4 smaller pages.
Jun 19 2006 - added xml-wrt 2.0 19.06.06 (standalone LZMA mode).
Jun 20 2006 - added ocamyd 1.65 LTCB 1.0.
Jun 21 2006 - updated TC 5.0 to dev 4 (compression unchanged but faster).
Jul 19 2006 - updated TC 5.0 to dev 9, added dark 0.32b.
Jul 20 2006 - added arbc2z.
Jul 21 2006 - added TarsaLZP (July 4 2006).
Jul 22 2006 - added uda 0.300.
Jul 23 2006 - verified uda 0.300 decompression.
Jul 24 2006 - updated TC 5.0 to dev 11.
Jul 29 2006 - added CTW 0.1.
Aug 01 2006 - updated TarsaLZP (July 30 2006), added ppmvc v1.1.
Aug 06 2006 - added the Hutter Prize, renamed Large Text Compression Benchmark to Human Knowledge Compression Contest,
added rules for the Hutter Prize, and updated rationale to add a section on AIXI.
Aug 07 2006 - added link to paq8f, updated prize formula (Z might not decrease), and noted that prize committee members
are not eligible for prize money. Added logo. Minor edit to rationale.
Aug 08 2006 - the prize fund (Z) does not decrease.
Aug 11 2006 - added a lexical and string repetition analysis to the data study.
Aug 13 2006 - typo in Rationale.
Aug 14 2006 - updated dark v0.40. Edited Rationale (AIXI, compression does not seem like AI, lossy compression).
Aug 16 2006 - raq8g and durilca 0.5(Hutter) submitted for Hutter prize, neither verified yet.
Aug 17 2006 - verified durilca 0.5(Hutter) claim. Posted raq8g.exe for Windows.
Aug 18 2006 - verified raq8h -7 on enwik8 under Windows. Tested paq8f -8 on enwik8 (not verified).
Reported raq8h -8 result (Linux).
Aug 19 2006 - updated ha, added Info-ZIP, ESP. Clarified rules 5 and 6.
Aug 20 2006 - Removed rules and results for the Hutter prize. These may be found on the Hutter Prize website.
Updated ha and Info-ZIP.
Aug 22 2006 - added paq8hp1. Updated Info-ZIP. Added submission times and unzipped .exe sizes for Hutter prize candidates.
Aug 23 2006 - updated paq8hp1 for enwik9 -8 (compress only). Tuned xml-wrt|ppmonstr for enwik8 at 2 GB. Added durilca4linux.
Aug 26 2006 - updated dark 0.46. Fixed link to durilca4linux. Posted enwik8.bz2 and enwik9.bz2 on the data page.
Aug 28 2006 - added paq8hp2 (enwik8, 1 GB, not checked). Updated ppmonstr, xmlwrt|ppmonstr, slim, and ash for 2 GB memory.
Aug 29 2006 - verified paq8hp2 for enwik8 (1 GB and 2 GB).
Aug 31 2006 - added bbb.
Sep 01 2006 - updated bbb, TarsaLZP, paq8hp2 (as a preprocessor).
Sep 02 2006 - corrected error in lexical analysis table on data page (found by Szymon Grabowski).
Sep 03 2006 - added paq8hp3 -7 for enwik8 (Hutter prize candidate, verified).
Sep 05 2006 - updated paq8hp3 (enwik9 -8, not verified).
Sep 10 2006 - updated paq8hp4 (verified for enwik8), fixed links to PX and pimple.
Sep 11 2006 - updated paq8hp4 for enwik9 (compression only), added paq1 and expanded PAQ series documentation.
Sep 12 2006 - minor edits in paq8hp1, raq8g descriptions.
Sep 13 2006 - updated paq8hp2 for enwik9.
Sep 14 2006 - updated xml-wrt 3.0.
Sep 15 2006 - updated xml-wrt 3.0|ppmonstr.
Sep 20 2006 - updated paq8hp5 -7 enwik8. Verified paq8hp4 -8 enwik9.
Sep 21 2006 - updated paq8hp5 -8 enwik8.
Sep 23 2006 - updated paq8hp5 -8 enwik9 (not verified).
Sep 24 2006 - added QuickLZ.
Sep 29 2006 - added fpaq0x, fpaq0s2.
Sep 30 2006 - clarified submission dates for paq8hp2 through paq8hp5. Posted paq8hp2 source code.
Oct 01 2006 - updated fpaq0x1a, fpaq0s2b, tc 5.1 dev 1.
Oct 02 2006 - updated tc 5.1 dev 2.
Oct 06 2006 - posted paq8hp3 source code (now top ranked). Added fpaq0x1b.
Oct 08 2006 - added fpaq0s3.
Oct 10 2006 - posted paq8hp4 source code (now top ranked).
Oct 12 2006 - added fpaq0s4.
Oct 13 2006 - added tc 5.1 dev 5.
Oct 15 2006 - verified paq8hp5 -8 enwik9 decompression. Added fpaq0s5.
Oct 16 2006 - added durilca4linux_2 (now top ranked, not yet verified for enwik9).
Oct 18 2006 - updated durilca4linux_2 (-t2(11) option).
Oct 21 2006 - added fpaq2.
Oct 22 2006 - updated QuickLZ 0.9.
Oct 27 2006 - posted paq8hp5 source code (now ranked #2).
Oct 30 2006 - updated fpaq0s6.
Nov 03 2006 - mirrored enwik8.bz2 and enwik9.bz2 to mattmahoney.net/text
Nov 05 2006 - updated paq8hp6. Linked to FV results on data page.
Nov 06 2006 - verified paq8hp6 -7 enwik9 decompression.
Nov 07 2006 - updated fastari.
Nov 10 2006 - added PeaZip.
Nov 15 2006 - added paq8j.
Nov 17 2006 - added paq8ja.
Nov 20 2006 - added fpaq3.
Nov 22 2006 - added paq8jb.
Nov 29 2006 - added paq8jc.
Dec 02 2006 - added fpaq3b.
Dec 08 2006 - added paq8hp7a (enwik8 only), posted paq8hp6 source.
Dec 10 2006 - updated paq8hp7a for enwik9 (not verified).
Dec 12 2006 - added paq8hp7.
Dec 13 2006 - updated paq8hp6 -8 enwik9.
Dec 17 2006 - posted enwik8.pmd and enwik9.pmd (PPMD var. J format).
Dec 21 2006 - added fpaq3c.
Dec 24 2006 - added quad v1.01a, tc 5.1 dev 7.
Dec 28 2006 - added fpaq3d.
Jan 01 2007 - added paq8jd (enwik8 -7).
Jan 02 2007 - updated paq8jd -8 enwik8 (not verified).
Jan 08 2007 - added hook v0.2.
Jan 11 2007 - added hook v0.3.
Jan 12 2007 - added hook v0.3a.
Jan 13 2007 - added tc 5.1dev7x. Fixed hook.zip archive.
Jan 15 2007 - posted paq8hp7 source code. Added hook v0.4.
Jan 17 2007 - completed dmc and Info-Zip 2.3.1.
Jan 19 2007 - added paq8hp8.
Jan 22 2007 - added hook v0.5b.
Jan 27 2007 - added chile 0.4.
Feb 03 2007 - added ocamyd-1.66.final (merged with ocamyd LTCB)
Feb 07 2007 - added hook v0.6.
Feb 08 2007 - added hook v0.6b, quad v1.04a, tc 5.2 dev 2.
Feb 09 2007 - corrected error in tc 5.2 dev 2.
Feb 12 2007 - added ccm_extra 1.03a.
Feb 14 2007 - added hook v0.6c.
Feb 15 2007 - added paq8k -8 enwik8 (not verified).
Feb 20 2007 - added paq8hp9 -7 enwik8 (verified).
Feb 22 2007 - updated paq8hp9 -7 enwik9.
Feb 23 2007 - added link to paq8hp9any (revised paq8hp9, not tested), added quad 1.07b, ccm 1.1.1a.
Mar 02 2007 - added ccm 1.1.2a.
Mar 06 2007 - added LZPXj 1.2h.
Mar 10 2007 - added paq8l enwik8.
Mar 11 2007 - added hook v0.7.
Mar 13 2007 - added hook v0.7b.
Mar 14 2007 - added quad 1.08.
Mar 17 2007 - added hook v0.8.
Mar 18 2007 - added hook v0.8b.
Mar 19 2007 - added hook v0.8c.
Mar 21 2007 - added hook v0.8d, FreeArc 0.36.
Mar 24 2007 - added quad 1.10.
Mar 27 2007 - added paq8hp10 -7 enwik8, posted paq8hp9 source code, added hook v0.8e, M99.
Mar 28 2007 - corrected M99 enwik8 result, updated FreeArc description, removed unsupported quad versions from main table.
Mar 31 2007 - added paq8hp10any -8 enwik8.
Apr 01 2007 - added dark 0.51, opendark.
Apr 02 2007 - updated paq8hp10any -8 enwik9 (decompression not verified), added DGCA 1.10.
Apr 05 2007 - added quad 1.11, quad 1.11HASH2, ccm 1.20a, updated FreeArc description.
Apr 06 2007 - added hook v0.9.
Apr 08 2007 - added freehook 0.2, ccm 1.20d.
Apr 09 2007 - added xmill 0.9.1 (fails), barf, quad 1.12.
Apr 10 2007 - added hook 0.9b, freehook 0.3.
Apr 19 2007 - added M99 v2.1, QuickLZ 1.20 and 1.30beta, lzpm 0.02, tornado 0.1.
Apr 22 2007 - added thor 0.94a.
Apr 23 2007 - added ccm (ccmx) 1.21.
Apr 27 2007 - added slug 1.1b.
Apr 30 2007 - added paq8hp11 -7 enwik8. Posted paq8hp10any source code.
May 03 2007 - added paq8hp11any -8 enwik8, fpaq0p.
May 05 2007 - added lzpm 0.03 and 0.04. Fixed misleading description of DMC algorithm in hook.
May 08 2007 - added lzc 0.01, hook0.9c.
May 09 2007 - added pucrunch, TarsaLZP May 6 2007, thor 0.95, srank 1.1.
May 10 2007 - added paq8hp11any -8 enwik9 (decompression not verified).
May 11 2007 - added lzc 0.03, updated table description (time, memory, algorithms).
May 14 2007 - added paq8hp12 -7 enwik8.
May 16 2007 - added uc2, lzc 0.04.
May 18 2007 - added BriefLZ 1.05.
May 20 2007 - added paq8hp12any -8 enwik8/9 (decompression not verified), lzpm 0.06. Updated times in main table to process times.
May 21 2007 - added paq8hp12any -7/-8 enwik8 (decompression verified), 7zip 4.46a.
May 26 2007 - added lzc 0.05b.
May 29 2007 - added fpaq02.
Jun 01 2007 - added turtle 0.01.
Jun 02 2007 - added turtle 0.02.
Jun 05 2007 - added turtle 0.03.
Jun 08 2007 - added turtle 0.04.
Jun 12 2007 - posted paq8hp11any source code, added turtle 0.05.
Jun 16 2007 - added TarsaLZP ver. Jun 17 2007, FastLZ ver. Jun 12 2007, pim 2.01.
Jun 23 2007 - added turtle 0.07.
Jul 24 2007 - added lpaq1, pim 2.04b, TarsaLZP Jul 18 2007, posted paq8hp12any source code.
Jul 30 2007 - added TarsaLZP Jul 30 2007. Updated rules to allow 1800 MB memory.
Jul 31 2007 - added pim 2.10.
Aug 03 2007 - added sr2.
Aug 07 2007 - added lzpm 0.07. Underlined times and memory to indicate records.
Aug 08 2007 - added pimple2.
Aug 09 2007 - added lzpm 0.08, TarsaLZP Aug 8 2007.
Aug 11 2007 - added TarsaLZP Aug 10 2007.
Aug 13 2007 - added gziphack, retested gzip 1.3.5, Info-ZIP 2.32 Win32.
Aug 14 2007 - added QuickLZ 1.30, compact.
Aug 15 2007 - added lzturbo 0.01, WinTurtle 1.2.
Aug 16 2007 - added paq8fthis2 -8 enwik8, WinTurtle 1.21, lzpm 0.09.
Aug 23 2007 - added paq8n -8 enwik8, paq8osse -8 enwik8, thor 0.96a, lzpm 0.10.
Aug 24 2007 - added paq8o -8 enwik8.
Aug 29 2007 - added lzc 0.06b.
Aug 30 2007 - added HKCC-2 enwik8 decompresser, added link to paq8o ver. 2, added WinTurtle 1.30, qazar 0.0pre5.
Aug 31 2007 - added qc 0.050.
Sep 02 2007 - added HKCC-2 Sep 01 2007 version, WinRK 3.03 SFX.
Sep 06 2007 - added lzpm 0.11.
Sep 13 2007 - added lzpmlite 0.11.
Sep 14 2007 - added paq8o3 -8 enwik8.
Sep 20 2007 - added lpaq2, hook 1.0.
Sep 22 2007 - added paq8o4 v1, rings 0.1.
Sep 29 2007 - added paq8o6 -8 enwik8.
Sep 30 2007 - added lpaq3, elpaq3, lprepaq 1.2.
Oct 01 2007 - added lpaq3a, lpaq3e.
Oct 04 2007 - added lpaq4, lpaq4e.
Oct 05 2007 - added lzturbo 0.1.
Oct 16 2007 - added lpaq5, lpaq5e, withdrew HKCC-2.
Oct 20 2007 - added paq8o7 -8 enwik8.
Oct 23 2007 - added lpaq6, lpaq6e.
Oct 24 2007 - added paq8o8 -8 enwik8.
Oct 25 2007 - added lzc 0.07.
Oct 28 2007 - added rule that benchmark results will be delayed 30 days after the latest version of the program is published.
Nov 09 2007 - added lpaq7, lpaq7e*, xwrt 3.2*, sr3*.
Nov 22 2007 - added quickLZ 1.40, rings 0.2, hook 1.1, lzc 0.08*.
Nov 23 2007 - added lzpm 0.12.
Dec 03 2007 - ranked lpaq7e, xwrt 3.2, sr3, lzc 0.08.
Dec 04 2007 - added and ranked xwrt 3.2|ppmonstr J.
Dec 05 2007 - added symbra 0.2*.
Dec 11 2007 - added lpaq8*, lpaq8e*.
Dec 13 2007 - added lcssr 0.2*.
Dec 16 2007 - uploaded symbra 0.2, lcssr 0.2 mirrors, added fpaqa*, hook 1.3, lzpm 1.3, cmm1, cmm2.
Dec 17 2007 - corrected cmm1, cmm2, ranked cmm1.
Dec 18 2007 - added fpaqb*.
Dec 20 2007 - updated fpaqb v2*, added fpaq0m, bit 0.1*.
Dec 21 2007 - added lpaq1a.
Dec 24 2007 - added fpaqc*.
Dec 25 2007 - added lpq1, rings 0.3*.
Dec 26 2007 - added FreeArc 0.40-pre-4*.
Jan 09 2008 - added fpaq0r, fpaq0rs*, ranked lpaq8e, lcssr 0.2.
Jan 11 2008 - added flashzip 0.01, flashzip 0.02*, winturtle 1.60*, ccmx 1.30*.
Jan 13 2008 - added lzpm 0.14, cmm 080113*. Updated pkzip 2.04 -ex.
Jan 17 2008 - added lzpm 0.15.
Jan 25 2008 - added fpaq0pv2, ranked FreeArc 0.40-pre-4, bit 0.1, rings 0.3, fpaq0mw.
Jan 28 2008 - added fpaq0f*.
Jan 30 2008 - added fpaq0f2*.
Jan 31 2008 - added lzw 0.1, paq9a. Repealed 30 day wait rule and ranked pending compressors marked with *.
Feb 04 2008 - added flashzip 0.3.
Feb 08 2008 - added lzw 0.2, rings 1.0.
Feb 09 2008 - added cmm3 080207.
Feb 11 2008 - added ppp.
Feb 12 2008 - added lzp3o2, updated ppp description.
Feb 13 2008 - added rings 1.1, lzrw1.
Feb 14 2008 - added lzrw1-a, lzrw2, lzrw3, lzrw3-a, lzrw5, updated lzrw1.
Feb 17 2008 - updated lzrw1-a, lzrw2, lzrw3, lzrw3-a, lzrw5 (new .exe sizes).
Feb 21 2008 - added durilca4linux_3.
Feb 22 2008 - added drt|lpaq9e.
Feb 25 2008 - added lzturbo 0.9.
Mar 04 2008 - added rings 1.2.
Mar 09 2008 - added balz 1.02, rzm 0.06c, tornado 0.3.
Mar 13 2008 - added Stuffit 12.0.0.17.
Mar 14 2008 - added cmm4 v0.0.
Apr 02 2008 - added rings 1.3.
Apr 04 2008 - added fpaq0pv3.
Apr 06 2008 - added fpaq0pv5.
Apr 14 2008 - added rings 1.4c.
Apr 15 2008 - updated rings 1.4c description.
Apr 21 2008 - added rings 1.5.
Apr 22 2008 - added durilca4linux_3 v2 (new dictionary).
Apr 28 2008 - added lpaq9f.
May 09 2008 - added balz 1.06.
May 11 2008 - added packet 0.01, slug 1.27, rzm 0.07h.
May 14 2008 - added balz 1.07.
May 18 2008 - added packet 0.02.
May 19 2008 - added fpaq0pv4, fpaq0pv4nc, fpaq0pv4nc0, fpaq0pv4a, fpaq0pv4anc, fpaq0pv4and0.
May 20 2008 - added packet 0.03b, balz 1.08, fpaq0pv4b1.
May 21 2008 - added balz 1.09.
May 22 2008 - added durilca4linux3 v3, cmm4 v0.1e.
May 23 2008 - updated cmm4 v0.1e description, lpaq9g, fcm1.
Jun 03 2008 - added balz 1.12.
Jun 04 2008 - added lpaq9h.
Jun 10 2008 - added paq8o8-intel -1, paq8o8z-jun7 -1.
Jun 12 2008 - added paq8o10t (enwik8 only), balz 1.13.
Jun 13 2008 - added lpaq9i.
Jun 14 2008 - added drt|ppmonstr (under lpaq9i).
Jun 17 2008 - updated paq8o8z (note 25), durilca4linux_3 v3 (2 GB).
Jun 18 2008 - added flzp v1.
Jun 19 2008 - added packet 0.90b.
Jul 17 2008 - added lzgt, lzgt1, lzgt2, lzgt3.
Jul 19 2008 - added nanozip 0.01a, balz 1.15.
Jul 20 2008 - updated nanozip 0.01a -txt, clarified method of creating zip archive of decompresser.
Jul 22 2008 - added pim 2.50, tornado 0.4a, M99 v2.2.1.
Jul 24 2008 - added 4x4 0.2a, bit 0.2b.
Jul 25 2008 - added nanozipltcb.
Jul 26 2008 - added flashzip 0.9.
Jul 28 2008 - corrected Pareto frontier.
Aug 02 2008 - added nanozip 0.03a, lzss 0.01.
Aug 18 2008 - added flashzip 0.91, lpaq9j.
Sep 05 2008 - added size vs. speed and memory graphs.
Sep 26 2008 - added bzp 0.2, ppms J.
Oct 02 2008 - added lpaq9k.
Oct 27 2008 - added nanozip 0.05a.
Oct 28 2008 - added lzgt3a.
Nov 21 2008 - added bit 0.7. Updated test computer (note 26).
Nov 27 2008 - added ppmx 0.01, sr3c.
Nov 28 2008 - added mcomp 2.00.
Dec 02 2008 - added lpaq9l, ppmx 0.02.
Dec 22 2008 - added ppmx 0.03.
Dec 29 2008 - added M1 0.2a.
Jan 02 2009 - added M1 0.3.
Jan 05 2009 - added ppmx 0.04.
Jan 09 2009 - updated link to paq8hp12any.
Jan 28 2009 - added xdelta 3.0u.
Feb 09 2009 - added bcm 0.03.
Feb 11 2009 - added bcm 0.04.
Feb 21 2009 - added drt|lpaq9m.
Mar 02 2009 - added Stuffit 2009 13.0.0.19, nanozip 0.06a, NTFS (LZNT1).
Mar 05 2009 - added bcm 0.05.
Mar 06 2009 - updated bcm 0.05.
Mar 10 2009 - added flashzip 0.93a, fixed links to winturtle, flashzip, rings, hook, packet, bzp.
Mar 12 2009 - added bwmonstr 0.00.
Mar 15 2009 - added bcm 0.07.
Mar 20 2009 - added bwmonstr 0.01.
Mar 26 2009 - added flashzip 0.94, decomp8.
Apr 01 2009 - added runcoder1.
Apr 13 2009 - added lzturbo 0.94, M1 0.3b.
Apr 14 2009 - added lzuf.
Apr 16 2009 - added M1 0.3b parameter e8-m103b1-mh.
Apr 17 2009 - added lzp2.
Apr 18 2009 - added csc2.
Apr 21 2009 - added paq8p3, paq8p3 v2.
Apr 22 2009 - added decomp8b.
Apr 22 2009 - added lzbw1 0.8.
Apr 29 2009 - added hook 1.4.
May 08 2009 - updated opendark-A.
May 26 2009 - added decmprs8.
Jun 01 2009 - added bcm 0.08.
Jun 02 2009 - added reorder_v2|bcm 0.08.
Jun 05 2009 - updated reorder_v2|bcm 0.08 xlt.
Jul 14 2009 - added bwmonstr 0.02
Jul 16 2009 - updated bwmonstr 0.02 comments.
Jul 21 2009 - added durilca'kingsize
Jul 23 2009 - moved website to http://mattmahoney.net/dc/
added paq8px_v60_turbo, split paq from paq8hp entries, moved decompr8 series to lpaq,
added flashzip 0.99, updated sr3.exe to remove antivirus false alarms due to upack.
Aug 07 2009 - added packet 0.91b.
Aug 14 2009 - added csc3 v.2009.8.12, combined with csc2.
Aug 16 2009 - added and corrected rings 1.6.
Aug 26 2009 - added flashzip 0.99b4.
Sep 14 2009 - added zpaq 1.03.
Sep 15 2009 - updated zpaq 1.03 cmax3.cfg.
Sep 16 2009 - updated zpaq 1.03 cmax4.cfg, updated paq8hp12 links.
Sep 17 2009 - added rule that each compressor can only be listed once,
so removed xwrt|ppmonstr. Updated zpaq 1.03 with drt|cmax4.cfg (not in main table),
updated zpaq 1.03 cmax_enwik9.
Sep 18 2009 - updated zpaq 1.03 o0.cfg, o1.cfg, o2.cfg, drt|max_enwik9drt.cfg.
Sep 23 2009 - added csc31.
Oct 01 2009 - added zpipe 1.00 (zpaq).
Oct 07 2009 - added zpaq cbwt_j2.cfg,18.
Oct 11 2009 - added M03 v0.2a, lzp2 0.7c.
Oct 13 2009 - added bcm 0.09.
Oct 15 2009 - added zpaq v1.08 cbwt_slowmode1_1GB_block.cfg.
Oct 15 2009 - added lz4 0.2.
Oct 26 2009 - added zpaq v1.09 ocbwt_j1.cfg and corrected memory usage.
Oct 29 2009 - corrections to Pareto frontier.
Nov 12 2009 - added durilca'kingsize_4 (new dictionary).
Nov 27 2009 - added lrzip 0.40.
Nov 29 2009 - added tests for durilca'kingsize.
Nov 30 2009 - added tests for durilca'kingsize_4, added lrzip 0.42.
Dec 07 2009 - added 7zip 9.04a.
Dec 15 2009 - added zhuff 0.1, bcm 0.10.
Dec 17 2009 - added M1x2 v0.5-1.
Dec 29 2009 - updated bcm 0.10.
Jan 15 2010 - added bpe2 v1, bpe2 v2.
Jan 17 2010 - updated shindlet link.
Jan 19 2010 - added kwc.
Jan 21 2010 - added acb 2.00c.
Feb 01 2010 - added ulz 0.01.
Feb 06 2010 - added ulz 0.02.
Feb 08 2010 - added m1x2 0.6.
Feb 12 2010 - added bpe, bpe2v3.
Feb 14 2010 - updated bpe2v3 description.
Feb 16 2010 - updated srank link.
Feb 19 2010 - added ppmx 0.05.
Feb 24 2010 - added szip 1.12a, fixed typos.
Mar 01 2010 - added flashzip 0.99b8.
Mar 03 2010 - added nanozipltcb 0.08.
Mar 30 2010 - added etincelle alpha 3.
Apr 07 2010 - added bsc 1.0.0.
Apr 08 2010 - updated bsc 1.0.0.
Apr 11 2010 - added bsc 1.0.3.
Apr 23 2010 - corrections to ppmvc, ctxf.
May 03 2010 - added yzx 0.01, bsc 2.00, fp8_v1, plzip.
May 10 2010 - added csc32 a2, yzx 0.02, nanozipltcb 0.09.
May 21 2010 - added yzx 0.03.
May 27 2010 - added yzx 0.04.
Jun 06 2010 - added nanozip 0.08a.
Jun 09 2010 - updated lpaq9m.
Jun 11 2010 - updated nanozip 0.08a, cmm4 0.2b, 7zip 9.12b (note 42).
Jun 15 2010 - added bsc 2.20.
Jun 21 2010 - updated winrk 3.03, ppmonstr J.
Jun 22 2010 - added bcm 0.11.
Jun 26 2010 - updated bcm 0.11, drt (lpaq9m).
Jun 28 2010 - updated paq8hp12any (note 41), bcm link.
Jul 16 2010 - added zp 1.00.
Jul 28 2010 - added ppmx 0.06, bsc 2.26. Updated links to pimple2, ocamyd.
Aug 05 2010 - updated zp 1.00 (zpaq).
Aug 26 2010 - added lzham alpha 2.
Aug 30 2010 - added lzham alpha 3.
Sep 01 2010 - updated lzham alpha 3.
Sep 26 2010 - added irolz.
Oct 15 2010 - added st 0.51.
Nov 02 2010 - added bcm 0.12.
Dec 16 2010 - added bwtsdc v1.
Jan 06 2011 - added bsc 2.4.5.
Jan 23 2011 - added pzpaq 0.01.
Jan 24 2011 - updated pzpaq 0.01.
Jan 25 2011 - added lz4 0.6, lz4hc 0.9.
Jan 31 2011 - added xz 5.0.1.
Feb 19 2011 - added stz 0.7.2.
Feb 23 2011 - added ppmx 0.07.
Mar 02 2011 - added BWTmix v1.
Mar 04 2011 - added stz 0.8.
Mar 22 2011 - added csc32 final, zhuff 0.7.
Mar 23 2011 - added bsc 2.5.0.
Apr 27 2011 - added snappy 1.0.1.
May 17 2011 - added crush 0.01.
May 20 2011 - added zp 1.02.
May 28 2011 - updated bwtsdc description.
Jun 01 2011 - added flashzip 0.99c1. Updated bcm 0.12.
Aug 29 2011 - added bsc 3.0.0.
Aug 30 2011 - corrections to bsc 3.0.0 description.
Sep 01 2011 - added enwik8.zip and enwik9.zip to textdata.html.
Sep 27 2011 - added comprox_ba 20110927, comprox_sa 20110927.
Sep 28 2011 - added dzo beta, comprox_ba 20110928, comprox_sa 20110928.
Sep 29 2011 - added comprox_ba 20110929, comprox_sa 20110929.
Sep 30 2011 - added KuaiZip 2.3.2 x86, 7zip 9.20, Info-ZIP 3.00.
Oct 02 2011 - added lzsr 0.01.
Oct 10 2011 - added comprox 0.1.1, flashzip 0.99c3.
Oct 12 2011 - added lz4 v1.2.
Oct 20 2011 - added xpv5.
Oct 31 2011 - added flashzip 0.99d1.
Nov 02 2011 - added M03 v1.1b.
Nov 05 2011 - added nanozip 0.09a. Added link to enwik8 ranking on compressionratings.com.
Nov 13 2011 - added zpaq v4.00, merged with zp.
Nov 24 2011 - added RangeCoderC v1.2.
Nov 26 2011 - added RangeCoderC v1.3.
Nov 29 2011 - added zhuff v0.8, RangeCoderC v1.4 and v1.5, link to dark.
Dec 05 2011 - added RangeCoderC v1.6, v1.7a.
Dec 09 2011 - added RangeCoderC v1.7.
Dec 13 2011 - added RangeCoderC v1.8.
Dec 17 2011 - added zcm v0.01.
Dec 23 2011 - added zcm v0.02.
Dec 31 2011 - added ppmx v0.08.
Jan 01 2012 - updated ppmx v0.08.
Jan 04 2012 - added yzx 0.11, zcm 0.03.
Jan 17 2012 - added pigz 2.2.3, updated gzip 1.3.5.
Jan 24 2012 - added MTCompressor 1.0.
Jan 26 2012 - added paq8pxd.
Jan 29 2012 - added TarsaLZP 29 Jan 2012.
Jan 30 2012 - added zcm v0.04.
Feb 11 2012 - added paq8pxd_v2.
Feb 17 2012 - added paq8px_v69.
Feb 19 2012 - added zcm 0.11.
Mar 01 2012 - added fbc v1.0.
Mar 02 2012 - added fbc v1.1. Converted decmprs8, decomp8, decomp8b, all_HKCC, lpaq9* to .zpaq
Mar 05 2012 - added crook v0.1.
Mar 18 2012 - added lrzip 0.612.
Mar 22 2012 - corrected lrzip options.
Mar 23 2012 - added data-shrinker 23Mar2012.
Apr 04 2012 - added zcm 0.20b.
Apr 11 2012 - added fp8 v2, FreeArc 0.666.
Apr 19 2012 - added paq8pxd_v3.
Apr 23 2012 - added paq8pxd_v4.
May 02 2012 - added zcm 0.30.
May 15 2012 - added fp8 v3.
May 16 2012 - added zcm 0.40.
May 17 2012 - updated zcm 0.40.
Jun 02 2012 - added zcm 0.50a.
Jun 12 2012 - changed spelling "Ratushnyak" to "Rhatushnyak" due to name change.
Jun 17 2012 - added urban.
Jul 10 2012 - added bsc 3.1.0.
Aug 05 2012 - added diz.
Aug 24 2012 - added comprox 0.6.0.
Sep 01 2012 - added st 0.81.
Sep 10 2012 - added comprox 0.7.0.
Sep 11 2012 - updated comprox 0.7.0, added zcm 0.60d.
Sep 26 2012 - added comprox 0.8.0.
Sep 27 2012 - added comprox 0.8.0-bugfix1.
Oct 05 2012 - added flashzip 1.0.0.
Oct 07 2012 - added comprolz 0.1.0.
Oct 10 2012 - added lazy 1.00.
Oct 12 2012 - added TinyLZP 0.1, TinyCM 0.1.
Oct 14 2012 - updated TinyLZP 0.1, added zcm 0.70b.
Oct 18 2012 - added comprox 0.9.0, comprolz 0.2.0.
Oct 21 2012 - added smile.
Oct 22 2012 - updated smile.
Oct 23 2012 - added zpaq 6.12.
Oct 30 2012 - added exdupe 0.3.3 beta.
Nov 19 2012 - added TarsaLZP 18.nov.2012.
Nov 20 2012 - updated link to dmc.
Nov 26 2012 - added comprox 0.10.0, comprolz 0.10.0.
Dec 12 2012 - added flashzip 1.1.2.
Dec 17 2012 - added comprox 0.11.0, comprolz 0.11.0.
Dec 18 2012 - added comprox 0.11.0-bugfix1, comprolz 0.11.0-bugfix1.
Jan 15 2013 - added lzwc 0.1, lzwc 0.3, lzwc_bitwise 0.7, lzip 1.14-rc3.
Jan 17 2013 - added plzma_v3p, plzma_v3c.
Jan 18 2013 - updated plzma_v3b (not v3p), plzma_v3c.
Jan 23 2013 - added smac 1.8.
Jan 24 2013 - added zpaq 6.19.
Jan 31 2013 - added smac 1.9.
Feb 01 2013 - added WinRAR 4.20.
Feb 07 2013 - added smac 1.10.
Feb 24 2013 - added smac 1.11.
Mar 11 2013 - added smac 1.12a.
Mar 15 2013 - added pigz 2.3.
Mar 25 2013 - added smac 1.13.
Apr 15 2013 - updated bwmonstr description.
Apr 20 2013 - added smac 1.14.
Apr 21 2013 - added paq8pxd_v5.
Apr 30 2013 - added WinRAR 5.00b2.
May 01 2013 - updated WinRAR 5.00b2.
May 14 2013 - added lzturbo 1.1.
May 15 2013 - updated lzturbo 1.1.
May 16 2013 - added zcm 0.80.
May 21 2013 - added smac 1.15.
Jun 04 2013 - added mcm 0.0.
Jun 13 2013 - added mcm 0.2.
Jun 18 2013 - added tangelo 1.0 (fp8).
Jun 22 2013 - added bcm 0.14, zcm 0.88.
Jun 26 2013 - added zpaq 6.34.
Jun 27 2013 - updated crush 0.01, added mcm 0.3.
Jun 28 2013 - updated crush 0.01 description.
Jun 30 2013 - updated bsc 3.10 description.
Jul 01 2013 - added crush 1.00.
Jul 02 2013 - updated crush 1.00.
Jul 06 2013 - added tangelo 2.0 (fp8).
Jul 08 2013 - updated tangelo 2.0.
Jul 11 2013 - added rings 2.0.
Jul 14 2013 - added bwtdisk 0.9.0.
Jul 15 2013 - added crushm.
Jul 17 2013 - added mcm 0.4.
Jul 20 2013 - added tangelo 2.1.
Jul 24 2013 - added tangelo 2.3.
Jul 31 2013 - added smac 1.16, sharc 0.9.5b.
Aug 01 2013 - updated sharc 0.9.6.
Aug 20 2013 - added packet 1.0, paq8pxd_v7, zlite.
Aug 28 2013 - added ppmz2 0.81.
Oct 14 2013 - added zpaq 6.42, zpaqd 6.32 max5.cfg.
Oct 16 2013 - added arj 3.10, zpaq 6.42 max6.cfg.
Oct 28 2013 - added lzf 1.00.
Oct 30 2013 - added lzf 1.01.
Nov 01 2013 - added zling.
Nov 04 2013 - added smac 1.17.
Nov 19 2013 - added smac 1.17a.
Dec 10 2013 - added smac 1.18, packet 1.1, packARC 0.7RC11, mtari 0.2.
Dec 11 2013 - added cm0_ext (includes cm0, cm1, bwcm).
Dec 12 2013 - added sharc 0.9.10.
Dec 13 2013 - added sharc 0.9.11b.
Dec 14 2013 - updated sharc 0.9.11b description.
Dec 19 2013 - added smac 1.19.
Dec 26 2013 - added zling Dec-25-2013.
Jan 02 2014 - added lzv 0.1.0.
Jan 08 2014 - added doboz 0.1.
Jan 17 2014 - added smac 1.20.
Jan 21 2014 - added zling Jan-21-2013.
Jan 23 2014 - added cm4_ext.
Feb 04 2014 - added zhuff 0.95b, 0.97 beta, alba 0.1.
Feb 05 2014 - updated alba 0.1.
Feb 06 2014 - added alba 0.2.
Feb 10 2014 - added lzss 0.2.
Feb 11 2014 - updated lzss 0.2.
Feb 17 2014 - added RH, RH2.
Feb 18 2014 - added alba 0.5.1.
Feb 22 2014 - added ksc.
Feb 27 2014 - added RH2 20Feb2014.
Mar 02 2014 - added zling (libzling) 20140219.
Mar 10 2014 - added tornado 0.6.
Mar 15 2014 - added freearc 0.67a.
Mar 23 2014 - added RH4_x64 22Mar2014.
Mar 24 2014 - added libzling 20140324.
Mar 25 2014 - added ppmx 0.09, zpaq 6.50.
Apr 01 2014 - added tree 0.1.
Apr 02 2014 - updated tree 0.1.
Apr 04 2014 - updated tree 0.1.
Apr 14 2014 - added libzling 20140414.
Apr 16 2014 - added cmix v1.
Apr 28 2014 - added tree 0.3.
Apr 29 2014 - added RH4 24Apr2014.
May 04 2014 - added zcm 0.90.
May 05 2014 - added zling (libzling) 20140430-bugfix.
May 12 2014 - updated gzip124hack description and link.
May 16 2014 - added zcm 0.92.
May 27 2014 - added tree 0.4, tree 0.5.
May 29 2014 - added cmix v2.
Jun 02 2014 - added lza 0.01.
Jun 18 2014 - added paq8pxd_v8.
Jun 27 2014 - added cmix v3.
Jun 29 2014 - added paq8pxd_v10.
Jun 30 2014 - added lza 0.10.
Jul 05 2014 - added lza_x64 0.10.
Jul 06 2014 - added tree 0.9.
Jul 07 2014 - added zcm_x64 0.92.
Jul 09 2014 - updated zcm_x64 0.92.
Jul 23 2014 - added cmix v4.
Jul 27 2014 - updated st (obsolete).
Jul 31 2014 - added paq8pxd_v12.
Aug 08 2014 - added lzturbo 1.2.
Aug 11 2014 - updated lzturbo 1.2 (levels 3x).
Aug 13 2014 - added paq8pxd_v12-skbuild, cmix v5.
Aug 15 2014 - updated cmix (typo).
Aug 17 2014 - added tree v0.10, paq8pxd_v12-skbuild.
Aug 18 2014 - updated tree v0.10.
Aug 22 2014 - updated paq8pxd_v12-skbuild description.
Aug 28 2014 - added paq8pxd_v13_x64.
Sep 03 2014 - added tree v0.11, cmix v6.
Sep 07 2014 - updated tree v0.11.
Sep 08 2014 - updated tree v0.11.
Sep 09 2014 - added lza 0.51.
Sep 10 2014 - added lza_x64 0.51.
Sep 11 2014 - added xeloz 0.3.5.3.
Sep 12 2014 - added xeloz 0.3.5.3a.
Sep 14 2014 - removed st at request of author.
Sep 18 2014 - updated stuffit link.
Sep 19 2014 - added paq8pxd_v15.
Sep 23 2014 - updated paq8pxd_v15 for enwik9.
Oct 04 2014 - added tree 0.12.
Oct 06 2014 - added lzf 1.02.
Oct 13 2014 - added tree 0.13.
Oct 14 2014 - updated main table typo (libzling).
Oct 18 2014 - added lza 0.61.
Oct 20 2014 - added lza 0.62.
Oct 28 2014 - added paq8pxd_v12_biondivers1_x64.
Oct 31 2014 - added tree 0.14.
Nov 13 2014 - added rh5.
Nov 20 2014 - added lza 0.70b.
Nov 22 2014 - added tree 0.15a.
Nov 23 2014 - updated tree 0.15a.
Dec 09 2014 - added tree 0.16b.
Dec 12 2014 - updated tree 0.16b.
Dec 16 2014 - added tree 0.17.
Dec 18 2014 - updated tree 0.17.
Jan 11 2015 - added lza 0.80test.
Jan 19 2015 - added tree 0.18.
Jan 25 2015 - added zstd.
Jan 26 2015 - added lzhamtest (lzham) v1.0.
Feb 04 2015 - added tree 0.19.
Feb 05 2015 - added cmix v7, mcm 0.8.
Feb 09 2015 - added pcompress 3.1.
Mar 03 2015 - added bcm 1.00, mcm 0.82.
Mar 04 2015 - updated bcm 1.00.
Mar 06 2015 - added balz 1.20.
Mar 10 2015 - added lza 0.82b.
Mar 18 2015 - added bce3.
Mar 23 2015 - added csarc 3.3.
Apr 22 2015 - added mcm 0.83.
Apr 24 2015 - added xz 5.2.1.
Apr 27 2015 - added glza 0.1 (formerly tree).
Apr 28 2015 - corrected Pareto frontier in main table.
Apr 29 2015 - corrected Pareto frontier for lzham.
May 13 2015 - added zcm 0.93.
May 27 2015 - added glza 0.2.
May 28 2015 - added rings 2.1 and 2.2.
Jun 08 2015 - added rings 2.5.
Jul 13 2015 - added glza 0.3.
Jul 20 2015 - added packet 1.2.
Sep 15 2015 - added cmv 00.01.00, updated nanozip 0.09a.
Sep 16 2015 - updated cmv 00.01.00, nanozip 0.09a.
Sep 23 2015 - added brotli 21 Sep 2015.
Sep 25 2015 - added brieflz 1.1.0.
Nov 11 2015 - added cmix v8.
Nov 18 2015 - added glza 0.3b.
Dec 04 2015 - added zstd 0.4.0, 0.4.2.
Dec 05 2015 - updated zstd 0.4.2.
Dec 06 2015 - added zstd_no_legacy 0.4.2.
Jan 05 2016 - added lz5 1.3.3.
Feb 09 2016 - added libzling 20160107, lz4opt 1.00.
Feb 18 2016 - added zstd 0.5.1, brotli 18-Feb-2016.
Feb 20 2016 - updated brotli 18-Feb-2016.
Mar 10 2016 - added emma 0.1.3.
Mar 11 2016 - added glza 0.4.
Mar 12 2016 - updated glza 0.4.
Mar 14 2016 - added emma 0.1.4.
Mar 20 2016 - added cmv 00.01.01.
Mar 30 2016 - updated cmv 00.01.01.
Apr 08 2016 - added lz4x 1.02.
Apr 13 2016 - added zstd 0.6.0.
Apr 15 2016 - added cmix v9.
May 05 2016 - updated link to m1x2.
Jun 17 2016 - added cmix v10.
Jun 28 2016 - added ulz 0.03.
Jul 01 2016 - added plzip 1.5 (lzip).
Jul 07 2016 - added cmix v11.
Jul 19 2016 - added emma 0.1.12.
Aug 08 2016 - added paq8pxd_v18.
Aug 12 2016 - updated paq8pxd_v18.
Aug 23 2016 - added emma 0.1.16.
Aug 24 2016 - updated emma 0.1.16.
Aug 29 2016 - corrected emma 0.1.16 version to 0.1.6.
Sep 05 2016 - added packet 1.9.
Sep 27 2016 - added glza 0.8.
Nov 08 2016 - added cmix v12.
Apr 25 2017 - added cmix v13.
Apr 28 2017 - added emma 0.1.22.
Jun 27 2017 - added lstm-compress.
Jul 13 2017 - added ulz 0.06.
Jul 19 2017 - added paq8px_v77.
Sep 24 2017 - added emma 1.23, paq8pxd_v32, paq8px_v96.
Nov 23 2017 - added cmix v14.
Dec 14 2017 - added lstm-compress (cmix).
Jan 05 2018 - added phda9 1.0, cmve 0.2.0.
Feb 01 2018 - moved lstm-compress to own section.
Mar 28 2018 - added phda9 1.2.
Apr 30 2018 - added phda9 1.3.
May 20 2018 - added cmix v15, phda9 1.4.
Aug 09 2018 - added phda9 1.5, paq8pxd_v47, glza 0.10.1, fixed cmve 0.2.0.
Oct 11 2018 - added cmix v16.
Oct 25 2018 - added phda9 1.6.
Oct 26 2018 - typo.
Feb 22 2019 - added phda9 1.7.
Mar 27 2019 - added cmix v17.
Apr 01 2019 - added lstm-compress v3.
May 10 2019 - added nncp 2019-05-08.
May 11 2019 - updated nncp 2019-05-08.
Jul 09 2019 - added phda9 1.8.
Jul 25 2019 - replaced links to encode.ru with encode.su throughout. Added paq8pxd_v48_bwt1, paq8pxd_v61.
Aug 07 2019 - added cmix v18.
Aug 10 2019 - added nakamichi 2019-Jul-01.
Aug 12 2019 - added HP_2017_October.rar (2017 Hutter prize winner) under phda9.
Nov 19 2019 - added nncp 2019-11-16.
Mar 09 2020 - updated description of Hutter prize.
Jul 21 2020 - added tensorflow-compress v1.
Sep 09 2020 - added tensorflow-compress v2.
Dec 01 2020 - added tensorflow-compress v3.
Jan 10 2021 - added nncp v2.
Jan 12 2021 - updated tensorflow-compress.
Feb 06 2021 - added nncp v2.1.
Apr 26 2021 - added nncp v3.
Jun 14 2021 - added starlit.
Jun 21 2021 - added cmix-hp v1.
Aug 30 2021 - added cmix-hp v2, v3, cmix v19, nncp v3.1.
Apr 26 2022 - added nanozip 0.09a mirror.
Jul 05 2022 - added paq8px_v206fix1.
Aug 14 2022 - added tensorflow-compress v4.
Sep 10 2022 - added lzuf62, lzhhf, lzpgt.
Sep 12 2022 - fixed typos in lzuf62.
Sep 15 2022 - added lzwg.
Nov 25 2022 - updated links for compact.
Dec 02 2022 - added bsc-m03 0.4.0, bsc 3.2.5.
Feb 28 2023 - added bcm 2.03.
Jul 22 2023 - added fast-cmix-hp.
Aug 14 2023 - added lzpgt6, lzwhc.
Aug 16 2023 - updated lzwhc description.
Oct 24 2023 - added nncp v3.2.
Nov 01 2023 - added fastcmix.archive9 mirror for Hutter prize submission of fast-cmix-hp.
Nov 07 2023 - added cmix v20.
Jan 17 2024 - added fx-cmix.
Jun 04 2024 - added ghost.
Sep 17 2024 - added cmix v21.
Sep 19 2024 - updated cmix v21 (added option -t).