This benchmark tests generic compression capabilities of general purpose data compression programs. Programs designed to compress data from unknown sources will do better than programs tuned for specific benchmarks or for specific file types such as text or images.
Test data: one million bit strings output by random time-bound Turing machines generated by a cryptographic process, packed into bytes (MSB first) and NUL terminated. Average size 6.5 MB. Programs may be tested on different files resulting in a average error of less than 0.0005 (or less if averaged over several tests). Results (output size) are expressed as a ratio to ppmonstr J with no options. Test data generated by uiq2.
Size Program Options Date tested Tested by ------ ------- ------- ----------- ------------ 0.8750 uiq2_t4b|zpaq1.04 cmax_o2_msr.cfg Sep 22 2009 pat357 0.8753 uiq2_t4b|zpaq1.04 cmax_o2.cfg Sep 22 2009 pat357 0.8761 uiq2_t4b|zpaq1.04 cmax_o2_msr.cfg Sep 22 2009 Mike Russell 0.8761 uiq2_t4b|zpaq1.04 cmax_o2_msr.cfg Sep 22 2009 Matt Mahoney 0.8762 uiq2_t4b|bwmonstr0.02 Sep 22 2009 pat357 0.8763 uiq2_t4b|zpaq cmax_o2.cfg (CM 177 MB) Sep 18 2009 Matt Mahoney 0.8770 uiq2_t4b|paq8px v64 -8 Sep 18 2009 pat357 0.8771 uiq2_t4|paq8k3 -8 (CM 1460 MB) Sep 17 2009 Dariusz Barylak 0.8773 uiq2_t4|zpaq cmax.cfg (CM 278 MB) Sep 17 2009 Jan Ondrus 0.8775 uiq2_t4b|bbb cfk3000 (BWT 15 MB) Sep 18 2009 pat357 0.8775 uiq2_t4b|bwtmix v1 Sep 18 2009 pat357 0.8776 uiq2_t4b|paq8pf -8 (CM) Sep 18 2009 pat357 0.8777 uiq2_t4b|bwtmix v0 Sep 18 2009 pat357 0.8780 uiq2_t4b|bliz Sep 18 2009 pat357 0.8781 uiq2_t4b|zpaq cmax_o1.cfg (CM 14 MB) Sep 18 2009 Matt Mahoney 0.8784 uiq2_t4b|paq8pfb2 -8 (CM 1165 MB) Sep 17 2009 LovePimple 0.8788 uiq2_t4b|bcm 9 Sep 18 2009 pat357 0.8794 uiq2_t4b|nanozip7 -cc -m1.4g (CM 1400 MB) Sep 18 2009 pat357 0.8804 uiq2_t4b|paq9a -9 (CM 1585 MB) Sep 18 2009 pat357 0.8805 uiq2_t4b|paq9a -5 (CM) Sep 18 2009 pat357 0.8810 uiq2_t4b|bwt|o2rc Sep 22 2009 pat357 0.8813 uiq2_t3|paq8k3 -8 (CM 1460 MB) Sep 16 2009 Dariusz Barylak 0.8815 uiq2_t4b|bwt|o19f Sep 22 2009 pat357 0.8818 uiq2_t3|zpaq cmax.cfg (CM 278 MB) Sep 16 2009 Jan Ondrus 0.8821 uiq2_t3|paq8pf b2 -8 (CM 1165 MB) Sep 16 2009 LovePimple 0.8836 uiq2_t4b|cmm 0.2b -47 Sep 22 2009 pat357 0.8855 uiq2_t4b|nanozip7 -cO -m50m (BWT 50 MB) Sep 18 2009 pat357 0.8912 uiq2_t4b|rings15c -9 Sep 22 2009 pat357 0.8974 uiq2_t4b|ccmx 1.26b -7 Sep 22 2009 pat357 0.8983 uiq2_t4b|ppmonstr (PPM 256 MB) Sep 18 2009 pat357 0.8983 uiq2_t4b|ccmx 1.30c -7 Sep 22 2009 pat357 0.8983 uiq2_t4b|grzip -b3m -m3 -a -p Sep 18 2009 pat357 0.9050 uiq2_t4b|7zip ppmd:o6 Sep 18 2009 pat357 0.9074 uiq2_t4b|uha mx Sep 18 2009 pat357 0.9247 paq8k3 -9u (CM 1460 MB) Sep 14 2009 Matt Mahoney 0.9267 paq8k3 -8u (CM 750 MB) Sep 13 2009 Bill Pettis 0.9299 uiq2_t|paq8px v60 -8 (CM 1104 MB) Sep 14 2009 Matt Mahoney 0.9373 uiq2_t|zpaq 1.03 cmax2.cfg (CM 281 MB) Sep 14 2009 Jan Ondrus 0.9399 uiq2_t|paq8pf b1 -8 (CM) Sep 14 2009 LovePimple 0.9455 paq8k3 -9 (CM 1460 MB) Sep 13 2009 Matt Mahoney 0.9460 paq8k2 -8 (CM 1504 MB) Mar 13 2009 Matt Mahoney 0.9460 paq8k2 -8 (CM 1504 MB) Jul 20 2009 Dariusz Barylak 0.9490 paq8k3 -8 (CM 756 MB) Sep 13 2009 Matt Mahoney 0.9529 zpaq 1.03 cuiq2_ver2.cfg (CM 1141MB) Sep 11 2009 Mike Russell 0.9537 paq8k2 -7 (CM ~800 MB) Jul 20 2009 Dariusz Barylak 0.9550 zpaq 1.03 cu2.cfg (CM 1129 MB) Sep 09 2009 Matt Mahoney 0.9618 paq8k -8 (CM 1675 MB) Dec 26 2008 Dariusz Barylak 0.9639 paq8kx v1 -8 (CM 1730 MB) Jul 17 2009 Matt Mahoney 0.9654 paq8k -7 (CM 837 MB Dec 26 2008 Dariusz Barylak 0.9702 paq8kx v1 -7 (CM ~800 MB) Jul 19 2009 Dariusz Barylak 0.9704 zpaq 1.02 cu1.cfg (CM 1126 MB) Sep 07 2009 Matt Mahoney 0.9704 paq8px v40 -8 (CM 1591 MB) May 30 2009 Matt Mahoney 0.9704 paq8px v46 -8 (CM 1591 MB) Jun 02 2009 Matt Mahoney 0.9704 paq8px v60 turbo -8 (CM 1591 MB) Jul 17 2009 Matt Mahoney 0.9704 paq8q v14 intel -8 (CM 1591 MB) Jul 17 2009 Matt Mahoney 0.9712 paq8o8z -8 (CM 1675 MB) Dec 27 2008 Matt Mahoney 0.9719 paq8p3 -8 (CM ~800 MB) Apr 17 2009 Matt Mahoney 0.9747 paq8px v40 -7 (CM 811 MB) May 30 2009 Matt Mahoney 0.9753 paq8p -7 (CM 837 MB) Dec 25 2008 Matt Mahoney 0.9762 uiq2_t4b|7zip lzma Sep 18 2009 pat357 0.9764 paq8q2 v4 -7 (CM 668 MB) May 30 2009 Matt Mahoney 0.9764 paq8p3 -7 (CM ~500 MB) Apr 17 2009 Matt Mahoney 0.9764 paq8p3 v2 -7 (CM ~500 MB) Apr 21 2009 Matt Mahoney 0.9783 paq8f -7 (CM 837 MB) Dec 25 2008 Matt Mahoney 0.9804 paq8kx v3 -8 (CM 1733 MB) Jul 17 2009 Matt Mahoney 0.9823 paq8hp12any -8 (CM 1800 MB) Dec 25 2008 Matt Mahoney 0.9829 ocamyd 1.66final -s0 -g1 -m3 (DMC) Jul 20 2009 Dariusz Barylak 0.9842 ocamyd 1.66test1 -s0 -m9 (DMC slowest 900 MB) Dec 25 2008 Matt Mahoney 0.9844 bwmonstr 0.02 (BWT) Jul 17 2009 Sami Runsas 0.9855 uda 0.300 (CM 180 MB) Dec 25 2008 Matt Mahoney 0.9881 nanozip 0.05a -cc -nm -m810m (CM 810 MB) Dec 25 2008 Matt Mahoney 0.9929 winrk lib2.3 -mx -M1200M Jul 20 2009 Dariusz Barylak 0.9953 lpaq8 -8 (CM 774 MB) Dec 25 2008 Matt Mahoney 0.9971 bwmonstr 0.00 (BWT 8 MB) Mar 12 2009 Matt Mahoney 0.9998 ppmonstr J -o16 (PPM order 16, 256 MB) Dec 25 2008 Matt Mahoney 0.9998 durilca 0.05 (PPM order 128, 256 MB) Dec 25 2008 Matt Mahoney 1.0000 ppmonstr J (PPM order 12, 256 MB) Dec 24 2008 Matt Mahoney 1.0021 xwrt 3.2 -l14 (CM 1500 MB) Dec 24 2008 Matt Mahoney 1.0038 slim 23d (PPM order 32) Dec 25 2008 Matt Mahoney 1.0129 ash 04a /o5 /s9 (CM o4, 9x SSE, 64MB) Dec 30 2008 Eugene Shelwien 1.0163 winrk lib2.3 -mfpw1 -M1200m Jul 20 2009 Dariusz Barylak 1.0168 lpaq1 8 (CM 771 MB) Dec 25 2008 Matt Mahoney 1.0187 paq9a -8 (CM 779 MB) Dec 25 2008 Matt Mahoney 1.0192 zpaq 1.00 cmax (CM 278 MB) Mar 12 2009 Matt Mahoney 1.0198 cmm4 v0.1e 96 (CM 768 MB+512MB window) Dec 25 2008 Matt Mahoney 1.0208 bit 0.7 -p=5 (CM 650 MB) Dec 25 2008 Osman Turan 1.0213 lpaq9l 9 (CM 1542 MB) Mar 12 2009 Matt Mahoney 1.0214 lpaq9m 9 (CM 1542 MB) Mar 12 2009 Matt Mahoney 1.0215 lpaq9l 8 (CM 774 MB) Dec 25 2008 Matt Mahoney 1.0236 zpaq v1.07 cbwt_j2.cfg,13 (BWT 143 MB) Oct 07 2009 Matt Mahoney 1.0236 zpaq v1.07 cbwt_j4.cfg,13 (BWT 143 MB) Oct 07 2009 Matt Mahoney 1.0255 bbb cfm8 (BWT 8 MB block) Dec 25 2008 Matt Mahoney 1.0271 bcm 0.08 (BWT one block) Jun 01 2009 Matt Mahoney 1.0276 ppmd J -o3 -m21 (PPM order 3, 21 MB) Dec 25 2008 Matt Mahoney 1.0280 ppmvc -o3 -m256 -r1 (PPM o3 256MB) Dec 25 2008 Matt Mahoney 1.0293 bcm 0.05 (BWT 30 MB) Mar 12 2009 Matt Mahoney 1.0295 lpaq1 6 (CM 195 MB) Dec 25 2008 Matt Mahoney 1.0308 m99 v2.2 -m 8m (BWT max 8 MB block) Dec 25 2008 Matt Mahoney 1.0337 bcm 0.03 (BWT 32 MB block) Feb 09 2009 Matt Mahoney 1.0348 pimple2 (CM 128 MB) Dec 25 2008 Matt Mahoney 1.0358 zpaq 1.00 cmid (CM 111 MB) Mar 12 2009 Matt Mahoney 1.0361 dark 0.51 -b8m (BWT 8 MB block) Dec 25 2008 Matt Mahoney 1.0390 flashzip 0.93a -m2 -s7 -b3 (ROLZ max 132MB) Mar 12 2009 Matt Mahoney 1.0394 durilca'light 0.05 (PPM order 9, 256 MB) Dec 25 2008 Matt Mahoney 1.0398 ash 06 (CM o32 104MB) Mar 12 2009 Matt Mahoney 1.0399 ctw 0.1 -d3 -n16M -f16M (CM o3 144MB) Dec 25 2008 Matt Mahoney 1.0405 sbc 0.970 r3 -m3 -b63 (BWT max 63MB block) Dec 25 2008 Matt Mahoney 1.0419 rings 1.5 c 9 (BWT 426 MB) Dec 24 2008 Matt Mahoney 1.0419 rings 1.6 c 10 (BWT 795 MB) Aug 16 2009 Matt Mahoney 1.0424 mcomp v2.00 -mw (BWT 64 MB) Dec 24 2008 Matt Mahoney 1.0455 grzipII 0.2.4 -b8m (BWT 8 MB block) Dec 25 2008 Matt Mahoney 1.0472 tc 5.2dev2 (CM 180 MB) Dec 25 2008 Matt Mahoney 1.0520 ccmx 7 (CM 1332 MB) Dec 25 2008 Matt Mahoney 1.0538 bee 0.78 b.0154 -m3 -d9 (PPM max, 1 GB) Dec 25 2008 Matt Mahoney 1.0562 pim 2.90 (PPM) Dec 31 2008 Matt Mahoney 1.0570 uhbc 1.0 -b3 -b3800k (BWT max 3.8M bl) Dec 25 2008 Matt Mahoney 1.0583 ccm 7 (CM 1330 MB) Dec 25 2008 Matt Mahoney 1.0638 chile 0.4 -b=8000 (BWT 8 MB block) Dec 25 2008 Matt Mahoney 1.0660 arbc2z (order 2) Dec 27 2008 Matt Mahoney 1.0673 uharc 0.6b -mx -md32768 (PPM max 32 MB) Dec 25 2008 Matt Mahoney 1.0770 lzturbo 0.9 -59 (LZ77 slowest 248 MB) Dec 25 2008 Matt Mahoney 1.0778 epm r9 (PPM 70 MB) Dec 25 2008 Matt Mahoney 1.0842 px 1.0 (CM 66 MB) Dec 25 2008 Matt Mahoney 1.0895 m1 0.3b bmp.txt Apr 14 2009 Matt Mahoney 1.0912 m1 0.3b text2.txt Apr 14 2009 Matt Mahoney 1.0968 m1 0.3b text.txt Apr 14 2009 Matt Mahoney 1.0980 m1 0.3b exe.txt Apr 14 2009 Matt Mahoney 1.0995 boa v0.58b -m15 (PPM 15 MB) Dec 25 2008 Matt Mahoney 1.1021 quark v0.95r -m1 -d25 -l8 (LZ max 1G slow) Dec 25 2008 Matt Mahoney 1.1056 nanozip 0.05a -co -nm -m35m (LZ77/BWT 35 MB)Dec 25 2008 Matt Mahoney 1.1008 acb 2.00 u (max compression) Jan 19 2010 Matt Mahoney 1.1127 hook 1.3 110 (LZP+DMC 110 MB) Dec 25 2008 Matt Mahoney 1.1145 tarsalzp 8.8.07 (LZP 341 MB) Dec 25 2008 Matt Mahoney 1.1145 mcomp v2.00 -mf3x -M512 (ROLZ3 max 512MB) Dec 25 2008 Matt Mahoney 1.1150 mcomp v2.00 -mfx -M512m (ROLZ max 512 MB) Dec 25 2008 Matt Mahoney 1.1160 m1 0.3 (CM 32 MB) Jan 02 2009 Matt Mahoney 1.1165 ppms J -o3 (PPM order 3) Dec 25 2008 Matt Mahoney 1.1168 rzm (ROLZ 258 MB) Dec 25 2008 Matt Mahoney 1.1202 7za (7zip) 3.11 -mx=9 (LZ77 max compression) Dec 25 2008 Matt Mahoney 1.1236 ppmx 0.07 (PPM 302 MB) Feb 23 2011 Matt Mahoney 1.1264 lzpxj 1.2h 9 (PPM 1314 MB) Dec 25 2008 Matt Mahoney 1.1265 hook 1.4 110 (LZP+DMC 110 MB) Apr 29 2009 Matt Mahoney 1.1342 fpaq2 (order 2) Dec 25 2008 Matt Mahoney 1.1393 csc32 final -m3 (best LZ77) Mar 26 2011 Matt Mahoney 1.1397 m1 0.2a (CM 32 MB) Dec 29 2008 Matt Mahoney 1.1406 flashzip 0.99b4 -m2 -c7 -b8 (ROLZ 609 MB) Aug 26 2009 Matt Mahoney 1.1414 cabarc 1.00.0601 -m lzx:21 (LZX 2 MB) Dec 25 2008 Matt Mahoney 1.1418 ppmx v0.03 (PPM 609 MB) Dec 25 2008 Matt Mahoney 1.1434 dmc 1000000000 (DMC 1 GB) Dec 25 2008 Matt Mahoney 1.1452 flashzip 0.91 -m2 -s7 -b5 (ROLZ 198 MB) Dec 25 2008 Matt Mahoney 1.1503 lzpm 0.15 9 (ROLZ max 62 MB) Dec 25 2008 Matt Mahoney 1.1508 ppmx v0.04 (PPM 230 MB) Jan 05 2009 Matt Mahoney 1.1547 winturtle 1.60 (LZP 512 MB buffer) Dec 25 2008 Matt Mahoney 1.1558 quad 1.12 -x (ROLZ max 34 MB) Dec 25 2008 Matt Mahoney 1.1570 balz 1.12 ex (ROLZ max 67 MB) Dec 25 2008 Matt Mahoney 1.1659 csc31 -m3 -d7 (LZ77 626 MB) Sep 23 2009 Matt Mahoney 1.1747 qazar 0.0pre5 -d9 -x7 -l7 (ROLZ max) Dec 25 2008 Matt Mahoney 1.1820 csc3 v2008.08.12 -m3 -d7 (LZ77 675 MB) Aug 14 2009 Matt Mahoney 1.1915 packet 0.90b -m4 -s9 (LZP method 4, max) Dec 25 2008 Matt Mahoney 1.2038 sr2 (SR 6 MB) Dec 25 2008 Matt Mahoney 1.2178 bzp 0.2 (LZP 3 MB) Dec 25 2008 Matt Mahoney 1.2206 sr3 (SR 68 MB) Dec 25 2008 Matt Mahoney 1.2294 sr3c (SR) Dec 25 2008 Matt Mahoney 1.2260 bzip2 1.0.3 -9 (BWT 1 MB) Dec 24 2008 Matt Mahoney 1.2459 thor 0.96a e4 (LZP slowest) Dec 25 2008 Matt Mahoney 1.2519 ha 0.98 a2 (PPM slow (HSC)) Dec 25 2008 Matt Mahoney 1.2654 crush 0.01 cx (LZ77 best, 143 MB) May 17 2011 Matt Mahoney 1.2752 kzip (LZ77 optimized zip) Dec 25 2008 Matt Mahoney 1.2822 csc2 (LZP 49 MB) Apr 18 2009 Matt Mahoney 1.2923 symbra 0.2 -c3 -m5 -p2 (LZP o3 112M 2pass)Dec 25 2008 Matt Mahoney 1.2953 lzc 0.08 10 (LZ77, slowest, 550 MB) Dec 25 2008 Matt Mahoney 1.2968 zpaq 1.00 cmin (LZP+o4 4 MB) Mar 12 2009 Matt Mahoney 1.3020 rar 2.50 -m5 (LZ77/BWT max comp.) Dec 24 2008 Matt Mahoney 1.3066 crush 0.01 c (LZ77 medium, 143 MB) May 17 2011 Matt Mahoney 1.3124 zip 2.32 -9 (LZ77 max compression) Dec 24 2008 Matt Mahoney 1.3126 gzip 1.3.5 -9 (LZ77 max compression) Dec 25 2008 Matt Mahoney 1.3155 gzip124hack -9 (LZ77 max compression) Dec 25 2008 Matt Mahoney 1.3199 slug 1.27b (ROLZ 14 MB) Dec 25 2008 Matt Mahoney 1.3201 lcssr 0.1 -b7 -l9 (LZP 1184 MB slowest) Dec 25 2008 Matt Mahoney 1.3300 runcoder1 (order 1) Apr 01 2009 Matt Mahoney 1.3314 tornado v0.1 (LZ77) Dec 24 2008 Matt Mahoney 1.3459 fcm1 (CM order 1) Dec 25 2008 Matt Mahoney 1.3715 lzss 0.01 ex (LZ77 max 625 MB) Dec 25 2008 Matt Mahoney 1.3854 crush 0.01 cf (LZ77 fast, 143 MB) May 17 2011 Matt Mahoney 1.3907 uiq2_t4b (transform) Sep 18 2009 pat357 1.4000 lzuf (LZ77) Apr 14 2009 Matt Mahoney 1.4217 ulz c5 (LZ77 max 43 MB) Feb 01 2010 Matt Mahoney 1.4445 lzop 1.01 -9 (LZ77 max compression) Dec 25 2008 Matt Mahoney 1.4705 mcomp 2.00 -msm (bistream MSB first 1MB) Dec 25 2008 Matt Mahoney 1.4912 slug 1.1b (LZ77 1 MB) Dec 24 2008 Matt Mahoney 1.5297 xdelta 3.0u -9 (LZ77 max, 47 MB) Jan 27 2009 Matt Mahoney 1.5761 srank 1.1 -C8 (SR 2 MB) Dec 25 2008 Matt Mahoney 1.6425 lz4 0.2 (LZSS 13 MB) Oct 16 2009 Matt Mahoney 1.6599 brieflz 1.06 (LZ77) Dec 25 2008 Matt Mahoney 1.6643 mcomp 2.00 -msl (bitstream LSB first 1MB)Dec 25 2008 Matt Mahoney 1.6728 compress (LZW) Dec 24 2008 Matt Mahoney 1.6790 lzrw3-a (ROLZ order 0 24 KB) Dec 25 2008 Matt Mahoney 1.7202 lzw 0.02 (LZW 18 MB) Dec 25 2008 Matt Mahoney 1.7435 flzp ver. 1 (bytewise LZP) Dec 24 2008 Matt Mahoney 1.7492 lzrw3 (ROLZ order 0 24 KB) Dec 25 2008 Matt Mahoney 1.7662 lzrw2 (ROLZ order 0 24 KB) Dec 25 2008 Matt Mahoney 1.7669 fastlz (LZ77) Dec 25 2008 Matt Mahoney 1.7780 fpaq0f2 (adaptive order 0) Dec 25 2008 Matt Mahoney 1.7780 snappy 1.0.1 (LZ77) Apr 27 2011 Matt Mahoney 1.7792 kwc (order 6 dict) Jan 19 2010 Matt Mahoney 1.7810 ppp (SR queue size 1) Dec 25 2008 Matt Mahoney 1.7954 lzrw1-a (LZ77 24 KB) Dec 25 2008 Matt Mahoney 1.7955 lzrw1 (LZ77 24 KB) Dec 25 2008 Matt Mahoney 1.8382 lzp2 (LZP 15 MB) Apr 17 2009 Matt Mahoney 1.8925 arb255 (stationary order 0) Dec 25 2008 Matt Mahoney 1.8927 fpaq0 (stationary order 0) Dec 24 2008 Matt Mahoney 1.8992 ntfs (LZSS) Apr 17 2009 Matt Mahoney 1.9511 barf (LZ77 bytewise len=2) Dec 24 2008 Matt Mahoney 2.0137 lzrw5 (LZW 384 KB) Dec 25 2008 Matt Mahoney 2.1401 arb2x (sta. bitwise order 0) Dec 25 2008 Matt Mahoney 2.1701 uncompressed Dec 24 2008 Matt Mahoney fails enc 0.15 ag -d256 -fc- -fe- Dec 25 2008 Matt Mahoney fails qc 0.50 -8 (max compression) Dec 25 2008 Matt Mahoney fails lzgt3a Dec 25 2008 Matt Mahoney
Sportman has also benchmarked many programs on this data, including run times.
For more information about programs, see the Large Text Compression Benchmark. Notes about compression algorithms:
To submit a result:
Notes:
The test program generator UIQ2 creates a test file which is the output of one million random Turing machines. Each machine outputs a bit string up to 256 bits. These are packed into bytes, MSB first. If the bit string length is not a multiple of 8, then the low bits of the last byte are set to 0. The end of the string is marked with a NUL (0) byte. The string itself does not contain any NUL bytes. Thus, strings can range in length from 1 to 33 bytes. The average length is about 6.5 bytes including the NUL. Each Turing machine uses the following model:
+----+----+----+----+----+----+----+----+ | c0 | c1 | w | wb | d | db | o | ob | +----+----+----+----+----+----+----+----+
Each program is initialized with L = 128 bytes of pseudo random data. UIQ2 uses a cryptographically secure pseudo random number generator. Given any part of the pseudo random sequence, it is not possible to compute the seed or any other part of the sequence any faster than brute force guessing. The seed can either be passed as an argument to UIQ2, or it can be generated automatically (only on x86 processors).
The random source is a RC4 keystream with the first 10000 bytes discarded. The seed is an arbitrary string. The RC4 key is the string including the terminating NUL byte. Thus, the seeds "a" and "aa" will generate different data.
If UIQ2 is not passed a seed, it will automatically generate one of 28 random uppercase letters, which has over 128 bits of entropy. The entropy source is CPU timing variation over several seconds as measured by the x86 RDTSC (read time stamp counter) instruction, which increments once per clock cycle (e.g. 2 GHz). Occasionally, successive differences (deltas) between RDTSC return values are unpredictably large due to task switching and delays due to cache misses influenced by other processes. This source generates a few hundred or a few thousand bits of entropy per second in multitasking systems such as Windows or Linux.
UIQ2 generates a 256 byte random key using RDTSC, and uses it to initialize RC4. Then the first 10000 bytes are discarded, and the next 28 bytes in the range 'A' to 'Z' (skipping other values) are taken as the seed. The seed is then passed to RC4 as a new key.
The original 256 byte key is generated as follows:
Initialize key[0..255] to 0 Repeat until randomness tests pass for i from 0 to 20 million r1 := r2 r2 := rdtsc() rotate k left by 1 bit (where k is key[i mod 256]) k = k XOR r1[bits 0..7] XOR r2[bits 8..15]Note that every 2048 cycles all of the key bytes are rotated back to their original positions. There are 3 tests to pass:
Generic compression is a problem of universal intelligence, which is defined as the expected reward in unknown environments having a Solomonoff distribution, i.e. described by random programs. Hutter proved that the optimal solution, called AIXI (a formalization of Occam's Razor), is to guess at each interaction with the environment that it is simulated by the simplest (shortest) program consistent with the observed interaction so far. This is a compression problem. It is also uncomputable. The purpose of this benchmark is to test how close compression programs come to this unreachable goal.
Unfortunately, sampling a Solomonoff distribution of outputs is not computable, because some Turing machines might not halt and we can't compute which ones. Furthermore, Legg proved that strings that are intrinsically hard to predict (that require complex algorithms) take too long to generate. Specifically, any string requiring an algorithm with n bits of complexity cannot be computed faster than O(BB(n)), where BB is the busy beaver function, the largest number that can be output by a program of length n. BB is not computable. (We can always give it n larger than any proposed solution). Thus, the best we can do is approximate a distribution by placing reasonable time limits on the Turing machines.
Hutter proved that the optimal computable solution to the reinforcment learning problem (of which compression is a subset) is AIXItl: Try all programs by increasing length L, testing each for time T until a match to the input is found, then predict the next bit output by that machine. The time complexity is O(T 2L). Thus, we may choose small T and L and still meet our goal of making the compression problem intractable. UIQ uses T = 256 and L ≥ 128.
Although algorithmic complexity is not computable, we can still prove that UIQ generates programs of complexity L ≥ 128 bits. It is sufficient to show that for any 128 bit string x[0..127] we can write a program with our chosen instruction set to output it. A sequence of 128 instructions of the form 1100001x[i] (where bit 0 is x[i] from our desired string) does it.
The instruction set is universal (provided the bounds of T and L are removed and the address space is expanded) because any instruction on a standard Turing machine can be simulated with 2 of these instructions. For output, 2 additional instructions may be required.
The instruction set was designed to give a wide range of string complexities. This was tested by varying the probabilities of various bits in the instructions to obtain a mix of strings with both obvious patterns and strings that compress poorly. The compression ratio is about 0.46 with ppmonstr. The odd format of the JUMP instruction was chosen to make the probability of a conditional JUMP instruction 1/8, which experimentally generates more complex string than higher values. (In fact, smaller values like 1/64 work even better).
Given a random sample of n strings, the expected error in the results is about n-1/2, or about 0.0010 for n = 106. The following experiment shows that taking the ratio of two compressors on the same file further improves accuracy. Generally, the two compressors should have about the same capability. ppmonstr was chosen to improve the accuracy of high end compressors.
In this experiment, each compressor was tested on 10 files generated with UIQ2 with 1-byte seeds "0" through "9". Results are shown below. The first 3 columns show the average size, range, and standard deviation when the compressed size is measured directly. The range is expressed as (largest/smallest - 1). The standard deviation is expressed as a fraction of the mean.
The last 3 columns show the same statistics when the compressed size is divided by the output of ppmonstr on the same file. As can be seen in the last two columns, the range (maximum error) and standard deviation are smaller in almost every case except for very poorly compressed files, and that the improvement is especially significant for the better compressors, less than half the error.
Absolute size Relative to ppmonstr ------------------------- -------------------------- Compressor Options Mean Range St Dev Mean/ppm Range St Dev ---------- ------------ ------- -------- -------- -------- -------- -------- ppmonstr J 3003424 0.001045 0.000303 1.000000 0.000000 0.000000 lpaq1 8 3054578 0.001184 0.000352 1.017032 0.000443 0.000120 paq9a -8 3059558 0.001186 0.000337 1.018690 0.000402 0.000121 bbb cfm128 3080446 0.001234 0.000360 1.025645 0.000520 0.000149 rings 1.5 c 9 3129484 0.001376 0.000370 1.041972 0.000462 0.000129 ppmd J 3256176 0.001095 0.000332 1.084155 0.000656 0.000179 nanozip 0.05a -co -nm -m1g 3320223 0.001250 0.000328 1.105479 0.000341 0.000100 bzip2 1.0.3 -9 3682430 0.001485 0.000399 1.226077 0.000798 0.000280 rar 2.50 -m5 3910482 0.000995 0.000384 1.302008 0.000991 0.000280 zip 2.32 -9 3941988 0.001125 0.000406 1.312498 0.001036 0.000354 tornado v0.1 3998999 0.001059 0.000326 1.331480 0.000712 0.000207 slug 1.1b 4478830 0.001405 0.000466 1.491241 0.001519 0.000408 compress 5024294 0.001517 0.000482 1.672855 0.001245 0.000433 flzp ver. 1 5236723 0.001904 0.000645 1.743584 0.001850 0.000529 fpaq0 5684694 0.002522 0.000751 1.892737 0.002200 0.000636 uncompressed 6517923 0.001989 0.000645 2.170164 0.002071 0.000582
Dec. 23, 2008 - Initial benchmark with generator UIQ v1 (now obsolete).
Dec. 24, 2008 - Replaced generator with UIQ2 v2 (current).
Dec. 30, 2008 - UIQ2 v2.1 fixes a compile error with some non g++ compilers, thanks to Eugene Shelwien. Executable is not changed.
This page is maintained by Matt Mahoney.