78

I have a large amount of data to move using two PHP scripts: one on the client side, run from the command line, and the other behind Apache. I POST the data to the server and read it from the php://input stream to save it on the web server. To avoid hitting any memory limits, the data is split into 500 kB chunks, one per POST request. All this works fine.

Now, to save bandwidth and speed things up, I want to compress the data before sending it and decompress it when it is received on the other end. I found 3 pairs of functions that can do the job, but I cannot decide which one to use:

  • gzencode() / gzdecode()
  • gzcompress() / gzuncompress()
  • gzdeflate() / gzinflate()

Which pair of functions would you recommend and why?

UPDATE: I just read zlib FAQ:

The gzip format (gzencode) was designed to retain the directory information about a single file, such as the name and last modification date. The zlib format (gzcompress) on the other hand was designed for in-memory and communication channel applications, and has a much more compact header and trailer and uses a faster integrity check than gzip.
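
For illustration, the compress-then-POST step described above might be sketched as follows. This is only a minimal sketch, not the actual scripts: pack_chunk() and unpack_chunk() are hypothetical helper names, and gzcompress()/gzuncompress() is just one of the three candidate pairs. On the server, the argument to unpack_chunk() would be the request body, for example file_get_contents('php://input').

```php
<?php
// Hypothetical helpers; the names are illustrative, not from the original scripts.

// Client side: compress one chunk before POSTing it.
function pack_chunk(string $chunk): string
{
    $packed = gzcompress($chunk, 6); // ZLIB format, compression level 6
    if ($packed === false) {
        throw new RuntimeException('compression failed');
    }
    return $packed;
}

// Server side: restore the chunk read from the request body.
function unpack_chunk(string $body): string
{
    $chunk = @gzuncompress($body); // returns false on corrupt input
    if ($chunk === false) {
        throw new RuntimeException('corrupt or truncated chunk');
    }
    return $chunk;
}

// Round trip with a stand-in for one chunk of repetitive data.
$chunk = str_repeat("some,repetitive,csv,row\n", 20000);
$body  = pack_chunk($chunk);
printf("%d bytes -> %d bytes\n", strlen($chunk), strlen($body));
```

Throwing on a false return keeps a damaged chunk from being written to disk silently; the client could then resend just that chunk.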

Mikko Rantalainen
Milan Babuškov

4 Answers

108

All of these can be used. There are subtle differences between the three:

  • gzencode() uses the GZIP file format, the same as the gzip command line tool. This file format has a header containing optional metadata, DEFLATE compressed data, and a footer containing a CRC32 checksum and a length check.
  • gzcompress() uses the ZLIB format. It has a shorter header serving only to identify the compression format, DEFLATE compressed data, and a footer containing an ADLER32 checksum.
  • gzdeflate() uses the raw DEFLATE algorithm on its own, which is the basis for both of the other formats.
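
The relationship between the three formats can be checked with a short sketch like the one below. It assumes that gzencode() writes the minimal 10-byte GZIP header with no optional fields; the 8-byte GZIP footer and the 2-byte header and 4-byte footer of ZLIB are fixed by the formats.

```php
<?php
$data = str_repeat('The quick brown fox jumps over the lazy dog. ', 50);

$gzip = gzencode($data);   // GZIP: header + DEFLATE data + CRC32 and length
$zlib = gzcompress($data); // ZLIB: 2-byte header + DEFLATE data + ADLER32
$raw  = gzdeflate($data);  // raw DEFLATE stream, no framing at all

// GZIP streams start with the magic bytes 1f 8b.
echo bin2hex(substr($gzip, 0, 2)), PHP_EOL;

// Stripping the framing leaves a stream that gzinflate() can read directly.
// The offset 10 assumes a GZIP header without optional fields.
$fromGzip = gzinflate(substr($gzip, 10, -8));
$fromZlib = gzinflate(substr($zlib, 2, -4));

var_dump($fromGzip === $data, $fromZlib === $data, gzinflate($raw) === $data);
```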

All three use the same algorithm under the hood, so they won't differ meaningfully in speed or compression ratio. gzencode() and gzcompress() both add a checksum, so the integrity of the data can be verified, which is useful over unreliable transmission and storage channels. If everything is stored locally and you don't need any additional metadata, then gzdeflate() would suffice. For portability I'd recommend gzencode() (GZIP format), which is probably better supported among other tools than gzcompress() (ZLIB format).
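
As a rough demonstration of what the checksum buys you, the sketch below damages the last byte of the footer and shows that decompression then fails. The helper flip_last_byte() is a hypothetical name introduced only for this example.

```php
<?php
// Hypothetical helper: invert every bit of the final byte of a string.
function flip_last_byte(string $bytes): string
{
    $last = strlen($bytes) - 1;
    $bytes[$last] = chr(ord($bytes[$last]) ^ 0xFF);
    return $bytes;
}

$data = str_repeat('payload ', 1000);

// The final byte of ZLIB output belongs to the ADLER32 checksum;
// the final byte of GZIP output belongs to the length check.
$damagedZlib = flip_last_byte(gzcompress($data));
$damagedGzip = flip_last_byte(gzencode($data));

// Both decoders emit a warning and return false on a mismatch;
// the @ operator silences the warning for this demo.
$zlibResult = @gzuncompress($damagedZlib);
$gzipResult = @gzdecode($damagedGzip);

var_dump($zlibResult === false, $gzipResult === false);
```

A raw gzdeflate() stream has no footer, so damage to it is only noticed if it happens to break the DEFLATE structure itself.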

When compressing very short strings, the fixed overhead of each format becomes relevant, since it can make up a significant part of the output. The overhead for each method, measured by compressing an empty string, is:

  • gzencode('') - 20 bytes
  • gzcompress('') - 8 bytes
  • gzdeflate('') - 2 bytes
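
These figures can be reproduced with a few lines, sketched here; the $sizes array is just an illustrative name.

```php
<?php
// Compress an empty string with each function and record the output size.
$sizes = [
    'gzencode'   => strlen(gzencode('')),
    'gzcompress' => strlen(gzcompress('')),
    'gzdeflate'  => strlen(gzdeflate('')),
];

foreach ($sizes as $name => $bytes) {
    printf("%-10s %2d bytes\n", $name, $bytes);
}
```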
thomasrutter
  • 3
    Almost correct. I investigated a little bit, and it seems gzencode is not without any header data - it just has different header data. – Milan Babuškov Mar 08 '09 at 08:57
  • 3
    @Milan I guess you meant "gzcompress is not without any header data - it just has different header data". – thomasrutter Mar 08 '09 at 11:12
49

I am no PHP expert and cannot answer the question posed, but it seems like there is a lot of guessing going on here, and fuzzy information being proffered.

DEFLATE is the name of the compression algorithm that is used by ZLIB, GZIP and others. In theory, GZIP supports alternative compression algorithms, but in practice, there are none.

There is no such thing as "the GZIP algorithm". GZIP uses the DEFLATE algorithm, and puts framing data around the compressed data. With GZIP you can add things like the filename, the time of the file, a CRC, even a comment. This metadata is optional, though, and many gzippers just omit it.

ZLIB is similar, except with a different, more limited set of metadata, and a specific 2-byte header.

This is all in IETF RFCs 1950, 1951, and 1952.
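
Since the examples in this thread are PHP, here is a small PHP sketch of what the 2-byte ZLIB header from RFC 1950 contains. The variable names are mine; the field layout (compression method in the low 4 bits of the first byte, and the two bytes together forming a multiple of 31) is from the RFC.

```php
<?php
$zlib = gzcompress('example');

$cmf = ord($zlib[0]); // CMF: compression method and window info
$flg = ord($zlib[1]); // FLG: flags, including the header check bits

$method     = $cmf & 0x0F;     // 8 means DEFLATE
$windowBits = ($cmf >> 4) + 8; // base-2 logarithm of the window size
$headerOk   = (($cmf << 8) | $flg) % 31 === 0; // RFC 1950 header check

printf(
    "method=%d window=2^%d header_ok=%s\n",
    $method,
    $windowBits,
    $headerOk ? 'yes' : 'no'
);
```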

To say that "the gzip algorithm compresses better than DEFLATE" is just nonsense. There is no gzip algorithm. And the algorithm used in the GZIP format is DEFLATE.

Cheeso
8

All methods are essentially the same; the difference between them is mostly in the headers. Personally I'd use gzencode(), as it produces output in the same format as the command-line gzip utility.
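
A minimal sketch of that idea: write gzencode() output to a .gz file and read it back through PHP's gz file functions. The temporary path is a made-up example location.

```php
<?php
$data = "line one\nline two\n";

// Hypothetical output path; any writable location would do.
$path = sys_get_temp_dir() . '/example-' . uniqid() . '.gz';
file_put_contents($path, gzencode($data, 9));

// gzfile() reads a GZIP file and returns its decompressed lines.
$restored = implode('', gzfile($path));
var_dump($restored === $data);

unlink($path); // clean up the temporary file
```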

Jan Jungnickel
-1

100x loop

    <?php
    // Each helper re-compresses its own output $x times at level 9,
    // then base64-encodes the result in 76-character lines.
    function tenc1($x, $s) {
        do { $s = gzencode($s, 9); } while (--$x);
        return chunk_split(base64_encode($s));
    }
    function tenc2($x, $s) {
        do { $s = gzcompress($s, 9); } while (--$x);
        return chunk_split(base64_encode($s));
    }
    function tenc3($x, $s) {
        do { $s = gzdeflate($s, 9); } while (--$x);
        return chunk_split(base64_encode($s));
    }

    // 62 shuffled alphanumeric characters, repeated 200,000 times.
    $string = str_repeat(str_shuffle(implode('', array_merge(range('0', '9'), range('a', 'z'), range('A', 'Z')))), 200000);
    echo 'gzencode ' . strlen(tenc1(100, $string)) . PHP_EOL;
    echo 'gzcompress ' . strlen(tenc2(100, $string)) . PHP_EOL;
    echo 'gzdeflate ' . strlen(tenc3(100, $string)) . PHP_EOL;

Result for PHP 7.4.33
gzencode 3204
gzcompress 1712
gzdeflate 904

https://onlinephp.io/c/674e5

  • This test is not testing anything relevant. Once you compress something, there is nothing to be gained by then re-compressing the output with the same algorithm many times. Each iteration, gzencode and gzcompress will be adding its header and footer and this is the explanation of why the sizes of those increase with repeated iterations. Nobody would ever do this in the real world. – thomasrutter Jul 25 '23 at 23:45