PHP has its own function to work with gzip archives. I wrote the following code:

error_reporting(E_ALL);
$f = file_get_contents('http://spiderbites.nytimes.com/sitemaps/www.nytimes.com/sitemap.xml.gz');
echo $f;
$f = gzuncompress($f);
echo "<hr>";
echo $f;

The first echo outputs the compressed file with a proper header (at least the first two bytes are correct). If I download this file with my browser, I can unzip it easily.

However, gzuncompress() throws Warning: gzuncompress(): data error in /home/path/to/script.php on line 5

Can anyone point me to the right direction to solve this problem?

EDIT:

The part of phpinfo() output

(screenshot of phpinfo() output, not reproduced here)

Vlada Katlinskaya
  • You could use ob_gzhandler() and let PHP do the work for you, because gzuncompress() returns an error if the uncompressed data is more than 32768 times the length of the compressed input, or longer than the optional length parameter. Alternatively, pass the optional parameter to set the length. That should be an easy fix. – unixmiah Dec 29 '14 at 16:37
  • @unixmiah in my case size of the file is 4169 (compressed) / 88406 (uncompressed) so the ratio is far from 32768 (real ratio is about 21). – Vlada Katlinskaya Dec 29 '14 at 16:43
  • Have you edited php.ini and enabled the zip libraries? If you haven't, do so, then restart your web server so the changes take effect. – unixmiah Dec 29 '14 at 16:47
  • @unixmiah I included phpinfo() screenshot to make this point clear. As I can see - everything is enabled. Right? – Vlada Katlinskaya Dec 29 '14 at 16:55
  • @VladaKatlinskaya: Look at mario's answer. – GiamPy Dec 29 '14 at 16:56
  • aah that's the issue, the streams are incorrectly labeled – unixmiah Dec 29 '14 at 17:17

2 Answers

Or you could just use the right decompression function, gzdecode().
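
A minimal sketch of the difference, assuming the zlib extension is loaded. To keep it self-contained, the downloaded file is replaced by a stand-in string produced with gzencode(); the variable names are only illustrative:

```php
<?php
// Stand-in for the body of sitemap.xml.gz (assumption: any gzip-wrapped
// string behaves the same way as the downloaded file).
$compressed = gzencode('<urlset>example</urlset>');

// gzip data starts with the magic bytes 1f 8b.
$isGzip = strncmp($compressed, "\x1f\x8b", 2) === 0;

// gzuncompress() expects the zlib wrapper, so it rejects gzip data.
// The @ only hides the "data error" warning for this demonstration.
$viaUncompress = @gzuncompress($compressed);

// gzdecode() understands the gzip wrapper and returns the original string.
$viaDecode = gzdecode($compressed);

var_dump($isGzip, $viaUncompress, $viaDecode);
```

In the question's script, the same change would be replacing the gzuncompress($f) call with gzdecode($f).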

mario
  • Tested: it worked! Thank you very much! Do you know why the `gzuncompress()` function behaves this way? – Vlada Katlinskaya Dec 29 '14 at 16:58
  • @VladaKatlinskaya "gzuncompress" works on the so-called "zlib" format, whereas "gzdecode" decodes gzip-wrapped data, and "gzinflate" handles just the raw DEFLATE data. Basically each of `gzdecode` > `gzuncompress` > `gzinflate` adds a little more metadata to the stream. – mario Dec 29 '14 at 17:00
  • Notably there is some confusion due to early MSIE/IIS bugs, in that some servers send "zlib" streams incorrectly labeled as "gzip" data. So you'll sometimes have to probe with gzuncompress AND gzdecode. That is why you may even prefer PHP's `curl` functions, which handle this automatically. – mario Dec 29 '14 at 17:03
  • Actually those bugs caused raw deflate to be sent as zlib streams. There were no incorrectly tagged gzip streams. – Mark Adler Dec 29 '14 at 19:34
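
The probing idea from the comments can be sketched roughly as follows. decode_any() is a hypothetical helper, not a PHP built-in; it simply tries the three decoders from the most wrapped format to the least and returns the first result that succeeds:

```php
<?php
// Hypothetical helper (not part of PHP): try gzip, then zlib, then raw DEFLATE.
function decode_any(string $data)
{
    foreach (['gzdecode', 'gzuncompress', 'gzinflate'] as $decoder) {
        // @ hides the warning each decoder emits when the wrapper does not match.
        $out = @$decoder($data);
        if ($out !== false) {
            return $out;
        }
    }
    return false; // none of the three formats matched
}

$text = str_repeat('sitemap entry ', 20);

// The same payload in the three wrappers discussed above.
$samples = [
    'gzip' => gzencode($text),
    'zlib' => gzcompress($text),
    'raw'  => gzdeflate($text),
];

foreach ($samples as $name => $blob) {
    printf(
        "%-4s %3d bytes, first two bytes %s, decoded: %s\n",
        $name,
        strlen($blob),
        bin2hex(substr($blob, 0, 2)),
        var_export(decode_any($blob) === $text, true)
    );
}
```

Suppressing warnings with @ keeps the sketch short; production code would more likely check the leading bytes first or log the failures.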

Note that gzuncompress() may not decompress some compressed strings and return a Data Error.

The problem could be that the outside compressed string has a CRC32 checksum at the end of the file instead of Adler-32, like PHP expects.

(http://php.net/manual/en/function.gzuncompress.php#79042)

That could be why it does not work.
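
To make the checksum difference concrete, here is a small sketch (assuming the zlib and hash extensions are available; the sample text is arbitrary) that compares the trailing bytes of a zlib stream and a gzip stream with checksums computed in PHP:

```php
<?php
$text = 'example payload for checksum comparison';

$zlib = gzcompress($text); // zlib wrapper
$gzip = gzencode($text);   // gzip wrapper

// zlib trailer: 4-byte Adler-32 of the uncompressed data, big-endian.
$zlibTrailerOk = substr($zlib, -4) === hash('adler32', $text, true);

// gzip trailer: 4-byte CRC32 followed by the 4-byte uncompressed length,
// both little-endian.
$gzipTrailerOk = substr($gzip, -8)
    === pack('V', crc32($text)) . pack('V', strlen($text));

var_dump($zlibTrailerOk, $gzipTrailerOk);
```

The two wrappers also differ in their headers, so a mismatch in the trailer is only one of the ways gzuncompress() can end up reporting a data error.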

Try the code from that comment:

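function gzuncompress_crc32($data) {
     // Write the data to a temp file, prefixed with gzip header bytes,
     // then read it back through the compress.zlib:// stream wrapper.
     $f = tempnam('/tmp', 'gz_fix');
     file_put_contents($f, "\x1f\x8b\x08\x00\x00\x00\x00\x00" . $data);
     $out = file_get_contents('compress.zlib://' . $f);
     unlink($f); // remove the temp file once it has been read
     return $out;
}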

Modify your code like this:

error_reporting(E_ALL);
$f = file_get_contents('http://spiderbites.nytimes.com/sitemaps/www.nytimes.com/sitemap.xml.gz');
echo $f;
$f = gzuncompress_crc32($f);
echo "<hr>";
echo $f;

As far as I have tested locally, it does not give the error anymore.

GiamPy