
I'm using PHP's file_get_contents() function to make an HTTP request. To save bandwidth I decided to add the "Accept-Encoding: gzip" header using stream_context_create().

As expected, file_get_contents() returns a gzip-encoded string, so I'm using gzuncompress() to decode it, but I get an error about the data passed as the argument:

[...] PHP Warning: gzuncompress(): data error in /path/to/phpscript.php on line 26

I know there is another function that can decompress gzipped data, gzdecode(), but it isn't included in my PHP version (maybe it is only available in SVN).

I know that cURL decodes a gzip stream on the fly (without any problem), but someone suggested that I use file_get_contents() instead of cURL.

Do you know any other way to decompress gzipped data in PHP, or why gzuncompress() emits a warning? It is absurd that gzuncompress() doesn't work as expected.

Note: the problem is certainly on the PHP side; the HTTP request is made to the Tumblr API, which gives a well-encoded response.

hakre
Fabio Buda
  • Do you know why they suggested to use `file_get_contents` instead of cUrl? – Jonathan Jan 17 '12 at 14:08
  • No, I don't know, they said "it's better". I can go back to cUrl but I'm anyway curious about gzuncompress() issue. – Fabio Buda Jan 17 '12 at 14:13
  • Is it because the data is base64 encoded too? – Paul Bain Jan 17 '12 at 14:33
  • Are you sure `file_get_contents` isn't doing the decompression for you? It's a long shot, I know... Try dumping the contents of the file and checking for the gzip magic number `0x1f8b` at the start of the file. – Jonathan Jan 17 '12 at 14:35
  • No, even with base64_decode() added I get the same error. – Fabio Buda Jan 17 '12 at 14:36
  • Just to confirm, can you please include your code in the question? I'm doing this very thing and see no problems. – Manse Jan 17 '12 at 14:37
  • Jonathan, printing out directly from file_get_contents() I get unreadable binary data, something like this: �Ž{�F�&�W���,A��]�x��X�o��x2��ӂ`��0.� – Fabio Buda Jan 17 '12 at 14:41
  • Sorry, sorry, sorry... I didn't convert the binary data to hex. The string does indeed start with 1f8b08... so, what to do? – Fabio Buda Jan 17 '12 at 14:43
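
The magic-number check suggested in the comments above can be sketched as follows. This is only an illustration: `looks_like_gzip()` is a hypothetical helper name, and `gzencode()` stands in for the body that file_get_contents() would return.

```php
<?php
// Hypothetical helper (not from the thread): reports whether a string
// begins with the gzip magic bytes 0x1f 0x8b.
function looks_like_gzip($data)
{
    return strlen($data) >= 2 && substr($data, 0, 2) === "\x1f\x8b";
}

// Stand-in for the HTTP response body; a real script would use the
// string returned by file_get_contents().
$sample = gzencode('hello');

// Dump the first three bytes as hex: magic number plus compression method.
echo bin2hex(substr($sample, 0, 3)), "\n";   // 1f8b08

var_dump(looks_like_gzip($sample));          // bool(true)
var_dump(looks_like_gzip('plain text'));     // bool(false)
```

If the body does start with 1f8b, it is a gzip container and needs a gzip-aware decoder rather than gzuncompress().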

3 Answers


This worked for me: http://www.php.net/manual/en/function.gzdecode.php#106397

Alternatively, try: http://digitalpbk.com/php/file_get_contents-garbled-gzip-encoding-website-scraping

if ( ! function_exists('gzdecode'))
{
    /**
     * Decode gz coded data
     * 
     * http://php.net/manual/en/function.gzdecode.php
     * 
     * Alternative: http://digitalpbk.com/php/file_get_contents-garbled-gzip-encoding-website-scraping
     * 
     * @param string $data gzencoded data
     * @return string inflated data
     */
    function gzdecode($data) 
    {
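        // Strip the 10-byte fixed header and the 8-byte trailer (CRC32 + size),
        // then inflate. This assumes the header carries no optional fields
        // (FEXTRA, FNAME, FCOMMENT, FHCRC).
        return gzinflate(substr($data, 10, -8));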
    }
}
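
The fixed 10-byte offset above only holds when the gzip header has no optional fields. As a hedged sketch (not the code from the linked manual note), a fallback could read the flag byte and skip those fields first. `gzdecode_compat()` is a hypothetical name, and the sketch assumes a single gzip member laid out as in RFC 1952.

```php
<?php
// Hypothetical fallback, not part of PHP or of the linked manual note.
// Assumes a single gzip member (RFC 1952) and skips optional header fields.
function gzdecode_compat($data)
{
    $n = strlen($data);
    // 10-byte fixed header plus 8-byte trailer is the minimum size.
    if ($n < 18 || substr($data, 0, 2) !== "\x1f\x8b") {
        return false;
    }
    $flags = ord($data[3]);
    $pos = 10;

    if ($flags & 0x04) {                  // FEXTRA: 2-byte LE length + payload
        $len = unpack('v', substr($data, $pos, 2));
        $pos += 2 + $len[1];
    }
    foreach (array(0x08, 0x10) as $bit) { // FNAME, FCOMMENT: zero-terminated
        if ($flags & $bit) {
            if ($pos >= $n) {
                return false;
            }
            $end = strpos($data, "\0", $pos);
            if ($end === false) {
                return false;
            }
            $pos = $end + 1;
        }
    }
    if ($flags & 0x02) {                  // FHCRC: 2-byte header checksum
        $pos += 2;
    }
    if ($pos > $n - 8) {
        return false;                     // no room left for data + trailer
    }
    // Drop the 8-byte trailer (CRC32 + size) and inflate the raw stream.
    return @gzinflate(substr($data, $pos, -8));
}

// Hand-built member with the FNAME flag set, to exercise the optional field.
$payload = 'Hello from a gzip member with a file name';
$member  = "\x1f\x8b\x08\x08" . "\0\0\0\0" . "\x00\x03" . "file.txt\0"
         . gzdeflate($payload)
         . pack('V', crc32($payload)) . pack('V', strlen($payload));

var_dump(gzdecode_compat($member) === $payload);             // bool(true)
var_dump(gzdecode_compat(gzencode($payload)) === $payload);  // bool(true)
```

The sketch does not verify the CRC32 or the stored size, so it trades integrity checking for brevity.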
Mike

gzuncompress() won't work for the gzip encoding. It expects zlib-format data (RFC 1950), as produced by gzcompress(), whereas gzip-encoded responses use the gzip container (RFC 1952).

The manual lists a few workarounds for the missing gzdecode() (see user note #82930); alternatively, use the implementation from upgradephp, or the gzopen() temp file workaround.

Another option would be forcing the deflate encoding with the Accept-Encoding: header and then using gzinflate() for decompression (or gzuncompress(), if the server sends the zlib-wrapped form of deflate).
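
To make the difference between the formats concrete, here is a minimal sketch that round-trips the same text through PHP's three zlib encoders. `inflate_http_deflate()` is a hypothetical helper, and the try-zlib-then-raw order is an assumption about how servers label "deflate", not something stated in the thread.

```php
<?php
$text = str_repeat('Hello, Tumblr! ', 20);

$gzip = gzencode($text);    // gzip container (RFC 1952)
$zlib = gzcompress($text);  // zlib container (RFC 1950)
$raw  = gzdeflate($text);   // raw DEFLATE stream (RFC 1951)

var_dump(gzuncompress($zlib) === $text);  // bool(true)
var_dump(gzinflate($raw) === $text);      // bool(true)

// This reproduces the warning from the question: gzip data is not zlib data.
// The @ operator only silences the "data error" warning for the demo.
var_dump(@gzuncompress($gzip));           // bool(false)

// Hypothetical helper: a "deflate" response may be zlib-wrapped or raw,
// so try the zlib decoder first and fall back to the raw one.
function inflate_http_deflate($body)
{
    $out = @gzuncompress($body);
    if ($out === false) {
        $out = @gzinflate($body);
    }
    return $out;
}

var_dump(inflate_http_deflate($zlib) === $text);  // bool(true)
var_dump(inflate_http_deflate($raw) === $text);   // bool(true)
```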

mario
  • I'm developing an open source library so I can't force users to install upgradephp. The library installation should be as simple as possible so I'm turning to use cUrl that has built-in gzip support. – Fabio Buda Jan 18 '12 at 11:17
  • Uhm, what? You could just copy and paste that single function implementation out, if you don't want to ship the whole upgradephp snippet along. – mario Jan 18 '12 at 16:54
  • Mario, thanks for your suggestion, but I've just switched to cUrl and dropped the file_get_contents() implementation. Are you sure that importing gzdecode() from upgradephp will work without any other dependency? – Fabio Buda Jan 19 '12 at 06:22
  • Using upgrade.php worked. Like a JavaScript shim for HTML5. Perfect! Thanks. – Sultan Shakir Jan 21 '13 at 17:00
  • That upgradephp project is awesome. I added a Github clone for it: https://github.com/Polycademy/upgradephp – CMCDragonkai Oct 31 '13 at 04:57

Before decompressing the data you need to assemble it. So if the response headers contain

Transfer-Encoding: chunked

you need to unchunk it.

function http_unchunk($data) {
    $res=[];
    $p=0; $n=strlen($data);
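    while($p<$n) {
        // Each chunk starts with its size in hex, followed by CRLF.
        if (preg_match("/^([0-9A-Fa-f]+)\r\n/",substr($data,$p,18),$m)) {
            $sz=hexdec($m[1]); $p+=strlen($m[0]);
            // Copy the chunk body, then skip the CRLF that terminates it.
            $res[]=substr($data,$p,$sz); $p+=$sz+2;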
        } else {
            break;
        }
    }
    return implode('',$res);
}

If Content-Encoding is gzip or x-gzip, use gzdecode(); if Content-Encoding is deflate, use gzinflate() (gzdeflate() compresses rather than decompresses).

...
if ($chunked) $body=http_unchunk($body);
if ($gzip) $body=gzdecode($body);
if ($deflate) $body=gzinflate($body);
...
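
The snippet above leaves open where the three flags come from. One possible sketch, assuming the header lines are available as an array of raw strings (for example in the form PHP exposes through `$http_response_header` after an HTTP request made with file_get_contents()), is shown below. `encoding_flags()` is a hypothetical helper name, and its array keys are chosen only to mirror the variables used above.

```php
<?php
// Hypothetical helper: derive the chunked/gzip/deflate flags from raw
// response header lines such as "Content-Encoding: gzip".
function encoding_flags(array $headerLines)
{
    $flags = array('chunked' => false, 'gzip' => false, 'deflate' => false);
    foreach ($headerLines as $line) {
        if (strpos($line, ':') === false) {
            continue; // status line such as "HTTP/1.1 200 OK"
        }
        list($name, $value) = explode(':', $line, 2);
        $name  = strtolower(trim($name));   // header names are case-insensitive
        $value = strtolower(trim($value));
        if ($name === 'transfer-encoding' && strpos($value, 'chunked') !== false) {
            $flags['chunked'] = true;
        }
        if ($name === 'content-encoding') {
            if ($value === 'gzip' || $value === 'x-gzip') {
                $flags['gzip'] = true;
            }
            if ($value === 'deflate') {
                $flags['deflate'] = true;
            }
        }
    }
    return $flags;
}

// Sample header lines, made up for illustration.
$flags = encoding_flags(array(
    'HTTP/1.1 200 OK',
    'Content-Type: application/json',
    'Transfer-Encoding: chunked',
    'Content-Encoding: gzip',
));
var_dump($flags['chunked'], $flags['gzip'], $flags['deflate']);
```

With the sample lines this marks the body as chunked and gzip-encoded, so it would be unchunked first and decoded second, in the same order as the snippet above.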
kovserg