HttpWebRequest & Native GZip Compression

Question

When requesting a page with Gzip compression I am getting a lot of the following errors:

System.IO.InvalidDataException: The CRC in GZip footer does not match the CRC calculated from the decompressed data

I am using the native GZipStream to decompress and am looking at addressing this. Is there a workaround, or another (free?) GZip library that handles this issue properly?

I am verifying that the webResponse ContentEncoding is GZIP.

Update 5/11: a simplified snippet

//Caller
public void SOSampleGet(string url)
{
    // Initialize the WebRequest.
    HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
    webRequest.Method = WebRequestMethods.Http.Get;
    webRequest.KeepAlive = true;
    webRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    webRequest.Headers.Add("Accept-Encoding", "gzip,deflate");
    webRequest.Referer = WebUtil.GetDomain(url);

    // Dispose the response as well as the stream so the connection is released.
    using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
    using (Stream stream = GetStreamForResponse(webResponse, READTIMEOUT_CONST))
    {
        //use stream
    }
}

//Method
private static Stream GetStreamForResponse(HttpWebResponse webResponse, int readTimeOut)
{
    // Set the timeout on the network stream so it applies to every encoding,
    // not only to uncompressed responses.
    Stream responseStream = webResponse.GetResponseStream();
    responseStream.ReadTimeout = readTimeOut;

    switch (webResponse.ContentEncoding.ToUpperInvariant())
    {
        case "GZIP":
            return new GZipStream(responseStream, CompressionMode.Decompress);
        case "DEFLATE":
            return new DeflateStream(responseStream, CompressionMode.Decompress);
        default:
            return responseStream;
    }
}

Is it for a specific site, or is this happening with responses from everywhere? If it's only one site, it could be that the problem lies on the other side. — Kladskull, May 08 '09 at 14:52
Note also that "deflate", according to the HTTP spec, is really "zlib" (which wraps deflate), and not raw deflate at all (it's a misnomer). Because of [this confusion](http://en.wikipedia.org/wiki/Gzip#Derivatives_and_other_uses), though, some servers will send raw deflate and others zlib, and clients need to support both (by heuristic guess) just in case. Yuck. — Cameron, Jun 03 '13 at 18:30
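
The heuristic guess mentioned in the comment above could look roughly like the following sketch. It assumes the body has already been buffered into a byte array, and the names `DeflateSniffer`, `LooksLikeZlib` and `OpenDeflate` are made up for illustration. It checks the two-byte zlib header described in RFC 1950 (compression method 8, header value divisible by 31) and skips it before handing the rest to DeflateStream. It ignores the rare preset-dictionary flag and does not verify the trailing Adler-32, so treat it as a starting point rather than a complete zlib reader.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class DeflateSniffer
{
    // True when the two bytes look like a zlib (RFC 1950) header:
    // low nibble of CMF is 8 (deflate) and CMF*256 + FLG is a multiple of 31.
    public static bool LooksLikeZlib(byte cmf, byte flg)
    {
        return (cmf & 0x0F) == 8 && ((cmf << 8) | flg) % 31 == 0;
    }

    // Wraps a buffered "deflate" body in a DeflateStream, skipping the
    // 2-byte zlib header when one is detected. The Adler-32 trailer is
    // left unread and unchecked (assumption: that is acceptable here).
    public static Stream OpenDeflate(byte[] body)
    {
        int offset = body.Length >= 2 && LooksLikeZlib(body[0], body[1]) ? 2 : 0;
        var source = new MemoryStream(body, offset, body.Length - offset);
        return new DeflateStream(source, CompressionMode.Decompress);
    }
}
```

Because a raw deflate stream could in principle start with bytes that pass the same check, this remains a guess; falling back to the other interpretation when decompression throws is one way to make it more robust.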

score 141 · Answer 1 · answered Oct 15 '11 at 01:38

What about the HttpWebRequest.AutomaticDecompression property, available since .NET 2.0? Simply add:

webRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

It also adds gzip,deflate to the Accept-Encoding header.

See http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.automaticdecompression.aspx
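
For context, here is a minimal sketch of how the request from the question might look with this property set. The class and method names (`AutoDecompressSample`, `CreateRequest`, `Download`) are made up for illustration. With AutomaticDecompression enabled, the stream returned by GetResponseStream() is already decompressed, so the manual GZipStream/DeflateStream switch is not needed.

```csharp
using System.IO;
using System.Net;

static class AutoDecompressSample
{
    // Builds a GET request that lets the framework handle gzip/deflate.
    public static HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = WebRequestMethods.Http.Get;
        // No manual Accept-Encoding header and no GZipStream wrapper.
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }

    // Reads the whole body as text from the already-decompressed stream.
    public static string Download(string url)
    {
        using (var response = (HttpWebResponse)CreateRequest(url).GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```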

Eugene

How would you do this using HttpClient? – Martin Kearn Mar 07 '16 at 15:03
@MartinKearn, you do `HttpClientHandler handler = new HttpClientHandler(); handler.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate; _client = new HttpClient(handler);` See http://stackoverflow.com/questions/20990601/decompressing-gzip-stream-from-httpclient-response I believe it requires .NET 4.5. – Eugene Mar 07 '16 at 18:43

score 7 · Answer 2 · answered Jun 12 '17 at 21:08

For .NET Core, things are a little more involved. A GZipStream is needed, as there isn't a property (as of writing) for AutomaticDecompression. See my answer here: https://stackoverflow.com/a/44508724/2421277

Code from answer:

var req = WebRequest.CreateHttp(uri);

/*
 * Headers
 */
// Only advertise gzip, because the response is always wrapped in a GZipStream below.
req.Headers[HttpRequestHeader.AcceptEncoding] = "gzip";

/*
 * Execute
 */
try
{
    using (var resp = await req.GetResponseAsync())
    using (var str = resp.GetResponseStream())
    // Assumes the server actually replied with Content-Encoding: gzip.
    using (var gsr = new GZipStream(str, CompressionMode.Decompress))
    using (var sr = new StreamReader(gsr))
    {
        string s = await sr.ReadToEndAsync();
    }
}
catch (WebException ex)
{
    // ex.Response is null when no HTTP response was received (e.g. a timeout).
    if (ex.Response == null) throw;

    using (HttpWebResponse response = (HttpWebResponse)ex.Response)
    using (StreamReader sr = new StreamReader(response.GetResponseStream()))
    {
        string respStr = sr.ReadToEnd();
        int statusCode = (int)response.StatusCode;

        string errorMsg = $"Request ({uri}) failed ({statusCode}) with error: {respStr}";
    }
}

score 2 · Answer 3 · answered May 08 '09 at 14:58

Are you flushing and closing the stream? Try wrapping your GZipStream in a using statement.

Matthew Whited

It's wrapped in a try/catch/finally that calls Dispose() on the stream in the finally block. – Pat May 08 '09 at 15:54

score 2 · Answer 4 · answered May 08 '09 at 15:31

I found some sample code that shows the entire request/response for GZip encoded pages. It uses GZipStream.

http://www.know24.net/blog/Decompress+GZip+Deflate+HTTP+Responses.aspx

Mike L

Link is broken, but I looked it up through archive.org and the basic method works great :) – Nyerguds Feb 01 '16 at 21:38
For those not familiar with archive.org, like me, the new link is: https://web.archive.org/web/20200214173529/http://www.know24.net/blog/Decompress+GZip+Deflate+HTTP+Responses.aspx – Peter Chikov Dec 17 '21 at 18:18

score 1 · Answer 5 · answered May 08 '09 at 14:54

See my comment above, but this is usually a symptom of a corrupted file. If the site is your own, replace the file you are trying to access.

Kladskull

Not my site, it seems particular to a few sites I am requesting from however. – Pat May 08 '09 at 15:53

score -2 · Answer 6 · edited Jul 02 '09 at 19:12

The native GZipStream can read a compressed GZIP (RFC 1952) stream, but it can't handle the ZIP file format.

From http://www.geekpedia.com/tutorial190_Zipping-files-using-GZipStream.html:

The disadvantage of using the GZipStream class over a 3rd party product is that it has limited capabilities. One of the limitations is that you cannot give a name to the file that you place in the archive. When GZipStream compresses the file into a ZIP archive, it takes the sequence of bytes from that file and uses compression algorithms that create a smaller sequence of bytes. The new sequence of bytes is put into the new ZIP file. When you open the ZIP file you will open the archived file itself; most popular ZIP extractors (WinZip, WinRar, etc.) will show you the content of the ZIP as a file that has the same as the archive itself.

EDIT: The above note is incorrect. GZipStream does not produce a ZIP file. It is not a "Single file ZIP stream". It is a GZIP Stream. They are different things. There's no guarantee that tools that handle ZIP archives will handle a .gz file.

For an implementation that can read ZIP archives, as opposed to single-file ZIP streams, try #ziplib (SharpZipLib, formerly NZipLib).
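
To make the GZIP versus ZIP distinction concrete, here is a small sketch that tells the two apart by their leading bytes. A GZIP stream (RFC 1952) starts with 0x1F 0x8B, while a ZIP archive starts with the ASCII letters "PK". The class and method names (`ContainerSniffer`, `IsGZip`, `IsZip`) are made up for illustration, and checking only two bytes is a quick sanity test, not full validation.

```csharp
static class ContainerSniffer
{
    // GZIP member header: ID1 = 0x1F, ID2 = 0x8B.
    public static bool IsGZip(byte[] data)
    {
        return data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // ZIP records begin with the signature bytes 'P' (0x50) and 'K' (0x4B).
    public static bool IsZip(byte[] data)
    {
        return data.Length >= 2 && data[0] == 0x50 && data[1] == 0x4B;
    }
}
```

Output written by GZipStream in compress mode should pass `IsGZip` and fail `IsZip`, which is why ZIP tools are not guaranteed to open it.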

I don't believe the original poster is talking about dealing with compressed/archived files. Rather, the use case is requesting a web page while sending an Accept-Encoding: header to the server, indicating that the client supports gzip. That header allows the server to compress the content before sending it to the client, saving bandwidth. Modern web browsers can do this, and many servers are configured to respond accordingly. — Chris W. Rea, May 08 '09 at 14:07
Are you checking if the server actually replies with a gzip stream, for example with wireshark? Wireshark can decode and verify the reply, even if it's gzipped. — Andomar, May 08 '09 at 14:43
