63

When requesting a page with Gzip compression I am getting a lot of the following errors:

System.IO.InvalidDataException: The CRC in GZip footer does not match the CRC calculated from the decompressed data

I am using the native GZipStream to decompress the response and would like to fix this. Is there a workaround, or another (free?) GZip library that handles this case properly?

I am verifying that the webResponse ContentEncoding is GZIP.

Update 5/11: A simplified snippet

//Caller
public void SOSampleGet(string url) 
{
    // Initialize the WebRequest.
    webRequest = (HttpWebRequest)WebRequest.Create(url);
    webRequest.Method = WebRequestMethods.Http.Get;
    webRequest.KeepAlive = true;
    webRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    webRequest.Headers.Add("Accept-Encoding", "gzip,deflate");
    webRequest.Referer = WebUtil.GetDomain(url);

    HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse();    

    using (Stream stream = GetStreamForResponse(webResponse, READTIMEOUT_CONST))
    {
        //use stream
    }
}

//Method
private static Stream GetStreamForResponse(HttpWebResponse webResponse, int readTimeOut)
{
    // Set the timeout on the raw network stream so that it applies to every
    // branch, not only to the uncompressed one.
    Stream responseStream = webResponse.GetResponseStream();
    responseStream.ReadTimeout = readTimeOut;

    switch (webResponse.ContentEncoding.ToUpperInvariant())
    {
        case "GZIP":
            return new GZipStream(responseStream, CompressionMode.Decompress);
        case "DEFLATE":
            return new DeflateStream(responseStream, CompressionMode.Decompress);
        default:
            return responseStream;
    }
}
Pat
  • Is it for a specific site, or is this happening with responses from everywhere? If it's only one site, the problem could lie on the other side. – Kladskull May 08 '09 at 14:52
  • 1
    Note also that "deflate", according to the HTTP spec, is really "zlib" (which wraps deflate), and not raw deflate at all (it's a misnomer). Because of [this confusion](http://en.wikipedia.org/wiki/Gzip#Derivatives_and_other_uses), though, some servers will send raw deflate and others zlib, and clients need to support both (by heuristic guess) just in case. Yuck. – Cameron Jun 03 '13 at 18:30
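The heuristic guess mentioned in the comment above could look roughly like the sketch below. It assumes the whole body has already been buffered into a byte array. `DeflateSniffer`, `LooksLikeZlib` and `DecodeDeflateBody` are hypothetical names, not part of the question's code. The check follows the zlib header layout (compression method 8, and the two header bytes read as a big-endian number divisible by 31). The zlib Adler-32 trailer is not verified, and a preset dictionary is not handled.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class DeflateSniffer
{
    // True when the first two bytes look like a zlib header:
    // compression method 8 (deflate) and a header check value divisible by 31.
    public static bool LooksLikeZlib(byte cmf, byte flg)
    {
        return (cmf & 0x0F) == 8 && ((cmf << 8) | flg) % 31 == 0;
    }

    // Hypothetical helper: inflates a fully buffered "deflate" body, skipping
    // the two-byte zlib header when one appears to be present.
    // Assumption: no preset dictionary; the Adler-32 trailer is ignored.
    public static byte[] DecodeDeflateBody(byte[] body)
    {
        int offset = body.Length >= 2 && LooksLikeZlib(body[0], body[1]) ? 2 : 0;
        using (var input = new MemoryStream(body, offset, body.Length - offset))
        using (var inflater = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflater.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Since this is only a guess based on two bytes, a caller may still want to fall back to the other interpretation if decoding throws.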

6 Answers

141

What about the HttpWebRequest.AutomaticDecompression property, available since .NET 2.0? Simply add:

webRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

It also adds gzip, deflate to the Accept-Encoding header.

See http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.automaticdecompression.aspx
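For context, a minimal sketch of a complete request using that property might look like this. `AutoDecompressSample`, `CreateRequest` and `DownloadString` are hypothetical names chosen for illustration. The sketch assumes the manual `Accept-Encoding` header and the `GZipStream`/`DeflateStream` wrappers from the question are removed, since the response stream is already decompressed.

```csharp
using System;
using System.IO;
using System.Net;

static class AutoDecompressSample
{
    // Builds a request that asks the framework to decompress the body itself.
    public static HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }

    // Reads the (already decompressed) body as text.
    // No GZipStream wrapper is needed here.
    public static string DownloadString(string url)
    {
        HttpWebRequest request = CreateRequest(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```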

Eugene
  • 1
    How would you do this using HttpClient? – Martin Kearn Mar 07 '16 at 15:03
  • 3
    @MartinKearn, you do `HttpClientHandler handler = new HttpClientHandler(); handler.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate; _client = new HttpClient(handler);` See http://stackoverflow.com/questions/20990601/decompressing-gzip-stream-from-httpclient-response I believe it requires .NET 4.5. – Eugene Mar 07 '16 at 18:43
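Laid out as a small class, the HttpClient variant from the comment above might look like the following sketch. `HttpClientGzipSample`, `CreateHandler` and `GetPageAsync` are hypothetical names. Sharing a single `HttpClient` instance is a design choice made here for illustration, not something the comment prescribes.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

static class HttpClientGzipSample
{
    // Handler that lets the framework decompress gzip and deflate bodies.
    public static HttpClientHandler CreateHandler()
    {
        return new HttpClientHandler
        {
            AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
    }

    // One shared client for the whole application (an assumption of this sketch).
    private static readonly HttpClient Client = new HttpClient(CreateHandler());

    // Returns the decompressed body as text.
    public static Task<string> GetPageAsync(string url)
    {
        return Client.GetStringAsync(url);
    }
}
```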
7

For .NET Core things are a little more involved. A GZipStream is needed, as there isn't (as of writing) an AutomaticDecompression property. See my answer here: https://stackoverflow.com/a/44508724/2421277

Code from answer:

var req = WebRequest.CreateHttp(uri);

/*
 * Headers
 */
// Only advertise gzip, because the code below always decodes with GZipStream.
req.Headers[HttpRequestHeader.AcceptEncoding] = "gzip";

/*
 * Execute
 */
try
{
    // Note: this assumes the server actually returned a gzip-encoded body.
    using (var resp = await req.GetResponseAsync())
    using (var str = resp.GetResponseStream())
    using (var gsr = new GZipStream(str, CompressionMode.Decompress))
    using (var sr = new StreamReader(gsr))
    {
        string s = await sr.ReadToEndAsync();
    }
}
// ex.Response is null when no HTTP response was received (e.g. a timeout).
catch (WebException ex) when (ex.Response != null)
{
    using (HttpWebResponse response = (HttpWebResponse)ex.Response)
    using (StreamReader sr = new StreamReader(response.GetResponseStream()))
    {
        string respStr = sr.ReadToEnd();
        int statusCode = (int)response.StatusCode;

        string errorMsg = $"Request ({uri}) failed ({statusCode}) with error: {respStr}";
    }
}
pim
2

Are you flushing and closing the stream? Try wrapping your GZipStream in a using statement.
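As an illustration of that suggestion, here is a minimal in-memory round trip, assuming UTF-8 text. `GzipRoundTrip`, `Compress` and `Decompress` are hypothetical names. On the compressing side the using block matters most: disposing the GZipStream is what writes out the remaining data and the footer that the decompressor later checks.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

static class GzipRoundTrip
{
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var buffer = new MemoryStream())
        {
            // leaveOpen keeps the MemoryStream usable after the GZipStream closes.
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
            {
                gzip.Write(raw, 0, raw.Length);
            } // Disposing here flushes the remaining data and the gzip footer.
            return buffer.ToArray();
        }
    }

    public static string Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```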

Matthew Whited
  • It's wrapped in a try/catch/finally that calls Dispose() on the stream in the finally block. – Pat May 08 '09 at 15:54
2

I found some sample code that shows the entire request/response for GZip encoded pages. It uses GZipStream.

http://www.know24.net/blog/Decompress+GZip+Deflate+HTTP+Responses.aspx

Mike L
  • 1
    Link is broken, but I looked it up through archive.org and the basic method works great :) – Nyerguds Feb 01 '16 at 21:38
  • For those not familiar with archive.org, like me, the new link is: https://web.archive.org/web/20200214173529/http://www.know24.net/blog/Decompress+GZip+Deflate+HTTP+Responses.aspx – Peter Chikov Dec 17 '21 at 18:18
1

See my comment above, but this usually is a symptom of a corrupted file. If the site is your own, replace the file you are trying to access.

Kladskull
-2

The native GZipStream can read a compressed GZIP (RFC 1952) stream, but it can't handle the ZIP file format.

From http://www.geekpedia.com/tutorial190_Zipping-files-using-GZipStream.html:

The disadvantage of using the GZipStream class over a 3rd party product is that it has limited capabilities. One of the limitations is that you cannot give a name to the file that you place in the archive. When GZipStream compresses the file into a ZIP archive, it takes the sequence of bytes from that file and uses compression algorithms that create a smaller sequence of bytes. The new sequence of bytes is put into the new ZIP file. When you open the ZIP file you will open the archived file itself; most popular ZIP extractors (WinZip, WinRar, etc.) will show you the content of the ZIP as a file that has the same name as the archive itself.


EDIT: The above note is incorrect. GZipStream does not produce a ZIP file. It is not a "Single file ZIP stream". It is a GZIP Stream. They are different things. There's no guarantee that tools that handle ZIP archives will handle a .gz file.


For an implementation that can read ZIP archives, as opposed to single-file ZIP streams, try #ziplib (SharpZipLib, formerly NZipLib).
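One quick way to tell the two formats apart is to look at the leading bytes of the data. The sketch below is only a signature check, not a validator, and `ArchiveSniffer`, `IsGzip` and `IsZip` are hypothetical names. A gzip stream starts with the bytes 0x1F 0x8B, while a ZIP archive normally starts with the local file header signature "PK" followed by 0x03 0x04 (an empty archive starts differently and is not covered here).

```csharp
static class ArchiveSniffer
{
    // gzip member header: magic bytes 0x1F 0x8B.
    public static bool IsGzip(byte[] data)
    {
        return data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // ZIP local file header signature: 'P' 'K' 0x03 0x04.
    // Assumption: the archive is non-empty and has no leading data.
    public static bool IsZip(byte[] data)
    {
        return data.Length >= 4
            && data[0] == 0x50 && data[1] == 0x4B
            && data[2] == 0x03 && data[3] == 0x04;
    }
}
```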

Cheeso
Andomar
  • 1
    I don't believe the original poster is talking about dealing with compressed/archived files. Rather, the use case is requesting a web page while sending an Accept-Encoding: header to the server, indicating that the client supports gzip. That header allows the server to compress the content before sending it to the client, saving bandwidth. Modern web browsers can do this, and many servers are configured to respond accordingly. – Chris W. Rea May 08 '09 at 14:07
  • 1
    Are you checking if the server actually replies with a gzip stream, for example with wireshark? Wireshark can decode and verify the reply, even if it's gzipped. – Andomar May 08 '09 at 14:43