
I've just started compressing files in VB.Net, using the following code. Since I'm targeting Fx 2.0, I can't use the Stream.CopyTo method.

My code, however, gives extremely poor results compared to the gzip Normal compression profile in 7-zip. For example, my code compressed a 630MB Outlook archive to 740MB, whereas 7-zip brings it down to 490MB.

Here is the code. Is there a blatant mistake (or many?)

Using Input As New IO.FileStream(SourceFile, IO.FileMode.Open, IO.FileAccess.Read, IO.FileShare.Read)
    Using outFile As IO.FileStream = IO.File.Create(DestFile)
        Using Compress As IO.Compression.GZipStream = New IO.Compression.GZipStream(outFile, IO.Compression.CompressionMode.Compress)
            'TODO: Figure out the right buffer size.'
            Dim Buffer(524228) As Byte
            Dim ReadBytes As Integer = 0

            While True
                ReadBytes = Input.Read(Buffer, 0, Buffer.Length)
                If ReadBytes <= 0 Then Exit While
                Compress.Write(Buffer, 0, ReadBytes)
            End While
        End Using
    End Using
End Using

I've tried with multiple buffer sizes, but I get similar compression times, and exactly the same compression ratio.

Clément

3 Answers


EDIT, or actually rewrite: It looks like the BCL coders decided to phone it in.

The implementation in System.dll version 2.0 uses statically defined, hardcoded Huffman trees optimized for plain ASCII text, rather than adaptively generating the Huffman trees as other implementations do. It also doesn't support stored-block optimization (which is how standard GZip/Deflate implementations avoid runaway expansion). As a result, running anything other than plain text through this implementation produces a file much larger than the input, and Microsoft claims this is by design!
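To see whether a given runtime shows this expansion, one option is to compress some incompressible data in memory and compare sizes. This is only a minimal sketch: the module name `ExpansionCheck` and the helper `GZipLength` are made up for illustration, and the sizes printed will depend on the framework version you run it on.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression

Module ExpansionCheck
    ' Hypothetical helper: gzips Data in memory and returns the output size in bytes.
    Function GZipLength(ByVal Data As Byte()) As Long
        Using Output As New MemoryStream()
            ' leaveOpen := True keeps Output usable after the GZipStream is closed.
            Using Compress As New GZipStream(Output, CompressionMode.Compress, True)
                Compress.Write(Data, 0, Data.Length)
            End Using ' Closing the GZipStream flushes the last block and the trailer.
            Return Output.Length
        End Using
    End Function

    Sub Main()
        ' 1 MB of pseudo-random bytes: effectively incompressible.
        Dim Noise(1048575) As Byte
        Dim Rng As New Random(42)
        Rng.NextBytes(Noise)

        ' 1 MB of repetitive ASCII text: highly compressible.
        Dim Plain As Byte() = System.Text.Encoding.ASCII.GetBytes(New String("a"c, 1048576))

        Console.WriteLine("Random bytes: {0} -> {1}", Noise.Length, GZipLength(Noise))
        Console.WriteLine("ASCII text:   {0} -> {1}", Plain.Length, GZipLength(Plain))
    End Sub
End Module
```

If the first line reports an output noticeably larger than 1,048,576 bytes, the runtime is not falling back to stored blocks for incompressible input.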

Save yourself some pain, grab a third party implementation.
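As one possible shape for that, here is a sketch using SharpZipLib (the library named in the comments below). It assumes the ICSharpCode.SharpZipLib assembly is referenced; the module name, the `CompressFile` helper, and the 64 KB chunk size are arbitrary choices for illustration, not anything prescribed by the library.

```vbnet
Imports System.IO
Imports ICSharpCode.SharpZipLib.GZip

Module SharpZipLibSketch
    ' Hypothetical helper: gzips SourceFile into DestFile with SharpZipLib.
    Sub CompressFile(ByVal SourceFile As String, ByVal DestFile As String)
        Using Source As New FileStream(SourceFile, FileMode.Open, FileAccess.Read, FileShare.Read)
            Using Target As FileStream = File.Create(DestFile)
                Using Compress As New GZipOutputStream(Target)
                    ' Unlike the BCL class, the compression level is adjustable (0 to 9).
                    Compress.SetLevel(9)

                    Dim Chunk(65535) As Byte ' 64 KB; assumed size, tune as needed
                    Dim ReadBytes As Integer = Source.Read(Chunk, 0, Chunk.Length)
                    While ReadBytes > 0
                        Compress.Write(Chunk, 0, ReadBytes)
                        ReadBytes = Source.Read(Chunk, 0, Chunk.Length)
                    End While
                End Using
            End Using
        End Using
    End Sub
End Module
```

The copy loop is the same as in the question; only the stream class changes, so swapping libraries should not require restructuring the calling code.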

Jeffrey Hantin
  • I'm not using the 7-zip algorithm; I'm using the program called 7-zip (https://sourceforge.net/projects/sevenzip/) to compress to gzip. – Clément Feb 12 '11 at 00:33
  • @CFP Ah, good point. The BCL may just have a crappy implementation of GZip, then -- they certainly don't seem to give you any control over the compression parameters, and if it expands input that much it probably doesn't support the stored-block feature of deflate. Personally, I use ICSharpCode.SharpZipLib instead; it's pure verifiable managed code to boot. – Jeffrey Hantin Feb 12 '11 at 00:56
  • I've reopened a bug report, at https://connect.microsoft.com/VisualStudio/feedback/details/643239/the-gzip-deflate-implementation-is-broken . – Clément Feb 12 '11 at 09:20

IO.Compression wasn't really made for us. It was created to support XPS, the XML Paper Specification. Currently you have to use a third party library if you want decent file compression.

Jonathan Allen
  • How come then that the doc reads "This class represents the gzip data format, which uses an industry standard algorithm for lossless file compression and decompression."? – Clément Feb 12 '11 at 00:35
  • The documentation isn't wrong, it just doesn't tell the whole story. I too made the mistake of trying to use it before I learned the backstory. – Jonathan Allen Feb 12 '11 at 18:59

Some additional information that may be useful. I was compressing some static (binary) files to include in a project release and had the same issue: the file size increased with IO.Compression.GZipStream.

I decided to use Ionic.Zip instead, where the best compression level could be used.

One thing I noticed immediately: although Ionic.Zip reduced my files to 25% of their original size, compression was about 3-4 times slower (totally expected), and decompression was also about 3 times slower, taking 1.6 seconds compared to 0.5 seconds.

Since gzip is a standard format, either library can read the other's output. The built-in IO.Compression.GZipStream in .NET was far less space-efficient when compressing, but far faster when decompressing.

So I use both: the Ionic.Zip library's "ZLib.GZipStream" to compress the files, and "IO.Compression.GZipStream" to decompress them much faster in production.
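The decompression half of that setup needs nothing beyond the BCL. A minimal sketch, assuming the input is a regular gzip file produced by any compliant compressor; the module name, the `DecompressFile` helper, and the 64 KB chunk size are illustrative choices. It uses a manual copy loop so it also works on Fx 2.0, where Stream.CopyTo is unavailable.

```vbnet
Imports System.IO
Imports System.IO.Compression

Module GZipDecompressSketch
    ' Hypothetical helper: expands the gzip file at SourceFile into DestFile
    ' using the built-in GZipStream.
    Sub DecompressFile(ByVal SourceFile As String, ByVal DestFile As String)
        Using Source As New FileStream(SourceFile, FileMode.Open, FileAccess.Read, FileShare.Read)
            Using Decompress As New GZipStream(Source, CompressionMode.Decompress)
                Using Target As FileStream = File.Create(DestFile)
                    Dim Chunk(65535) As Byte ' 64 KB; assumed size, tune as needed
                    Dim ReadBytes As Integer = Decompress.Read(Chunk, 0, Chunk.Length)
                    ' Read returns 0 once the end of the compressed stream is reached.
                    While ReadBytes > 0
                        Target.Write(Chunk, 0, ReadBytes)
                        ReadBytes = Decompress.Read(Chunk, 0, Chunk.Length)
                    End While
                End Using
            End Using
        End Using
    End Sub
End Module
```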

DarrenMB