16

I need to compress an array of bytes, so I wrote this snippet:

    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Text;

    class Program
    {
        static void Main()
        {
            var test = "foo bar baz";
            var input = Encoding.UTF8.GetBytes(test);

            var compressed = Compress(input);
            var decompressed = Decompress(compressed);
            // Report byte counts (not the string's character count) so the sizes are comparable.
            Console.WriteLine("size of initial table = " + input.Length);
            Console.WriteLine("size of compressed table = " + compressed.Length);
            Console.WriteLine("size of decompressed table = " + decompressed.Length);
            Console.WriteLine(Encoding.UTF8.GetString(decompressed));
            Console.ReadKey();
        }

        static byte[] Compress(byte[] data)
        {
            using (var compressedStream = new MemoryStream())
            using (var zipStream = new GZipStream(compressedStream, CompressionMode.Compress))
            {
                zipStream.Write(data, 0, data.Length);
                // Close the GZipStream first so the final block and trailer are flushed.
                zipStream.Close();
                // ToArray still works after the underlying MemoryStream has been closed.
                return compressedStream.ToArray();
            }
        }

        static byte[] Decompress(byte[] data)
        {
            using (var compressedStream = new MemoryStream(data))
            using (var zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
            using (var resultStream = new MemoryStream())
            {
                // Copy the decompressed bytes into an in-memory buffer.
                zipStream.CopyTo(resultStream);
                return resultStream.ToArray();
            }
        }
    }

The problem is that I get this output:

[image: console output]

I don't understand why the compressed array is larger than the decompressed one!

Any ideas?

Edit

After @spender's comment: if I change the test string, for example:

var test = "foo bar baz very long string for example hdgfgfhfghfghfghfghfghfghfghfghfghfghfhg";

I get a different result. So what is the minimum size the initial array must have for compression to actually reduce it?

Lamloumi Afif
  • Because the data is so small that the overheads of the compression format outweigh the gain of compression. Try more data. Note: completely random data will not compress. – spender Dec 01 '16 at 11:09
  • @spender please see my edit and post your idea as an answer, thanks – Lamloumi Afif Dec 01 '16 at 11:14

2 Answers

9

A compressed file has headers, and they add to the file size. When the input is very small, the output can end up even bigger than the input, as you see here. Try it with a larger input.

Ashkan Mobayen Khiabani
3

This is because the amount of data is so small that the overheads of the compression format outweigh the gain of compression.

Try more data.

If you compress entirely random data (or already-compressed data such as a JPEG), you will never make any significant gain. However, the string new string('*', 1000000) would compress down really nicely.
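To see the difference between the two cases, here is a minimal sketch that compresses one million identical bytes and one million pseudo-random bytes and returns the compressed sizes. The class `CompressibilityDemo` and its methods are hypothetical names introduced for illustration, and the exact sizes depend on the .NET version, so none are quoted here.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration; not part of the question's code.
public static class CompressibilityDemo
{
    // Gzip-compresses a buffer and returns the number of compressed bytes.
    public static int CompressedSize(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps 'output' usable after the GZipStream is disposed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(data, 0, data.Length);
            } // Disposing the GZipStream flushes the last block and the trailer.

            return (int)output.Length;
        }
    }

    // A buffer filled with '*': highly repetitive, so it should compress very well.
    public static byte[] Repetitive(int length)
    {
        var buffer = new byte[length];
        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] = (byte)'*';
        }
        return buffer;
    }

    // Pseudo-random bytes from a fixed seed: almost no redundancy to exploit.
    public static byte[] RandomBytes(int length, int seed)
    {
        var buffer = new byte[length];
        new Random(seed).NextBytes(buffer);
        return buffer;
    }
}
```

Calling `CompressibilityDemo.CompressedSize(CompressibilityDemo.Repetitive(1000000))` and `CompressibilityDemo.CompressedSize(CompressibilityDemo.RandomBytes(1000000, 42))` from your own `Main` and printing both results should show the first shrinking to a tiny fraction of its input while the second stays at roughly the input size.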

GZIP adds at least 18 bytes of framing (a 10-byte header and an 8-byte trailer), so input that is smaller than this, or only marginally larger, will not benefit even if it is easily compressible.
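The fixed framing can be inspected directly. The sketch below is an illustration, not a definitive tool: `GzipFramingDemo` and its method names are hypothetical. It compresses a buffer, checks the gzip magic bytes at the start of the header, and reads the ISIZE field from the last four bytes of the trailer, which the gzip format (RFC 1952) defines as the uncompressed length modulo 2^32, stored little-endian.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for illustration; not part of the question's code.
public static class GzipFramingDemo
{
    // Gzip-compresses a buffer and returns the compressed bytes.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            } // Disposing the GZipStream writes the final block and the trailer.

            // ToArray is still allowed after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // True if the buffer is long enough for header plus trailer and starts with
    // the gzip magic bytes 0x1F 0x8B followed by compression method 8 (DEFLATE).
    public static bool HasGzipHeader(byte[] compressed)
    {
        return compressed.Length >= 18
            && compressed[0] == 0x1F
            && compressed[1] == 0x8B
            && compressed[2] == 8;
    }

    // Reads ISIZE, the last four bytes of the stream, as a little-endian value.
    public static uint ReadIsize(byte[] compressed)
    {
        int n = compressed.Length;
        return (uint)compressed[n - 4]
            | ((uint)compressed[n - 3] << 8)
            | ((uint)compressed[n - 2] << 16)
            | ((uint)compressed[n - 1] << 24);
    }
}
```

For the question's input, `GzipFramingDemo.Compress(Encoding.UTF8.GetBytes("foo bar baz"))` produces a buffer whose `ReadIsize` is the original 11 bytes and whose total length is larger than 11, because the header, the trailer and the DEFLATE block framing together cost more than an input this small can save.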

Here's an interesting question that probes further into GZIP: What's the most that GZIP or DEFLATE can increase a file size?

spender