There is a post here, "Compress and decompress string in c#", about compressing strings in C#.

I've implemented the same code myself, but the returned text is almost twice as long as my input :O

I've tried it on a JSON string of length 87, like this:

{"G":"82f88ff5-4143-46ef-86cc-a19910f4a6b5","U":"df39e3c7-ffd3-4829-a9cd-27bfcbd4403a"}

The result has length 168:

H4sIAAAAAAAEAC2NUQ6DIBQE5yx8l0QFqfQCnqAHqKCXaHr3jsaQ3TyYfcuXwKpeamHi0Bf9YCaSGVW6psLua5QWmifykVbPyCDJ3gube4GHet+tXZZM7Xrj6d7Z3u/W8896dVVpd5rMbCaa3k1k25M88OMPcjDew64AAAA=

I've changed Encoding.Unicode to Encoding.ASCII, but the result is still too big (128):

H4sIAAAAAAAEAA3KyxGAMAgFwF44y0w+JAEbsAILICSvCcfedc/70EUnaYEq0FiyVJa+wdoj2LNZThDvs9FB918Xqu0ag4H1Vy3GbrG4jImYSyRVp/cDp8EZE1cAAAA=

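using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Extension methods must live in a static class; the class name here is arbitrary.
public static class StringCompressionExtensions
{
    // Gzip-compresses the ASCII bytes of s and returns the result as Base64 text.
    public static string Compress(this string s)
    {
        var bytes = Encoding.ASCII.GetBytes(s);
        using (var msi = new MemoryStream(bytes))
        using (var mso = new MemoryStream())
        {
            // Dispose the GZipStream before reading mso so the gzip footer is written.
            using (var gs = new GZipStream(mso, CompressionMode.Compress))
            {
                msi.CopyTo(gs);
            }
            return Convert.ToBase64String(mso.ToArray());
        }
    }
}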
Ashkan S

1 Answer

Gzip is not only a compression algorithm but a complete file format. This means it adds extra structures whose size is usually negligible. When compressing small strings, however, they can noticeably inflate the overall gzip stream.

The standard gzip header, for example, is 10 bytes long and its footer is 8 bytes long.
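To make the header and footer visible, here is a minimal sketch that decodes the 128-character Base64 output from the question and looks at both ends of the byte array. The class and method names (`GzipEnvelope`, `Header`, `OriginalSize`) are made up for illustration, and the sketch assumes the header carries no optional fields, which is the case for this sample.

```csharp
using System;

public static class GzipEnvelope
{
    // Base64 output copied from the question (the ASCII variant, 128 characters).
    public const string Sample =
        "H4sIAAAAAAAEAA3KyxGAMAgFwF44y0w+JAEbsAILICSvCcfedc/70EUnaYEq0FiyVJa+wdoj2LNZThDvs9FB918Xqu0ag4H1Vy3GbrG4jImYSyRVp/cDp8EZE1cAAAA=";

    // First 10 bytes: magic 1F 8B, method 08 (deflate), flags,
    // 4-byte modification time, extra flags, OS id.
    // Assumption: no optional header fields are present (flags byte is 0).
    public static byte[] Header(byte[] gz)
    {
        var header = new byte[10];
        Array.Copy(gz, 0, header, 0, 10);
        return header;
    }

    // The 8-byte footer is CRC-32 followed by the uncompressed length
    // (little-endian); this reads the length from the last 4 bytes.
    public static uint OriginalSize(byte[] gz)
    {
        int i = gz.Length - 4;
        return (uint)(gz[i] | (gz[i + 1] << 8) | (gz[i + 2] << 16) | (gz[i + 3] << 24));
    }

    public static void Main()
    {
        byte[] gz = Convert.FromBase64String(Sample);
        Console.WriteLine(gz.Length);                          // 95
        Console.WriteLine(BitConverter.ToString(Header(gz)));  // 1F-8B-08-00-00-00-00-00-04-00
        Console.WriteLine(OriginalSize(gz));                   // 87
    }
}
```

The length stored in the footer matches the 87-character JSON, so 18 of the 95 bytes are pure envelope.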

If you now take your gzip-compressed result in raw form (not the inflated Base64-encoded one), you will see that it is 95 bytes long (for the ASCII variant).

So the 18 bytes for header and footer already make up nearly 20% of the output!
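If you want to measure this yourself, the rough sketch below compresses the same JSON once with `GZipStream` and once with `DeflateStream` (the same algorithm without the gzip header and footer) and prints the raw and Base64 sizes. `OverheadDemo`, `Gzip`, `Deflate` and `Base64Length` are hypothetical names chosen for the example; the exact byte counts depend on the .NET version, so no expected output is shown.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class OverheadDemo
{
    // Compresses input into the gzip file format (header + deflate data + footer).
    public static byte[] Gzip(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            using (var gz = new GZipStream(output, CompressionMode.Compress))
                gz.Write(input, 0, input.Length);
            return output.ToArray(); // ToArray still works after the stream is closed
        }
    }

    // Compresses input into a raw deflate stream, without the gzip wrapper.
    public static byte[] Deflate(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            using (var df = new DeflateStream(output, CompressionMode.Compress))
                df.Write(input, 0, input.Length);
            return output.ToArray();
        }
    }

    // Base64 emits 4 characters for every 3 input bytes, rounded up.
    public static int Base64Length(int byteCount) => 4 * ((byteCount + 2) / 3);

    public static void Main()
    {
        var json = "{\"G\":\"82f88ff5-4143-46ef-86cc-a19910f4a6b5\",\"U\":\"df39e3c7-ffd3-4829-a9cd-27bfcbd4403a\"}";
        byte[] input = Encoding.ASCII.GetBytes(json);
        byte[] gzip = Gzip(input);
        byte[] deflate = Deflate(input);

        Console.WriteLine($"input:       {input.Length} bytes");
        Console.WriteLine($"gzip:        {gzip.Length} bytes, {Base64Length(gzip.Length)} Base64 chars");
        Console.WriteLine($"raw deflate: {deflate.Length} bytes, {Base64Length(deflate.Length)} Base64 chars");
        Console.WriteLine($"gzip wrapper: {gzip.Length - deflate.Length} bytes");
    }
}
```

Note that Base64 alone turns 95 bytes into 128 characters, so even a stream that is barely larger than the input ends up well above it once encoded.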

Robert
  • Thanks Robert, so what shall I do instead? – Ashkan S Nov 10 '16 at 17:06
  • Don't convert your data to a string; use its real format instead. Your data contains, for example, two values that look like UUIDs. In string representation they are 36 characters each; in binary representation (just the number) each is only 16 bytes. Or try to find a compression algorithm that does not add additional data (but you will not get more than 10-15% compression, as gzip shows). – Robert Nov 10 '16 at 17:11
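
As an illustration of the binary-format suggestion, here is a minimal sketch that packs the two IDs from the question into 32 bytes and Base64-encodes them. It assumes both values are always valid GUIDs and that the receiving side knows the fixed layout (G first, then U); `GuidPacking`, `Pack` and `Unpack` are hypothetical helper names, not an existing API.

```csharp
using System;

public static class GuidPacking
{
    // Concatenates the 16-byte binary forms of both IDs and Base64-encodes them.
    // Guid.ToByteArray uses .NET's own byte order, so decode with new Guid(byte[]).
    public static string Pack(Guid g, Guid u)
    {
        var buffer = new byte[32];
        g.ToByteArray().CopyTo(buffer, 0);
        u.ToByteArray().CopyTo(buffer, 16);
        return Convert.ToBase64String(buffer);
    }

    // Reverses Pack: splits the 32 bytes back into the two GUIDs.
    public static (Guid G, Guid U) Unpack(string packed)
    {
        byte[] buffer = Convert.FromBase64String(packed);
        var g = new byte[16];
        var u = new byte[16];
        Array.Copy(buffer, 0, g, 0, 16);
        Array.Copy(buffer, 16, u, 0, 16);
        return (new Guid(g), new Guid(u));
    }

    public static void Main()
    {
        var g = Guid.Parse("82f88ff5-4143-46ef-86cc-a19910f4a6b5");
        var u = Guid.Parse("df39e3c7-ffd3-4829-a9cd-27bfcbd4403a");

        string packed = Pack(g, u);
        Console.WriteLine(packed.Length); // 44

        var restored = Unpack(packed);
        Console.WriteLine(restored.G == g && restored.U == u); // True
    }
}
```

That is 44 characters instead of the 87-character JSON (or the 128-character gzip result), with no compression involved.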