I tried to compress the string "XZJ6RTNN4NNNNNNR8YWWX7ZGWO1XXQT6PSRT5281I0WQZM75K2P3SPH81XN4M3L1WF6Q" in C#, using the code from the accepted answer to "https://stackoverflow.com/questions/7343465/compression-decompression-string-with-c-sharp?rq=1". However, the compressed output is larger than the input, so that code does not work for my case. How can I reduce the size of this string?

public static void CopyTo(Stream src, Stream dest) {
    byte[] bytes = new byte[4096];

    int cnt;

    while ((cnt = src.Read(bytes, 0, bytes.Length)) != 0) {
        dest.Write(bytes, 0, cnt);
    }
}

public static byte[] Zip(string str) 
{
    var bytes = Encoding.UTF8.GetBytes(str);

    using (var msi = new MemoryStream(bytes))
    using (var mso = new MemoryStream()) {
        using (var gs = new GZipStream(mso, CompressionMode.Compress)) {
            //msi.CopyTo(gs);
            CopyTo(msi, gs);
        }

        return mso.ToArray();
    }
}

public static string Unzip(byte[] bytes) {
    using (var msi = new MemoryStream(bytes))
    using (var mso = new MemoryStream()) {
        using (var gs = new GZipStream(msi, CompressionMode.Decompress)) {
            //gs.CopyTo(mso);
            CopyTo(gs, mso);
        }

        return Encoding.UTF8.GetString(mso.ToArray());
    }
}

static void Main(string[] args) {
    byte[] r1 = Zip("StringStringStringStringStringStringStringStringStringStringStringStringStringString");
    string r2 = Unzip(r1);
}
Ctech

3 Answers

1

Yes, short values with high entropy commonly get larger, not smaller, when "compressing" them. This is a simple feature of how compression works. Accordingly, many protocols include an "is this compressed" flag to allow short or high-entropy payloads to be sent efficiently - sometimes by an estimator (for example, don't even try if less than 100 bytes), or sometimes by trying the compression, and then sending whichever is smaller.
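
The "try it, then send whichever is smaller" idea can be sketched roughly as follows. This is only an illustration, not code from any particular protocol: the class name `FlaggedCompression`, the `Pack`/`Unpack` methods and the flag values 0 and 1 are all made up for the example. It helps to know that the gzip format alone adds at least 18 bytes (a 10-byte header and an 8-byte trailer), which a 68-character input with few repeats is unlikely to win back.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class FlaggedCompression
{
    // Hypothetical flag values for this sketch; not part of any standard.
    private const byte Raw = 0;
    private const byte GZipped = 1;

    // Returns one flag byte followed by either the raw UTF-8 bytes or the
    // gzip-compressed bytes, whichever is shorter.
    public static byte[] Pack(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        byte[] zipped;
        using (var mso = new MemoryStream())
        {
            using (var gs = new GZipStream(mso, CompressionMode.Compress))
            {
                gs.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the GZipStream has closed mso.
            zipped = mso.ToArray();
        }

        bool useZip = zipped.Length < raw.Length;
        byte[] body = useZip ? zipped : raw;
        byte[] result = new byte[body.Length + 1];
        result[0] = useZip ? GZipped : Raw;
        Buffer.BlockCopy(body, 0, result, 1, body.Length);
        return result;
    }

    // Reads the flag byte and decompresses only if the payload was compressed.
    public static string Unpack(byte[] packed)
    {
        if (packed[0] == Raw)
            return Encoding.UTF8.GetString(packed, 1, packed.Length - 1);

        using (var msi = new MemoryStream(packed, 1, packed.Length - 1))
        using (var gs = new GZipStream(msi, CompressionMode.Decompress))
        using (var mso = new MemoryStream())
        {
            gs.CopyTo(mso);
            return Encoding.UTF8.GetString(mso.ToArray());
        }
    }
}
```

With this scheme the worst case is one extra byte over the uncompressed data, while long, repetitive inputs still benefit from compression.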

Marc Gravell
  • Is there any way to get a result that is smaller than the input I have given? – Ctech Dec 04 '13 at 12:27
  • Actually, what I am trying to do is generate user details from this string. The user has to read this key out offline. If the string is long, that takes time, so I want to reduce its length. (An MD5 hash is one-way, so I am not using MD5.) – Ctech Dec 04 '13 at 12:36
0

I'm going to go with one of the comments on that thread:

"There is no reason to do this, and every reason not to do this. You will not save significant space, and you render your database unsearchable. Storage space is the cheapest commodity available to you. The savings for "thousands of strings" of "100 to 200 characters" is going to be insignificant, less than a megabyte. Don't do this, store your strings uncompressed."

user1666620
0

It seems that your string may in fact be a base-64 encoded byte array.

If this is the case, then you can "compress" it by converting it back to a byte array:

string original = "XZJ6RTNN4NNNNNNR8YWWX7ZGWO1XXQT6PSRT5281I0WQZM75K2P3SPH81XN4M3L1WF6Q";
Console.WriteLine("Original #characters = " + original.Length + " characters, or byte count = " + 2*original.Length);
byte[] compressed = Convert.FromBase64String(original);
Console.WriteLine("Compressed length = " + compressed.Length);
string decompressed = Convert.ToBase64String(compressed);

if (decompressed == original)
    Console.WriteLine("Decompressed OK");
else
    Console.WriteLine("Failed to decompress!");

The output from this code is:

Original #characters = 68 characters, or byte count = 136
Compressed length = 51
Decompressed OK

So we have gone from 68 characters (or 136 bytes, if the characters are UTF-16) down to 51 bytes. Each base-64 character carries 6 bits, so 68 × 6 = 408 bits = 51 bytes.

Note that this isn't compressing the data at all. It's merely converting the base-64 ASCII representation back to its original format, ASSUMING that it REALLY is base-64 ASCII.

If it isn't, then clearly you can't convert it back to a byte array.

I posted this just to alert you to the fact that it may be base-64 ASCII encoded data that you are dealing with, and you should check if that is the case.
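
One way to run that check is sketched below. The class `Base64Check` and the method `TryUnpack` are hypothetical names for this example; the only library calls used are `Convert.FromBase64String` and `Convert.ToBase64String`. The round-trip comparison is there because `FromBase64String` tolerates embedded whitespace, so a successful decode alone does not guarantee that you can get the exact original string back.

```csharp
using System;

public static class Base64Check
{
    // Returns true only if text decodes as base-64 AND re-encodes to exactly
    // the same string, so the conversion is safely reversible.
    public static bool TryUnpack(string text, out byte[] bytes)
    {
        try
        {
            bytes = Convert.FromBase64String(text);
        }
        catch (FormatException)
        {
            // Invalid characters or a length that is not a multiple of 4.
            bytes = null;
            return false;
        }

        if (Convert.ToBase64String(bytes) != text)
        {
            bytes = null;
            return false;
        }
        return true;
    }
}
```

Passing this check only shows that the string can be treated as base-64, not that it was produced that way. The sample string uses only uppercase letters and digits, so it may well come from some other encoding that merely happens to decode cleanly.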

Matthew Watson