5

Why is the result of the GZip algorithm not the same in Android and .NET?

My code in Android:

    public static String compressString(String str) {

    String str1 = null;
    ByteArrayOutputStream bos = null;
    try {
        bos = new ByteArrayOutputStream();
        BufferedOutputStream dest = null;

        byte b[] = str.getBytes();
        GZIPOutputStream gz = new GZIPOutputStream(bos, b.length);
        gz.write(b, 0, b.length);
        bos.close();
        gz.close();

    } catch (Exception e) {
        System.out.println(e);
        e.printStackTrace();
    }
    byte b1[] = bos.toByteArray();
    return Base64.encode(b1);
}

My code in the .NET web service:

    public static string compressString(string text)
{
    byte[] buffer = Encoding.UTF8.GetBytes(text);
    MemoryStream ms = new MemoryStream();
    using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true))
    {
        zip.Write(buffer, 0, buffer.Length);
    }

    ms.Position = 0;
    MemoryStream outStream = new MemoryStream();

    byte[] compressed = new byte[ms.Length];
    ms.Read(compressed, 0, compressed.Length);

    byte[] gzBuffer = new byte[compressed.Length + 4];
    System.Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
    System.Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
    return Convert.ToBase64String(gzBuffer);
}

In Android:

compressString("hello"); -> "H4sIAAAAAAAAAMtIzcnJBwCGphA2BQAAAA=="

In .NET:

compressString("hello"); -> "BQAAAB+LCAAAAAAABADtvQdgHEmWJSYvbcp7f0r1StfgdKEIgGATJNiQQBDswYjN5pLsHWlHIymrKoHKZVZlXWYWQMztnbz33nvvvffee++997o7nU4n99//P1xmZAFs9s5K2smeIYCqyB8/fnwfPyLmeVlW/w+GphA2BQAAAA=="

Interestingly, when I use the Android Decompress method on the result of the .NET compressString method, it returns the original string correctly, but I get an error when I decompress the result of the Android compressString method.

Android Decompress method:

    public static String Decompress(String zipText) throws IOException {
    int size = 0;
    byte[] gzipBuff = Base64.decode(zipText);

    ByteArrayInputStream memstream = new ByteArrayInputStream(gzipBuff, 4,
            gzipBuff.length - 4);
    GZIPInputStream gzin = new GZIPInputStream(memstream);

    final int buffSize = 8192;
    byte[] tempBuffer = new byte[buffSize];
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    while ((size = gzin.read(tempBuffer, 0, buffSize)) != -1) {
        baos.write(tempBuffer, 0, size);
    }
    byte[] buffer = baos.toByteArray();
    baos.close();

    return new String(buffer, "UTF-8");
}

I think there is an error in the Android compressString method. Can anybody help me?

Bob
  • Similar issue with ZipOutputStream was resolved by me http://stackoverflow.com/a/11154161/1269737 – StepanM Jun 22 '12 at 10:11

3 Answers

2

Following this answer, I now have four methods: compress and decompress for both Android and .NET. These methods are compatible with each other except in one case.

Community
Bob
2

In the Android version, you should close bos after you close gz.

Also, this line in compressString may give you problems:

byte b[] = str.getBytes();

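That converts the characters to bytes using the platform's default encoding. On Android the default is UTF-8, but relying on the default is fragile, and the same code can behave differently on other Java platforms. The .NET version, on the other hand, explicitly uses UTF-8. In Android, state the encoding explicitly: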

byte b[] = str.getBytes("UTF-8");

EDIT: On further looking at your code, I suggest that you rewrite it like this:

byte b[] = str.getBytes("UTF-8");
GZIPOutputStream gz = new GZIPOutputStream(bos);
gz.write(b, 0, b.length);
gz.finish();
gz.close();
bos.close();

The changes are: use UTF-8 to encode characters; use the default internal buffer size for the GZIPOutputStream; call gz.close() before calling bos.close() (the latter probably isn't even needed); and call gz.finish() before calling gz.close().
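For reference, here is how those changes might look as a complete method. This is only a sketch under some assumptions: the class name `GzipUtf8Sketch` and the helper names `gzip`/`gunzip` are made up for illustration, it returns raw bytes rather than Base64, and it uses try-with-resources and `StandardCharsets`, which need Java 7 (API level 19 on Android).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of the original question.
final class GzipUtf8Sketch {

    // Compress a string to raw GZIP bytes, always encoding as UTF-8.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // try-with-resources closes gz first, which flushes the deflater
        // and writes the GZIP trailer, before we read the bytes from bos.
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Decompress raw GZIP bytes back to a string, decoding as UTF-8.
    static String gunzip(byte[] data) {
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(gunzip(gzip("hello")));
    }
}
```

The important point is the ordering: the bytes are only read from the `ByteArrayOutputStream` after the `GZIPOutputStream` has been closed, so the output is never truncated.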

EDIT 2:

Okay, I should have realized before what's going on. The GZIPOutputStream class is, in my opinion, a stupid design. Its constructors give you no way to choose the compression level; it always uses the Deflater default (Deflater.DEFAULT_COMPRESSION), which need not match what the .NET side uses. To pick a level yourself, you need to subclass it and set the level on the protected def field. The easiest way is to do this:

GZIPOutputStream gz = new GZIPOutputStream(bos) {
    {
        def.setLevel(Deflater.BEST_COMPRESSION);
    }
};

That sets the internal deflater that GZIP uses to the best compression level. (By the way, in case you aren't familiar with it, the syntax I'm using here is called an instance initializer block.)
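If you want to see what the level actually changes, a small experiment along these lines may help. It is a sketch, not a definitive implementation: `GzipLevelSketch` and its methods are hypothetical names, and it takes the level as a parameter instead of hard-coding `Deflater.BEST_COMPRESSION`. The compressed bytes differ between levels, but every variant decompresses to the same input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class used only for this illustration.
final class GzipLevelSketch {

    // GZIP-compress input with an explicit Deflater level (0 to 9).
    static byte[] gzip(byte[] input, final int level) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos) {
            {
                // Instance initializer: runs right after the superclass
                // constructor, before any data has been compressed.
                def.setLevel(level);
            }
        }) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Decompress GZIP bytes back to the original bytes.
    static byte[] gunzip(byte[] data) {
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] input = new byte[1000]; // highly compressible: all zeros
        System.out.println("no compression:   "
                + gzip(input, Deflater.NO_COMPRESSION).length + " bytes");
        System.out.println("best compression: "
                + gzip(input, Deflater.BEST_COMPRESSION).length + " bytes");
    }
}
```

Since different levels (and different deflate implementations) can produce different bytes for the same input, comparing the Base64 strings from Android and .NET is not a useful compatibility test. What matters is whether each side can decompress what the other produced.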

Ted Hopp
0

The main difference is that your .NET code puts a four-byte length field (the length of the original, uncompressed data, buffer.Length) in front of the GZIP data. Your Java code doesn't do this, so the length field is missing.

When you decompress, however, you expect that length in the first four bytes and start the GZIP decompression at position 4 (skipping the first four bytes).
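One way to make the Android side match, sketched under a few assumptions: the class name `LengthPrefixedGzip` is made up, `java.util.Base64` (Java 8, API level 26 on Android) stands in for whatever Base64 class the question uses, and the prefix is written little-endian because that is what `BitConverter.GetBytes` produces on little-endian hardware.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class mirroring the layout of the .NET method.
final class LengthPrefixedGzip {

    // Layout before Base64: [4-byte little-endian uncompressed length][GZIP data]
    static String compress(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] gzipped = bos.toByteArray();
        byte[] framed = ByteBuffer.allocate(4 + gzipped.length)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(raw.length)   // same value as buffer.Length in .NET
                .put(gzipped)
                .array();
        return Base64.getEncoder().encodeToString(framed);
    }

    // Skips the 4-byte prefix, then gunzips the rest (as in the question).
    static String decompress(String base64) {
        byte[] framed = Base64.getDecoder().decode(base64);
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(framed, 4, framed.length - 4))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String packed = compress("hello");
        System.out.println(packed);
        System.out.println(decompress(packed));
    }
}
```

You can check the framing by looking at the start of the two strings in the question. The Android output begins with `H4sI`, which decodes to the GZIP magic bytes `1f 8b`, while the .NET output begins with `BQAAAB+L`, which decodes to `05 00 00 00` (the length 5 of "hello") followed by `1f 8b`. If you don't need the prefix, the alternative is to drop it on the .NET side and decompress from offset 0 instead.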

Codo
  • .Net has 4 extra characters at the beginning of its Base64 string – Bob Aug 14 '11 at 08:51
  • Yes, that's what I'm saying. In addition to the GZIP compressed data, your final data (before Base64 encoding) has four bytes containing the length of the original, uncompressed data. The Android compression code doesn't write them, yet the Android decompression code expects them. – Codo Aug 14 '11 at 08:54
  • How can I solve it? Could you please give me some sample code to solve it? Thanks, – Bob Aug 14 '11 at 09:22
  • The length of b1 (Android) is 25, but the length of compressed (.NET) is 123. The difference between the lengths of these two byte arrays is too large. – Bob Aug 14 '11 at 09:43
  • Do you need the four bytes (containing the length) at the beginning or not? Based on that I can tell you how to solve it. – Codo Aug 14 '11 at 09:55