I want to decompress a string in Java that was gzip-compressed in Python.

Normally, I Base64-encode the compressed string in Python and decode it in Java before decompressing. This works fine.

But is there a way to decompress a string in Java that was gzip-compressed in Python without using Base64 encoding?

Actually, I want to HTTP POST the compressed binary data to a server, where it gets decompressed. The compression and the HTTP POST are done in Python, and the server side is Java.

I tried this code without the Base64 encoding step in Python, read the file in Java using a BufferedReader, and converted the compressed string into a byte[] using getBytes(), which I passed to GZIPInputStream for decompression. But this throws an exception:

java.io.IOException: Not in GZIP format
    at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:154)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:75)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:85)
    at GZipFile.gunzipIt(GZipFile.java:58)
    at GZipFile.main(GZipFile.java:42)

Please give me a solution to perform compression and decompression without any encoding. Is there a way to send binary data in an HTTP POST from Python?

This is the compression code in Python:

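import StringIO
import gzip
import base64

m = 'hello' + '\r\n' + 'world'

# Gzip-compress the message into an in-memory buffer (Python 2).
out = StringIO.StringIO()
with gzip.GzipFile(fileobj=out, mode="wb") as f:
    f.write(m)

# Base64-encode the compressed bytes so the dump file is plain text.
with open('comp_dump', 'wb') as f:
    f.write(base64.b64encode(out.getvalue()))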

This is the decompression code in Java:

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;

public class GZipFile
{
    // Reads the Base64 text written by the Python script.
    public static String readCompressedData() throws IOException
    {
        StringBuilder compressedStr = new StringBuilder();
        String nextLine;
        // Base64 output is plain ASCII, so reading it as text is safe.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream("comp_dump"), StandardCharsets.US_ASCII)))
        {
            while ((nextLine = reader.readLine()) != null)
            {
                compressedStr.append(nextLine);
            }
        }
        return compressedStr.toString();
    }

    public static void main(String[] args) throws Exception
    {
        // java.util.Base64 (Java 8+) stands in for javax.xml.bind.DatatypeConverter,
        // which newer JDKs no longer bundle.
        byte[] contentInBytes = Base64.getDecoder().decode(readCompressedData());

        String decomp = gunzipIt(contentInBytes);
        System.out.println(decomp);
    }

    /**
     * Gunzips the given bytes and decodes the result as text.
     * UTF-8 is an assumption here; use whatever charset the sender used.
     */
    public static String gunzipIt(final byte[] compressed)
    {
        byte[] buffer = new byte[1024];
        ByteArrayOutputStream decomp = new ByteArrayOutputStream();

        try (GZIPInputStream gzis = new GZIPInputStream(new ByteArrayInputStream(compressed)))
        {
            int len;
            // Collect raw bytes first and decode once at the end, so a
            // multi-byte character is never split across two chunks.
            while ((len = gzis.read(buffer)) > 0)
            {
                decomp.write(buffer, 0, len);
            }
        }
        catch (IOException ex)
        {
            ex.printStackTrace();
        }
        return new String(decomp.toByteArray(), StandardCharsets.UTF_8);
    }
}

1 Answer

Not every byte[] can be converted to a String, and converting that String back to bytes may yield different bytes.
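To see the effect concretely, here is a minimal Python 3 sketch (not the asker's code) that gzip-compresses the same message, pushes the bytes through a UTF-8 string the way a text reader would, and compares the result. The names `payload`, `as_text` and `round_tripped` are only illustrative, and `errors="replace"` is used as an approximation of how Java substitutes bytes it cannot decode.

```python
import gzip

# Compress the sample message from the question.
payload = gzip.compress(b"hello\r\nworld")

# Treat the binary payload as UTF-8 text, then turn it back into bytes.
# errors="replace" swaps each undecodable byte for U+FFFD, which is
# roughly what Java's String decoding does with malformed input.
as_text = payload.decode("utf-8", errors="replace")
round_tripped = as_text.encode("utf-8")

# The gzip magic number 0x1f 0x8b survives only in the original bytes.
print(payload[:2] == b"\x1f\x8b")        # True
print(round_tripped[:2] == b"\x1f\x8b")  # False
print(round_tripped == payload)          # False
```

The second byte of every gzip stream, 0x8b, is not valid on its own in UTF-8, so the header is already damaged after one round trip. That is consistent with the "Not in GZIP format" exception in the question.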

Please define the encoding explicitly when you compress, and use the same one when you decompress. Otherwise your OS, JVM, etc. will pick one for you, and will probably mess it up.

For example, on my Linux machine:

Python

import sys
print sys.getdefaultencoding()
>> ascii

Java

System.out.println(Charset.defaultCharset());
>> UTF-8

Related answer: https://stackoverflow.com/a/14467099/3014866
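If the goal is to avoid text encodings altogether, one option is to post the compressed bytes as the raw request body. The following is a rough Python 3 sketch using only the standard library. The URL and the helper names `build_request` and `send` are hypothetical, and the `application/octet-stream` content type is an assumption about what the server accepts.

```python
import gzip
import urllib.request

# Hypothetical endpoint; replace with the real server URL.
URL = "http://localhost:8080/upload"


def build_request(message: bytes, url: str = URL) -> urllib.request.Request:
    """Build a POST request whose body is the raw gzip bytes of message."""
    body = gzip.compress(message)
    # Mark the body as opaque binary so nothing treats it as text.
    headers = {"Content-Type": "application/octet-stream"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(message: bytes) -> int:
    """Send the request and return the HTTP status (needs a running server)."""
    with urllib.request.urlopen(build_request(message)) as response:
        return response.status


# Building the request does not touch the network.
req = build_request(b"hello\r\nworld")
print(req.get_method(), len(req.data))
```

On the Java side, the request's raw input stream would then need to be wrapped directly in a GZIPInputStream, without going through a Reader or getBytes(), so that the bytes are never interpreted as characters.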

Rudziankoŭ