12

I have some strings that are roughly 10K characters each. There is plenty of repetition in them. They are serialized JSON objects. I'd like to easily compress them into a byte array, and uncompress them from a byte array.

How can I most easily do this? I'm looking for methods so I can do the following:

String original = "....long string here with 10K characters...";
byte[] compressed = StringCompressor.compress(original);
String decompressed = StringCompressor.decompress(compressed);
assert original.equals(decompressed);
Steve McLeod
  • 1
    I would use InflaterInputStream/DeflaterOutputStream with ByteArrayInput/OutputStream. – Peter Lawrey May 13 '12 at 14:11
  • 2
    There's an easy-to-use 'zip' class out there... edit - it is here http://docs.oracle.com/javase/6/docs/api/java/util/zip/package-summary.html and seems to use the classes @peter mentioned. – Tony Ennis May 13 '12 at 14:11
  • 2
    How about this? http://stackoverflow.com/questions/3649485/how-to-compress-a-string – Mikita Belahlazau May 13 '12 at 14:13
  • just using `String` and `byte[]` this can't be more than a 10-15 line method, assuming the JSON is all ascii. If you have to do something utf-8 ish, add 10 more lines... – CodeClown42 May 13 '12 at 14:13

3 Answers

28

You can try

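import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// An enum with no constants acts as a non-instantiable utility class.
enum StringCompressor {
    ;
    public static byte[] compress(String text) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Closing the stream finishes the deflate output before baos is read.
        try (OutputStream out = new DeflaterOutputStream(baos)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Not expected when writing to an in-memory stream.
            throw new AssertionError(e);
        }
        return baos.toByteArray();
    }

    public static String decompress(byte[] bytes) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(bytes))) {
            byte[] buffer = new byte[8192];
            int len;
            // read() returns -1 once the compressed data is exhausted.
            while ((len = in.read(buffer)) != -1) {
                baos.write(buffer, 0, len);
            }
            return new String(baos.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Corrupt input also ends up here, as a ZipException.
            throw new AssertionError(e);
        }
    }
}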
Peter Lawrey
3

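The decompress method in Peter Lawrey's answer can be simplified by writing the compressed bytes through an InflaterOutputStream (in java.util.zip since Java 6), which removes the read loop:

    public static String decompress(byte[] bytes) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            // Bytes written to the stream are inflated into baos.
            OutputStream out = new InflaterOutputStream(baos);
            out.write(bytes);
            out.close();
            return new String(baos.toByteArray(), "UTF-8");
        } catch (IOException e) {
            throw new AssertionError(e);
        }
    }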
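
To check the round trip and see how well repetitive JSON shrinks, here is a minimal self-contained sketch. It pairs DeflaterOutputStream for compression with InflaterOutputStream for decompression. The class name RoundTripDemo and the sample JSON are made up for illustration, and it assumes Java 8 or later for UncheckedIOException.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterOutputStream;

// Hypothetical demo class, not part of either answer above.
class RoundTripDemo {

    // Deflates the UTF-8 bytes of the text into a byte array.
    static byte[] compress(String text) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(baos)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the deflate output is complete.
        return baos.toByteArray();
    }

    // Inflates the bytes and decodes them as UTF-8.
    static String decompress(byte[] bytes) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new InflaterOutputStream(baos)) {
            out.write(bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Build a repetitive JSON array, similar in spirit to the question's data.
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < 200; i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"id\":").append(i).append(",\"name\":\"item\",\"active\":true}");
        }
        String original = sb.append(']').toString();

        byte[] compressed = compress(original);
        String restored = decompress(compressed);

        System.out.println(original.length() + " chars -> " + compressed.length + " bytes");
        System.out.println("round trip ok: " + original.equals(restored));
    }
}
```

The exact compressed size depends on the input, so measure it on your real JSON rather than relying on this sample.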
Ray Hulha
1

I made a library to solve the problem of compressing generic Strings (especially short ones). It tries to compress the String using several algorithms (plain UTF-8, 5-bit encoding for Latin letters, Huffman encoding, gzip for long Strings) and chooses the one with the shortest result. In the worst case it chooses the plain UTF-8 encoding, so you never risk losing space.

I hope it is useful. Here's the link: https://github.com/lithedream/lithestring
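
The "try several encodings and keep the shortest" idea can be sketched with only the standard library. This is not lithestring's code or format: the class name, the one-byte tag, and the choice of just two candidates (plain UTF-8 and deflate) are assumptions made for illustration. Because of the tag byte, the worst case here is one byte longer than plain UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterOutputStream;

// Hypothetical sketch; the tag values and layout are invented for this example.
class ShortestEncodingSketch {
    private static final byte RAW = 0;      // payload is plain UTF-8
    private static final byte DEFLATE = 1;  // payload is deflated UTF-8

    // Encodes the text both ways and keeps the shorter payload,
    // prefixed with a tag byte that records which one was chosen.
    static byte[] encode(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        byte[] deflated = deflate(raw);
        boolean useDeflate = deflated.length < raw.length;
        byte[] payload = useDeflate ? deflated : raw;

        byte[] result = new byte[payload.length + 1];
        result[0] = useDeflate ? DEFLATE : RAW;
        System.arraycopy(payload, 0, result, 1, payload.length);
        return result;
    }

    // Reads the tag byte and reverses the matching encoding.
    static String decode(byte[] encoded) {
        byte[] payload = Arrays.copyOfRange(encoded, 1, encoded.length);
        switch (encoded[0]) {
            case RAW:
                return new String(payload, StandardCharsets.UTF_8);
            case DEFLATE:
                return new String(inflate(payload), StandardCharsets.UTF_8);
            default:
                throw new IllegalArgumentException("unknown tag: " + encoded[0]);
        }
    }

    private static byte[] deflate(byte[] data) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(baos)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    private static byte[] inflate(byte[] data) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new InflaterOutputStream(baos)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }
}
```

For very short strings the deflate header and checksum outweigh any savings, so the raw branch wins; for 10K of repetitive JSON the deflate branch should win comfortably.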

EDIT: I realized that your Strings are always "long". My library defaults to gzip for those sizes, so I fear I cannot do better for you.
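
Since gzip is what applies at these sizes, it can also be used directly through java.util.zip without any library. A minimal sketch, assuming UTF-8 text and Java 8 or later; the class and method names are made up here and are not taken from lithestring.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class for illustration.
class GzipStringSketch {

    // Compresses the UTF-8 bytes of the text in gzip format.
    static byte[] gzip(String text) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(baos)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the gzip trailer has been written.
        return baos.toByteArray();
    }

    // Decompresses gzip data and decodes it as UTF-8.
    static String gunzip(byte[] bytes) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int len;
            // read() returns -1 at the end of the compressed stream.
            while ((len = in.read(buffer)) != -1) {
                baos.write(buffer, 0, len);
            }
            return new String(baos.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Gzip wraps the same deflate algorithm in a slightly larger header and trailer than the zlib format written by DeflaterOutputStream, so the output is a few bytes bigger. In exchange, the bytes can be read by standard gzip tools.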

lithedream