I am running into a strange problem: data compressed with the Python lzo module fails to decompress in Java, although both appear to use the same native LZO codec implementation. In more detail, I am using the Python module from here:

https://github.com/jd-boyd/python-lzo

Compressing the single byte "a" yields

import lzo
lzo.compress("a")
> '\xf0\x00\x00\x00\x01\x12a\x11\x00\x00'

and compressing the same byte "a" in Java using

https://github.com/twitter/hadoop-lzo 

yields

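import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import com.hadoop.compression.lzo.LzoCodec;

// Single-byte input: the character 'a' (0x61).
byte[] input = new byte[1];
input[0] = 'a';
ByteArrayInputStream inputByteStream = new ByteArrayInputStream(input);
ByteArrayOutputStream outputByteStream = new ByteArrayOutputStream();
LzoCodec lzoCodec = new LzoCodec();
Configuration conf = new Configuration();
lzoCodec.setConf(conf);
OutputStream outputStream = lzoCodec.createOutputStream(outputByteStream);
// Copy the input into the compressing stream one byte at a time.
int data = inputByteStream.read();
while (data != -1) {
  outputStream.write(data);
  data = inputByteStream.read();
}
// Close the stream so the compressor finishes and flushes its last block.
outputStream.close();
// Dump the compressed bytes as hex.
StringBuilder sb = new StringBuilder();
for (byte b : outputByteStream.toByteArray()) {
  sb.append(String.format("%02X ", b));
}
System.err.println(sb.toString());
> 00 00 00 01 00 00 00 05 12 61 11 00 00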

The trailing part, [ 11 00 00 ], looks the same in both outputs, but the header is clearly different. I made sure that both Python and Java use LZO version 2.03, and the default compression strategy in both is LZO1X_1. Any help will be appreciated.
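To make the comparison concrete, here is a minimal Python 3 sketch that splits the two byte dumps shown above into a header and a payload. The layouts are assumptions read off the dumps, not something confirmed by either library's documentation here: the Python output is taken to start with one marker byte followed by a 4-byte big-endian uncompressed length, and the Java output is taken to start with a 4-byte big-endian uncompressed length followed by a 4-byte big-endian compressed length. The helper names `split_python_lzo` and `split_hadoop_block` are made up for illustration.

```python
import struct

# Byte dumps copied from the two outputs shown above.
python_out = bytes.fromhex("f0 00 00 00 01 12 61 11 00 00")
java_out = bytes.fromhex("00 00 00 01 00 00 00 05 12 61 11 00 00")


def split_python_lzo(buf):
    # Assumed layout: 1 marker byte, 4-byte big-endian uncompressed length, payload.
    marker, raw_len = struct.unpack(">BI", buf[:5])
    return marker, raw_len, buf[5:]


def split_hadoop_block(buf):
    # Assumed layout: 4-byte big-endian uncompressed length,
    # 4-byte big-endian compressed length, then that many payload bytes.
    raw_len, comp_len = struct.unpack(">II", buf[:8])
    return raw_len, comp_len, buf[8:8 + comp_len]


marker, py_raw_len, py_payload = split_python_lzo(python_out)
java_raw_len, java_comp_len, java_payload = split_hadoop_block(java_out)

print(hex(marker), py_raw_len, py_payload.hex())
print(java_raw_len, java_comp_len, java_payload.hex())
print(py_payload == java_payload)
```

Under those assumptions both sides report an uncompressed length of 1 and carry the same five payload bytes, so the difference would lie only in the framing around the compressed data, not in the compressed data itself.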

user352951

1 Answer

Just a guess, but IIRC strings in Python are UTF-8 and in Java they are UTF-16. If I were you I would take a close look at what actually makes it into the string in Java.
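One rough way to test this guess, sketched in Python 3: encode "a" both ways and look for each encoding in the Java hex dump from the question. This only checks the bytes that were posted, and it assumes a short literal would appear unchanged in the compressed output, which holds for the `61` visible in the dump but is not a general property of LZO.

```python
# Hex dump of the Java output, copied from the question.
java_out = bytes.fromhex("00 00 00 01 00 00 00 05 12 61 11 00 00")

utf8 = "a".encode("utf-8")       # one byte per ASCII character
utf16 = "a".encode("utf-16-be")  # two bytes per ASCII character, no BOM

print(utf8.hex(), utf16.hex())
print(utf8 in java_out)   # is the single-byte form present in the dump?
print(utf16 in java_out)  # is the two-byte form present in the dump?
```

Note that the Java snippet in the question fills a byte[] directly rather than going through a String, so checking the raw bytes like this is a quick way to see which form actually reached the compressor.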

nemequ