LZO codec difference between Python and Java

Question

I am running into a strange problem: data compressed with the Python lzo module fails to decompress in Java, although both seem to use the same native LZO codec implementation. To give more details, I am using the Python module from here:

https://github.com/jd-boyd/python-lzo

and compressing a simple byte "a" yields

import lzo
lzo.compress("a")
> '\xf0\x00\x00\x00\x01\x12a\x11\x00\x00'

and compressing the same byte "a" in java using

https://github.com/twitter/hadoop-lzo

yields

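import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import com.hadoop.compression.lzo.LzoCodec;

// The statements below belong inside a method that declares "throws IOException".
byte[] input = new byte[1];
input[0] = 'a';
ByteArrayInputStream inputByteStream = new ByteArrayInputStream(input);
ByteArrayOutputStream outputByteStream = new ByteArrayOutputStream();
LzoCodec lzoCodec = new LzoCodec();
Configuration conf = new Configuration();
lzoCodec.setConf(conf);
OutputStream outputStream = lzoCodec.createOutputStream(outputByteStream);
int data = inputByteStream.read();
while (data != -1) {
  outputStream.write(data);
  data = inputByteStream.read();
}
// Close the stream so the compressor flushes its block before the bytes are read back.
outputStream.close();
// Hex-dump the compressed output.
StringBuilder sb = new StringBuilder();
for (byte outByte : outputByteStream.toByteArray()) {
  sb.append(String.format("%02X ", outByte));
}
System.err.println(sb.toString());
> 00 00 00 01 00 00 00 05 12 61 11 00 00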

The trailing part looks the same, i.e. the part [ 11 00 00 ], but the header definitely differs. I made sure that both Python and Java use LZO version 2.03, and that the default compression strategy in both is LZO1X_1. Any help will be appreciated.
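Reading the two dumps side by side suggests that only the framing around the compressed bytes differs. The sketch below splits both outputs under two assumed layouts: a one-byte marker plus a 4-byte big-endian uncompressed length for the Python output, and a 4-byte uncompressed length plus a 4-byte compressed length for the Java output. These layouts are inferred from the bytes shown above, not confirmed against either library's source, and the variable names are illustrative only.

```python
import struct

# Byte strings copied from the two outputs shown in the question.
python_out = b"\xf0\x00\x00\x00\x01\x12a\x11\x00\x00"
java_out = bytes.fromhex("00 00 00 01 00 00 00 05 12 61 11 00 00")

# Assumed Python-side layout: 1 marker byte, then a 4-byte big-endian
# uncompressed length, then the compressed payload.
marker, py_raw_len = struct.unpack(">BI", python_out[:5])
py_payload = python_out[5:]

# Assumed Java-side layout: 4-byte big-endian uncompressed length, then a
# 4-byte big-endian compressed length, then that many compressed bytes.
java_raw_len, java_comp_len = struct.unpack(">II", java_out[:8])
java_payload = java_out[8:8 + java_comp_len]

print(hex(marker), py_raw_len, py_payload.hex())
print(java_raw_len, java_comp_len, java_payload.hex())
print("payloads identical:", py_payload == java_payload)
```

If these assumptions hold, the compressed payload itself (12 61 11 00 00) is the same on both sides, and a reader on either side would have to strip or rewrite the other side's header before decompressing.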

How did you get the Configuration class, i.e. what is the import for it? - FirstName LastName, Jul 02 '14 at 22:05

score 0 · Answer 1 · answered May 13 '14 at 08:36

Just a guess, but IIRC strings in Python are UTF-8 and in Java they are UTF-16. If I were you I would take a close look at what actually makes it into the string in Java.
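One way to run the check suggested here is to compare how the character encodes under each scheme and then look for those byte patterns in the compressed output. This is only a sketch: the payload bytes are copied from the question's dumps, and treating the last five bytes as the compressed payload is an assumption.

```python
text = "a"

# Encode the same character under both schemes mentioned above.
utf8_bytes = text.encode("utf-8")       # one byte: 0x61
utf16_bytes = text.encode("utf-16-be")  # two bytes: 0x00 0x61

# Trailing bytes shared by both dumps in the question (assumed payload).
payload = bytes.fromhex("1261110000")

print("UTF-8 :", utf8_bytes.hex())
print("UTF-16:", utf16_bytes.hex())
print("payload contains UTF-8 form :", utf8_bytes in payload)
print("payload contains UTF-16 form:", utf16_bytes in payload)
```

Here both dumps carry a single 0x61 byte and no 0x00 0x61 pair, so for this particular input the encoding does not appear to explain the differing headers.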

answered May 13 '14 at 08:36

nemequ
