
According to https://svn.python.org/projects/external/xz-5.0.3/doc/lzma-file-format.txt (section 1.1, "Header"), the .lzma header looks like this:

+------------+----+----+----+----+--+--+--+--+--+--+--+--+
| Properties |  Dictionary Size  |   Uncompressed Size   |
+------------+----+----+----+----+--+--+--+--+--+--+--+--+
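As a sketch of how these fields can be read back, the function below unpacks the 13-byte header with Python's struct module. It assumes the layout above: a 1-byte Properties field, a 4-byte little-endian Dictionary Size and an 8-byte little-endian Uncompressed Size, where all 0xFF bytes mean "size unknown". The name `parse_lzma_header` is hypothetical. The Properties byte packs the lc, lp and pb parameters as (pb * 5 + lp) * 9 + lc, so 0x5D decodes to lc=3, lp=0, pb=2.

```python
import lzma
import struct

UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF  # eight 0xFF bytes: "uncompressed size unknown"

def parse_lzma_header(data: bytes) -> dict:
    """Decode the 13-byte header of a .lzma (FORMAT_ALONE) file (hypothetical helper)."""
    if len(data) < 13:
        raise ValueError("need at least 13 bytes")
    # '<' = little-endian, no padding: B = 1 byte, I = 4 bytes, Q = 8 bytes
    props, dict_size, uncompressed = struct.unpack("<BIQ", data[:13])
    # Properties = (pb * 5 + lp) * 9 + lc
    lc = props % 9
    lp = (props // 9) % 5
    pb = props // 45
    return {
        "lc": lc,
        "lp": lp,
        "pb": pb,
        "dict_size": dict_size,
        "uncompressed_size": None if uncompressed == UNKNOWN_SIZE else uncompressed,
    }

# Usage: inspect the header that Python's lzma module writes for FORMAT_ALONE
print(parse_lzma_header(lzma.compress(b"hello", format=lzma.FORMAT_ALONE)))
# {'lc': 3, 'lp': 0, 'pb': 2, 'dict_size': 8388608, 'uncompressed_size': None}
```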

I tried to generate an .lzma file from a 16 KB *.bin file in two ways:

1.) with the lzma.exe from the standard 7z SDK (with the -d23 argument, i.e. a 2^23-byte dictionary), and

2.) in Python, using the following code:

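```python
import lzma

fileName = "file_split0_test.bin"
testFileName = "file_split0_test.lzma"

# FORMAT_ALONE writes the legacy .lzma container (13-byte header + LZMA data)
lzma_machine = lzma.LZMACompressor(format=lzma.FORMAT_ALONE)

with open(fileName, "rb") as fileRead:
    byteRead = fileRead.read()

# compress() may keep data in internal buffers; flush() returns the rest
# and finishes the stream, so both parts must be written out
data_out = lzma_machine.compress(byteRead) + lzma_machine.flush()

# print(data_out[:13].hex())  # inspect the 13-byte header
with open(testFileName, "wb") as fs:
    fs.write(data_out)
```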

However, the two outputs differ, even though both use the same Properties byte (5d) and the same dictionary size (0x800000, i.e. 2^23). The Python-generated file has all 0xFF bytes in the Uncompressed Size field, whereas the file generated by lzma.exe does not.

Can anyone point out what I'm doing wrong?

(Screenshot: file generated by lzma.exe)

(Screenshot: file generated by Python's lzma module)
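To rule out option differences, the compressor options can be spelled out explicitly. This is a sketch assuming that lzma.exe uses lc=3, lp=0 and pb=2 (what the Properties byte 5d decodes to) and that -d23 means a 2^23-byte dictionary; the helper name `compress_alone` is hypothetical. The first five header bytes then come out as 5d 00 00 80 00 (the dictionary size 0x00800000 stored little-endian), but the Uncompressed Size field is still all 0xFF, because Python's lzma module, through liblzma, writes "unknown size" for this format.

```python
import lzma

# Assumed to match lzma.exe -d23: 2**23-byte dictionary, lc=3, lp=0, pb=2
filters = [{
    "id": lzma.FILTER_LZMA1,
    "dict_size": 1 << 23,
    "lc": 3,
    "lp": 0,
    "pb": 2,
}]

def compress_alone(data: bytes) -> bytes:
    # FORMAT_ALONE accepts exactly one filter, and it must be FILTER_LZMA1
    comp = lzma.LZMACompressor(format=lzma.FORMAT_ALONE, filters=filters)
    return comp.compress(data) + comp.flush()

out = compress_alone(b"\x00" * 16384)
props = (2 * 5 + 0) * 9 + 3  # (pb * 5 + lp) * 9 + lc
print(hex(props), out[:1].hex(), out[1:5].hex(), out[5:13].hex())
# 0x5d 5d 00008000 ffffffffffffffff
```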

Sven Eberth

1 Answer


I ran into the same problem, and as far as I can tell you are not making a mistake. It looks like modern LZMA implementations (including liblzma, which Python's lzma module uses) don't write the uncompressed size into the .lzma header. Instead they store the special "unknown size" value 0xFFFFFFFFFFFFFFFF (-1 when read as a signed 64-bit integer) and mark the end of the compressed data with an end-of-payload marker, which modern LZMA decompressors handle fine. However, if you need the actual uncompressed size in the header, you can simply overwrite header bytes 5 to 12:

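```python
# Overwrite the 8-byte Uncompressed Size field (header bytes 5..12)
# with the real size, stored as a little-endian 64-bit integer
uncompressed_size = len(byteRead)
data_out = data_out[:5] + uncompressed_size.to_bytes(8, 'little') + data_out[13:]
```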
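A slightly more defensive version of the same patch, as a sketch: the helper name `with_uncompressed_size` is hypothetical, and it refuses to overwrite a header that already holds a known size. Because the Python encoder terminates unknown-size data with an end-of-payload marker, it is worth testing the patched file with the decoder you actually target.

```python
import lzma

HEADER_SIZE = 13  # 1 Properties byte + 4 Dictionary Size bytes + 8 Uncompressed Size bytes

def with_uncompressed_size(alone_data: bytes, size: int) -> bytes:
    """Return a copy of a .lzma stream with the Uncompressed Size field filled in.

    Hypothetical helper: it only rewrites header bytes 5..12 and leaves the
    compressed payload (including any end-of-payload marker) untouched.
    """
    if alone_data[5:HEADER_SIZE] != b"\xff" * 8:
        raise ValueError("header already stores a known uncompressed size")
    return alone_data[:5] + size.to_bytes(8, "little") + alone_data[HEADER_SIZE:]

original = b"example payload " * 1024  # 16 KiB of sample data
packed = lzma.compress(original, format=lzma.FORMAT_ALONE)
patched = with_uncompressed_size(packed, len(original))
print(patched[5:13].hex())  # 0040000000000000 (16384, little-endian)
```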
Sterver