
I'm storing compressed data, then retrieving and decompressing it. As the first 4 bytes of the compressed data I store the size of the original data, so I know exactly how large a buffer I need to hold the decompressed data. Now, when I allocate exactly that number of bytes for the buffer and pass that number to LZ4_decompress_safe, it sometimes returns a negative value, for example:

using length_t = std::uint32_t;

// The first 4 bytes of the retrieved blob hold the original (uncompressed) size.
length_t orig_size = 0;
std::memcpy(&orig_size, retr_value, sizeof(orig_size));
*value = std::malloc(orig_size);
int lz4_size = LZ4_decompress_safe((const char *)retr_value + sizeof(orig_size), (char*)*value, retr_size - sizeof(orig_size), orig_size);
printf("Retrieved size=%ld Orig size=%u decompress=%d\n", retr_size, orig_size, lz4_size);

retr_value is my compressed buffer retrieved from storage, and retr_size is the length of that value; its first 4 bytes store the length of the original uncompressed data. Look at the printf call. For some decompression attempts it outputs:

Retrieved size=1093 Orig size=1856 decompress=-338

This post suggested increasing the buffer size. I started increasing it: first to twice the original size, and the ratio of such failures dropped; then to 5 times and 10 times, and I still had a few failures. Finally I increased it 50 times and got no failures. Here is the output for the problematic buffer shown above after increasing the buffer size 50 times (notice that the decompression result now matches the original uncompressed length):

Retrieved size=1093 Orig size=1856 decompress=1856
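
If the failures are caused by the compressed bytes being damaged somewhere between store and retrieve, rather than by the size of the output buffer, a checksum over the compressed payload would confirm it. A sketch of the idea, assuming xxHash is available (it ships in the LZ4 source tree); the extra checksum field is not part of the format described above:

#include <xxhash.h>  // XXH32()

// At store time: checksum the compressed payload and persist it with the blob.
const std::uint32_t stored_sum =
    XXH32(compressed_data + sizeof(orig_size), compressed_data_size, 0);

// At retrieve time: recompute over the retrieved payload and compare with the
// persisted value before calling LZ4_decompress_safe; a mismatch means the
// compressed bytes were damaged in storage or in transit.
const std::uint32_t check_sum =
    XXH32((const char*)retr_value + sizeof(orig_size), retr_size - sizeof(orig_size), 0);
if (check_sum != stored_sum) {
    // corrupted compressed block - report the error instead of decompressing
}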

I was inspired by this code to decompress the data:

See lines 42-63. Notice that the ripple authors simply allocate a buffer of exactly the size of the original data. What am I doing wrong?

Just in case, here is how I compress:

const int max_dst_size = LZ4_compressBound(value_size);
length_t orig_size = value_size;
// Reserve room for the 4-byte size prefix followed by the worst-case compressed payload.
char* compressed_data = (char*)std::malloc(max_dst_size + sizeof(orig_size));
std::memcpy(compressed_data, &orig_size, sizeof(orig_size));
const int compressed_data_size = LZ4_compress_HC((const char*)value, compressed_data + sizeof(orig_size), value_size, max_dst_size, 12);

With LZ4_compress_default, results are the same.
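
For reference, the size-prefix scheme itself works with an output buffer of exactly the original size, as long as the compressed bytes arrive intact. A self-contained round-trip sketch of the same pattern (not the exact code from the post):

#include <lz4.h>
#include <lz4hc.h>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

using length_t = std::uint32_t;

int main() {
    const char value[] = "example payload example payload example payload";
    const length_t value_size = sizeof(value);

    // Compress, prefixing the blob with the original size (as in the post).
    const int max_dst_size = LZ4_compressBound(value_size);
    std::vector<char> blob(sizeof(length_t) + max_dst_size);
    std::memcpy(blob.data(), &value_size, sizeof(value_size));
    const int csize = LZ4_compress_HC(value, blob.data() + sizeof(value_size),
                                      value_size, max_dst_size, 12);
    if (csize <= 0)
        return 1;
    blob.resize(sizeof(length_t) + csize);

    // Decompress into a buffer of exactly the original size.
    length_t orig_size = 0;
    std::memcpy(&orig_size, blob.data(), sizeof(orig_size));
    std::vector<char> out(orig_size);
    const int dsize = LZ4_decompress_safe(blob.data() + sizeof(orig_size), out.data(),
                                          (int)(blob.size() - sizeof(orig_size)),
                                          (int)out.size());
    std::printf("orig=%u decompressed=%d\n", orig_size, dsize);  // expected to match
    return dsize == (int)orig_size ? 0 : 1;
}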

  • Hold off answering for now, it looks like I'm corrupting memory somehow. – DimaA6_ABC Apr 06 '19 at 16:02
  • Most likely I damaged the data when storing the values; I should have implemented the memory management differently. Sorry guys, please ignore. – DimaA6_ABC Apr 06 '19 at 17:14
  • `LZ4_decompress_safe()` is designed to work with an output buffer sized with exactly the original size of decompressed data. This property is rigorously tested in `test/fuzzer.c`, which is run a few million times before every release. So yes, as you suspect, damaged data is another potential scenario. – Cyan Apr 10 '19 at 10:27
  • The data coming to me was not handled properly: the NuDB library returns the found value inside a C++11 lambda, and we did not process the buffer inside the lambda, but instead just stored the buffer pointer and processed it after the call that fetches the data. Once I moved the LZ4 call inside the lambda, everything started to work as it should. – DimaA6_ABC Apr 12 '19 at 11:26
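
A minimal sketch of the lifetime problem described in the last comment. The fetch call shown here (db.fetch(key, callback)) is a hypothetical stand-in, not the exact NuDB signature; the point is that the buffer handed to the callback is only guaranteed to be valid inside the callback, so the LZ4 call (or a copy of the bytes) has to happen there:

// Broken pattern: the pointer is only valid for the duration of the callback.
const void* saved_ptr = nullptr;
std::size_t saved_size = 0;
db.fetch(key, [&](const void* data, std::size_t size) {  // hypothetical signature
    saved_ptr = data;   // dangles once fetch() returns
    saved_size = size;
});
// ... decompressing saved_ptr here reads memory that may already be freed or reused ...

// Working pattern: decompress while the buffer is still alive.
std::vector<char> decompressed;
db.fetch(key, [&](const void* data, std::size_t size) {  // hypothetical signature
    length_t orig_size = 0;
    std::memcpy(&orig_size, data, sizeof(orig_size));
    decompressed.resize(orig_size);
    LZ4_decompress_safe((const char*)data + sizeof(orig_size), decompressed.data(),
                        (int)(size - sizeof(orig_size)), (int)decompressed.size());
});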

0 Answers