
C language/compression algorithm noob here, apologies in advance.

I am looking into a UTF-16 string compression algorithm based on Lempel-Ziv, as explained here: http://www.unicode.org/notes/tn31/

According to the implementation's performance section (https://www.unicode.org/notes/tn31/#Performance), a 1014-byte string should be compressed to about 560 bytes (about 60%).

However, I downloaded the sample C code (https://www.unicode.org/notes/tn31/utf16_compressor.tar.gz) and tested compressing a string of length 1290 (I added a print statement to print the input and output lengths), but the output length after compression is 3018. Is there something I am missing, or am I misinterpreting the output length? In the code, the output buffer of the compression function is an unsigned char (1 byte) array, so does that mean the 3018 is actually 3018 bytes?
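One thing worth checking is whether both lengths are in the same unit. The sketch below is only an illustration, not part of the TN31 sample code: the helper names `utf16_size_in_bytes` and `compression_percent` are hypothetical, and it assumes (the post does not say) that the input length of 1290 counts 16-bit UTF-16 code units while the output length of 3018 counts bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of a UTF-16 buffer holding `code_units` 16-bit code units.
 * Hypothetical helper, not taken from the TN31 sample code. */
size_t utf16_size_in_bytes(size_t code_units)
{
    return code_units * sizeof(uint16_t);
}

/* Compressed size as a percentage of the original size.
 * Both arguments must be in bytes; returns 0.0 for an empty input. */
double compression_percent(size_t input_bytes, size_t output_bytes)
{
    if (input_bytes == 0)
        return 0.0;
    return 100.0 * (double)output_bytes / (double)input_bytes;
}
```

Under that assumption, 1290 code units occupy 2580 bytes, so `compression_percent(utf16_size_in_bytes(1290), 3018)` comes out at roughly 117%: the output would still be larger than the input, but by far less than a direct comparison of 1290 and 3018 suggests.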

Jabberwocky
user3689913
  • It's possible for the output to be larger than the input if the input is not very compressible. – Ian Abbott Aug 06 '20 at 14:39
  • Thanks @IanAbbott. I tried a string consisting of a single repeated character, e.g. ttttttttttttttttttttttttt, and the compressed output was indeed smaller than the input. Thanks for the pointer. – user3689913 Aug 06 '20 at 16:09
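
The effect described in these comments can be reproduced with a much simpler scheme. The following is a toy run-length encoder, deliberately not the Lempel-Ziv based algorithm from TN31; the function name `toy_rle_encode` and its 3-byte output format are made up for illustration. It only shows why repetitive input shrinks while input without repetition grows.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy run-length encoder (NOT the TN31 algorithm).
 * Each run of identical UTF-16 code units is written as 3 bytes:
 * run length (1..255), high byte of the code unit, low byte of the code unit.
 * Returns the number of bytes written, or 0 if `out` is too small
 * (an empty input also yields 0).
 */
size_t toy_rle_encode(const uint16_t *in, size_t in_units,
                      unsigned char *out, size_t out_cap)
{
    size_t written = 0;
    size_t i = 0;

    while (i < in_units) {
        /* Count how many times the current code unit repeats (max 255). */
        size_t run = 1;
        while (i + run < in_units && in[i + run] == in[i] && run < 255)
            run++;

        /* Each run needs exactly 3 output bytes. */
        if (out_cap - written < 3)
            return 0;

        out[written++] = (unsigned char)run;
        out[written++] = (unsigned char)(in[i] >> 8);
        out[written++] = (unsigned char)(in[i] & 0xFF);
        i += run;
    }
    return written;
}
```

With this format, 8 identical code units (16 bytes of input) encode to 3 bytes, whereas 6 distinct code units (12 bytes of input) encode to 18 bytes, because every run of length one costs an extra length byte. Real compressors are far better than this, but they share the property that some inputs come out larger than they went in.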

0 Answers