
I have a question: I was introduced to a version of Ziv-Lempel that encodes only repetitions of length 3 or longer (repetitions of 1 or 2 characters are not encoded; the characters themselves are placed in the coded string instead of an (m, k) pair). I was asked whether it is possible to improve the efficiency of Ziv-Lempel encoding (in terms of the length of the coded string, not time complexity).

I think that, in terms of the length of the coded string, there can be a scenario where not encoding a length-3 repetition at location p, and instead encoding a repetition that starts at location p+1 or p+2, yields better results. This also appears in the theory I read: I have added a photo of the relevant paragraph, which says exactly that (but gives no examples). Every example I have managed to find so far is one that a coder that greedily encodes repetitions of length 3 handles just as well.

Here is the paragraph that speaks of the fact that such a text exists:

Our compression algorithm as described so far is greedy: Any repeat of length 3 or more is reported and employed right away. Sometimes this is not optimal: We could have an [m1, k1] repeat in position p, and an [m2, k2] repeat in position p+1 or p+2, with k1 << k2. Thus a non-greedy algorithm may result in improved compression.
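To see that such a text exists, here is a minimal sketch (my own toy parser, not the textbook's algorithm) of greedy vs. non-greedy LZ parsing, with a hypothetical input constructed so that deferring a length-3 match by one position pays off:

```python
MIN_MATCH = 3  # repeats shorter than 3 are emitted as raw characters

def longest_match(text, pos):
    """Longest repeat for text[pos:] whose source starts before pos."""
    best_src, best_len = 0, 0
    for src in range(pos):
        n = 0
        while pos + n < len(text) and text[src + n] == text[pos + n]:
            n += 1
        if n > best_len:
            best_src, best_len = src, n
    return best_src, best_len

def parse(text, lazy=False):
    """Return a token list of raw characters and (m, k) = (source, length) pairs."""
    tokens, p = [], 0
    while p < len(text):
        src, length = longest_match(text, p)
        if length >= MIN_MATCH:
            if lazy and p + 1 < len(text):
                _, next_len = longest_match(text, p + 1)
                if next_len > length:       # a longer repeat starts at p+1:
                    tokens.append(text[p])  # emit one raw char and defer
                    p += 1
                    continue
            tokens.append((src, length))
            p += length
        else:
            tokens.append(text[p])
            p += 1
    return tokens

# "abcx bcde x abcde" (spaces only for reading): at the final "abcde" the
# greedy parser takes the length-3 repeat "abc" and is then left with "de",
# too short to encode; the lazy parser emits 'a' raw and takes "bcde".
text = "abcx" + "bcde" + "x" + "abcde"
print(len(parse(text)))             # greedy: 12 tokens
print(len(parse(text, lazy=True)))  # lazy:   11 tokens
```

On this input both parsers agree until the final "abcde": greedy spends one (m, k) pair plus two raw characters on it, while the non-greedy parser spends one raw character plus one pair.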


John Kugelman
user3167141

2 Answers


Yes. gzip's and zlib's deflate algorithm uses "lazy" matching, which defers the decision to emit a match until the next character has been examined, in order to see whether a longer match starts there. That is definitely a win.

Mark Adler

This is technically called the LZ non-greedy parsing approach. As Mark mentioned, gzip uses it, but only with a one-position (p+1) lookahead.

In this document, you'll find a more general encoder, with many details of this feature of the LZ algorithm.

Triostrong