How to design an efficient algorithm to support random access in LZ77 compressed sequences?

Question

I'm working on a project in text compression and I need to design an efficient algorithm in a LZ77 compressed sequences. Specially, Given a LZ77 compressed sequence and an index i, we can recover a single symbol S[i] of the input sequence S. The space consumed by the algorithm and the time of random access to a symbol are what we pursue。

Thanks in advance for your suggestions.

score 0 · Answer 1 · answered Jul 02 '21 at 15:02

0

See zran.c for an example. It builds an index with entry points into a gzip or zlib stream. You can select approximately how close the entry points are to each other, in terms of the uncompressed data. To get to a random byte of uncompressed data, you start decompressing at the closest entry point that precedes that byte, and decompress until you get to that byte.

The tradeoff is less storage for fewer entry points, but a longer time to get to any given byte, vs. more storage and less time.

answered Jul 02 '21 at 15:02

Mark Adler

101,978
13
118
158

Thanks you for your suggestions.Unfortunately, I have adopted this scheme before. Now I want to design an algorithm that does not need extra space and supports random access, or design an efficient random access algorithm under the condition of paying small amount of extra space. Thank you again for your suggestion. – Tina Jul 03 '21 at 09:04
LZ77 compresses using the previously uncompressed data as a source of strings to copy. It is therefore inherent to the compression method that you would have to restore that history somehow when starting decompression in a random location. – Mark Adler Jul 03 '21 at 17:49
The alternative is the discard the history at select random access points. If you discard the history often, you will severely degrade the compression. If you're ok with having to decompress ~1 MB of data to get to your byte at a random location, then the hit on deflate compression would not be too bad to discard the history every 1 MB. – Mark Adler Jul 03 '21 at 17:51
To do that, you would need to be in control of the compression of the data. – Mark Adler Jul 03 '21 at 17:52
1

Thank you for your advice，I really appreciate your help. Unfortunately, I am pursuing a random access scheme that does not affect the compression ratio. – Tina Jul 17 '21 at 02:41

How to design an efficient algorithm to support random access in LZ77 compressed sequences?

1 Answers1