gzip.open() look-forward rolling list when reading file line-by-line

Question

I need to look ahead to check whether the current line's data "makes sense" in context or whether it should be omitted. I have to read line by line because the files sometimes contain 20 GB of uncompressed data.

If I were reading entire files into memory, I would have used an indexed for-loop to look ahead. Then I thought it might be easier to read the file in reverse, but I presume that's infeasible because of the seeking it requires and the nature of gzip.

So the idea now is to keep a rolling list of the next X lines: my loop consumes lines from one end while the list is topped up at the other, so the loop always has forward-looking access to the next (X - 1) lines. If this is a sound approach, is there a name or an optimized recipe for it? Or is there a better solution?

I'm not aware of anything in the standard library like that. `collections.deque` would be a good start for implementing it yourself - you can efficiently pop items from the left, and index into the first few items. Just keep it topped up by appending lines to the right, keeping the length at least X. — jasonharper, Sep 15 '21 at 17:33
Difficult to look-forward (or read in reverse) when decompressing a data stream with variable length "records". @jasonharper's idea of using a `deque` sounds worth pursuing IMO. — martineau, Sep 15 '21 at 19:16
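
The `deque` suggestion in the comments could look roughly like the sketch below. It is only one possible shape for it, not a standard-library recipe: `with_lookahead`, `kept_lines`, the `lookahead` parameter, and the `keep` callback are all hypothetical names made up for illustration. In the question's terms, a rolling list of X lines corresponds to `lookahead = X - 1`.

```python
import gzip
from collections import deque
from itertools import islice


def with_lookahead(lines, lookahead):
    """Yield (current, upcoming) pairs from any iterable of lines.

    `upcoming` is a tuple of up to `lookahead` lines that follow `current`;
    it gets shorter near the end of the input.
    """
    it = iter(lines)
    # Prime the window with the current line plus `lookahead` following lines.
    window = deque(islice(it, lookahead + 1))
    while window:
        current = window.popleft()
        yield current, tuple(window)
        # Top the window back up with at most one new line on the right.
        window.extend(islice(it, 1))


def kept_lines(path, lookahead, keep):
    """Hypothetical helper: stream a gzip file, yielding lines `keep` accepts."""
    with gzip.open(path, "rt") as handle:  # "rt" decodes to text, line by line
        for current, upcoming in with_lookahead(handle, lookahead):
            if keep(current, upcoming):
                yield current
```

Only `lookahead + 1` lines are held in memory at any time, so the size of the uncompressed file should not matter. Copying the window into a tuple on every step costs a little for each line; if that turns out to be measurable, yielding the deque itself (and treating it as read-only) would avoid the copy.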


0 Answers