This is a perfect use case for a length-limited collections.deque:
from collections import deque

line_history = deque(maxlen=25)
with open(file) as log_file:  # file: path to the log being scanned
    for line in log_file:
        if "error code" in line:
            print(*line_history, line, sep='')
            # Clear history so if two errors are seen in close proximity,
            # we don't echo some lines twice
            line_history.clear()
        else:
            # When the deque reaches 25 lines, it automatically evicts the oldest
            line_history.append(line)
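
To see the eviction behavior in isolation, here is a minimal sketch with a much smaller maxlen. The names recent and snapshot are just for illustration and aren't part of the solution above:

```python
from collections import deque

recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)  # once the deque is full, each append drops the oldest item

snapshot = list(recent)
print(snapshot)       # [2, 3, 4]

recent.clear()        # empties the deque; maxlen is unchanged
print(len(recent))    # 0
```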
Complete explanation of why I chose this approach (skip if you don't really care):
This isn't solvable in a good/safe way using for/range, because indexing only makes sense if you load the whole file into memory; the file on disk has no idea where lines begin and end, so you can't just ask for "line #357 of the file" without reading it from the beginning to find lines 1 through 356. You'd either end up repeatedly rereading the file, or slurping the whole file into an in-memory sequence (e.g. list/tuple) to have indexing make sense.
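
To make the "no random access to lines" point concrete, here is a rough sketch of what fetching a single line by number looks like. The helper name read_line_number is hypothetical; the point is only that it has to iterate past every earlier line to get there:

```python
from itertools import islice

def read_line_number(path, lineno):
    """Return line `lineno` (1-based, must be >= 1), or None if the file is shorter.

    islice still reads and discards lines 1 .. lineno-1, so the cost grows
    with the line number; nothing lets us jump straight to the line.
    """
    with open(path) as f:
        return next(islice(f, lineno - 1, lineno), None)
```

Calling something like this once per error line is the "repeatedly rereading the file" option described above.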
For a log file, you have to assume it could be quite large (I regularly deal with multi-gigabyte log files), to the point where loading it into memory would exhaust main memory, so slurping is a bad idea, and rereading the file from scratch each time you hit an error is almost as bad (it's slow, but it's reliably slow I guess?). The deque-based approach means your peak memory usage is based on the 27 longest lines in the file, rather than the total file size.
A naïve solution with nothing but built-ins could be as simple as:
with open(file) as log_file:
    lines = tuple(log_file)  # Slurps all lines from file
for i, line in enumerate(lines):
    if "error code" in line:
        print(*lines[max(i-25, 0):i], line, sep='')
but like I said, this requires enough memory to hold your entire log file at once, which is a bad thing to count on. It also repeats lines when two errors occur in close proximity, because unlike deque, you don't get an easy way to empty your recent memory; you'd have to manually track the index of the last print to restrict your slice.
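
For illustration, the manual tracking might look roughly like this. It's a sketch only: the function name context_blocks and the last_printed variable are made up here, and it still assumes the whole file has already been slurped into lines:

```python
def context_blocks(lines, marker="error code", context=25):
    """Return one block of text per matching line, never repeating a line."""
    blocks = []
    last_printed = 0  # index just past the last line already emitted
    for i, line in enumerate(lines):
        if marker in line:
            # Don't reach back past lines a previous error already emitted
            start = max(i - context, last_printed)
            blocks.append(''.join(lines[start:i + 1]))
            last_printed = i + 1
    return blocks
```

It works, but it's extra bookkeeping that deque's clear() gives you for free, and it does nothing for the memory problem.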
Note that even then, I didn't use range; range is a crutch a lot of people coming from C backgrounds rely on, but it's usually the wrong way to solve a problem in Python. In cases where an index is needed (it usually isn't), you usually need the value too, so enumerate-based solutions are superior; most of the time, you don't need an index at all, so direct iteration (or paired iteration with zip or the like) is the correct solution.
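
A small side-by-side of the three styles, using made-up sample data (names and sizes are placeholders, not anything from the log problem):

```python
names = ["alpha", "beta", "gamma"]
sizes = [3, 5, 7]

# C-style: the index exists only so the value can be looked up again
by_range = []
for i in range(len(names)):
    by_range.append((i, names[i]))

# enumerate: index and value arrive together
by_enumerate = []
for i, name in enumerate(names):
    by_enumerate.append((i, name))

# zip: paired iteration with no index at all
paired = []
for name, size in zip(names, sizes):
    paired.append((name, size))
```

The first two loops build identical lists; the enumerate version just skips the redundant lookup.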