If you want the index of the last line starting with `#`, and those lines form a block at the top of the file, read once using `takewhile`, consuming lines until you hit the first line not starting with `#`, then seek and use `itertools.islice` to get the line:

    from itertools import takewhile, islice

    # `file` is the path to your input file
    with open(file) as f:
        # number of leading "#" lines, minus 1, is the 0-based index of the last one
        start = sum(1 for _ in takewhile(lambda x: x[0] == "#", f)) - 1
        f.seek(0)
        data = next(islice(f, start, start + 1))
        print(data)
The first argument to `takewhile` is a predicate; `takewhile` yields elements from the iterable passed as the second argument for as long as the predicate is true. Because a file object is its own iterator, consuming the `takewhile` object with `sum` advances the file itself: it is now positioned past the header line you want (and past the first non-`#` line, which `takewhile` reads and discards), so it is just a matter of seeking back and getting the line with `islice`.
You can also seek much less: go back only to the previous line, take a few lines with `islice`, and filter until you reach the last line starting with `#`.
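To make the point about the shared iterator concrete, here is a minimal sketch that uses an in-memory `io.StringIO` as a stand-in for the file (the sample text and variable names are assumptions for illustration). It shows that after `takewhile` is exhausted, the underlying iterator has already moved past the first line that failed the predicate:

```python
import io
from itertools import takewhile

# In-memory stand-in for a file; the contents are made up for illustration.
sample = io.StringIO("###\n##\n# i am the header\nblah 1\nblah 2\nblah 3\n")

# Count the leading "#" lines; this advances `sample` itself.
header_count = sum(1 for _ in takewhile(lambda line: line.startswith("#"), sample))

# takewhile had to read "blah 1" to discover the predicate failed,
# so the next line available from the iterator is "blah 2".
next_line = next(sample)

print(header_count)     # 3
print(repr(next_line))  # 'blah 2\n'
```

This is why the answer seeks back to the start before calling `islice`: the line you want has already been consumed.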
file:

    ###
    ##
    # i am the header
    blah
    blah
    blah

Output:

    # i am the header
The only memory-efficient way I could think of, if the line could be anywhere, is to read the file once, updating an index variable every time you see a line starting with `#`. You could then pass that index to `islice` as in the answer above, or use `linecache.getline` as in this answer:

    import linecache

    with open(file) as f:
        index = None
        for ind, line in enumerate(f, 1):
            if line[0] == "#":
                index = ind  # 1-based number of the latest "#" line
        if index is not None:  # stays None when no line starts with "#"
            data = linecache.getline(file, index)
            print(data)
We use a starting index of 1 with `enumerate` because `getline` counts lines starting from 1.
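As a rough sketch of how the index-tracking approach could be packaged, the function below (the name `last_hash_line` and the throwaway temp file are assumptions for illustration, not part of the original answer) returns both the 1-based line number and the line, or `None` when no line starts with `#`:

```python
import linecache
import os
import tempfile

def last_hash_line(path):
    """Return (line_number, line) for the last line starting with '#', or None."""
    index = None
    with open(path) as f:
        for ind, line in enumerate(f, 1):  # 1-based, to match linecache.getline
            if line.startswith("#"):
                index = ind
    if index is None:
        return None
    return index, linecache.getline(path, index)

# Throwaway file in which the "#" lines are not contiguous.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("# first\nblah\n# i am the header\nblah\n")
    path = tmp.name

result = last_hash_line(path)
print(result)  # (3, '# i am the header\n')
os.remove(path)
```

Only one line is held in memory at a time during the scan; `linecache` then reads the file a second time to fetch the chosen line.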
Or simply update a variable `data`, which will hold each line starting with `#`, if you only want that particular line and don't care about the position or the other lines:

    with open(file) as f:
        data = None
        for line in f:
            if line[0] == "#":
                data = line
        print(data)  # will be last occurrence of line starting with `#`
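Or using `file.tell`, keeping track of the previous pointer location and using that to seek, then calling `next` on the file object to get the line or lines we want:

    with open(file) as f:
        # prev_tell is the offset where the current line starts; the first line starts at 0
        curr_tell, prev_tell = None, 0
        # readline keeps tell() usable; `for line in f` disables it in Python 3
        for line in iter(f.readline, ""):
            if line[0] == "#":
                curr_tell = prev_tell
            prev_tell = f.tell()
        f.seek(curr_tell)
        data = next(f)
        print(data)  # prints "# i am the header" for the sample file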
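A slightly different way to arrange the same `tell`/`seek` idea is to record the offset before each `readline` call, which also makes it easy to handle a file with no `#` lines. This is only a sketch; the function name `last_hash_line_by_offset` and the temp file are assumptions for illustration:

```python
import os
import tempfile

def last_hash_line_by_offset(path):
    """Return the last line starting with '#', found via tell/seek, or None."""
    last_offset = None
    with open(path) as f:
        while True:
            offset = f.tell()    # where the upcoming line starts
            line = f.readline()  # readline (not iteration) keeps tell() usable
            if not line:         # empty string means end of file
                break
            if line.startswith("#"):
                last_offset = offset
        if last_offset is None:
            return None
        f.seek(last_offset)
        return f.readline()

# Throwaway file in which the "#" lines are not contiguous.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("# first\nblah\n# i am the header\nblah\n")
    path = tmp.name

found = last_hash_line_by_offset(path)
print(repr(found))  # '# i am the header\n'
os.remove(path)
```

After the seek you could keep reading from `f` to get the lines that follow the header without rescanning the file.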
There is also the `consume` recipe from the itertools documentation, which you could use to advance the file iterator to your header line index minus 1, then simply call `next` on the file object:

    import collections
    from itertools import islice

    def consume(iterator, n):
        "Advance the iterator n-steps ahead. If n is None, consume entirely."
        # Use functions that consume iterators at C speed.
        if n is None:
            # feed the entire iterator into a zero-length deque
            collections.deque(iterator, maxlen=0)
        else:
            # advance to the empty slice starting at position n
            next(islice(iterator, n, n), None)
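A possible usage of the recipe, assuming you already know the 1-based line number of the header (here the hypothetical `header_index`), with an in-memory `io.StringIO` standing in for the file:

```python
import collections
import io
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    if n is None:
        collections.deque(iterator, maxlen=0)
    else:
        next(islice(iterator, n, n), None)

# In-memory stand-in for the sample file shown earlier.
sample = io.StringIO("###\n##\n# i am the header\nblah\nblah\nblah\n")

header_index = 3                   # assumed known, e.g. from an earlier pass
consume(sample, header_index - 1)  # skip the lines before the header
header = next(sample)              # the iterator now yields the header line

print(repr(header))  # '# i am the header\n'
```

With a real file you would pass the open file object instead of `sample`; nothing is stored for the skipped lines.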