Here's a slightly different solution. Instead of multi-line, I focused on just the last line, and instead of a constant block size, I have a dynamic (doubling) block size. See comments for more info.
# Get last line of a text file using seek method. Works with non-constant block size.
# IDK if that speed things up, but it's good enough for us,
# especially with constant line lengths in the file (provided by len_guess),
# in which case the block size doubling is not performed much if at all. Currently,
# we're using this on a textfile format with constant line lengths.
# Requires that the file is opened up in binary mode. No nonzero end-rel seeks in text mode.
REL_FILE_END = 2
def lastTextFileLine(file, len_guess=1):
file.seek(-1, REL_FILE_END) # 1 => go back to position 0; -1 => 1 char back from end of file
text = file.read(1)
tot_sz = 1 # store total size so we know where to seek to next rel file end
if text != b'\n': # if newline is the last character, we want the text right before it
file.seek(0, REL_FILE_END) # else, consider the text all the way at the end (after last newline)
tot_sz = 0
blocks = [] # For storing succesive search blocks, so that we don't end up searching in the already searched
j = file.tell() # j = end pos
not_done = True
block_sz = len_guess
while not_done:
if j < block_sz: # in case our block doubling takes us past the start of the file (here j also = length of file remainder)
block_sz = j
not_done = False
tot_sz += block_sz
file.seek(-tot_sz, REL_FILE_END) # Yes, seek() works with negative numbers for seeking backward from file end
text = file.read(block_sz)
i = text.rfind(b'\n')
if i != -1:
text = text[i+1:].join(reversed(blocks))
return str(text)
else:
blocks.append(text)
block_sz <<= 1 # double block size (converge with open ended binary search-like strategy)
j = j - block_sz # if this doesn't work, try using tmp j1 = file.tell() above
return str(b''.join(reversed(blocks))) # if newline was never found, return everything read
Ideally, you'd wrap this in a class LastTextFileLine and keep track of a moving average of line lengths. This would give you a good len_guess maybe.