We discussed the same problem in the China Python User Group just a few months ago. Some of the answers below are copied from that discussion.
Whichever solution you choose, the essence is the same: seek to the end of the file, read a block of data, find the last line break (\r\n or \n), take the last line, seek backwards, and repeat.
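The loop described above can be sketched as a generator that yields lines from last to first. This is only an illustration, not the code from the user group discussion: the names `reverse_lines` and `block_size` are made up here, and it assumes a UTF-8 file with \n or \r\n line endings.

```python
import os

def reverse_lines(path, block_size=4096):
    """Yield the lines of a file from last to first, without line endings."""
    with open(path, "rb") as f:
        # Seek to the end of the file and remember its size.
        f.seek(0, os.SEEK_END)
        size = pos = f.tell()
        tail = b""      # beginning of a line whose start has not been read yet
        at_end = True   # True only while handling the block that ends the file
        while pos > 0:
            # Step back one block (or to the start of the file) and read it.
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            chunk = f.read(step) + tail
            pieces = chunk.split(b"\n")
            # A newline at the very end of the file closes the last line;
            # it does not start an extra empty one.
            if at_end and pieces[-1] == b"":
                pieces.pop()
            at_end = False
            # The first piece may be incomplete: keep it for the next round.
            tail = pieces.pop(0)
            for piece in reversed(pieces):
                yield piece.rstrip(b"\r").decode("utf-8")
        # Whatever is left over is the first line of the file.
        if size > 0:
            yield tail.rstrip(b"\r").decode("utf-8")
```

To get only the last few lines, stop early, for example with `itertools.islice(reverse_lines(path), 5)`. Only as many blocks as needed are read from the end of the file.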
You can try to preprocess the file with tail -n; it is efficient (implemented in C) and designed for this job.
Check its source code if you want to implement it yourself.
Or call the same command from Python:
from subprocess import Popen, PIPE
txt = Popen(['tail', '-n%d' % n, filename], stdout=PIPE).communicate()[0]
;)
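If you want the result as a list of strings and an exception when tail fails, the call can be wrapped roughly like this. It is a sketch that assumes Python 3.5 or later, a tail binary on the PATH, and UTF-8 file content; the helper name `tail_lines` is made up for illustration.

```python
import subprocess

def tail_lines(filename, n):
    """Return the last n lines of filename as a list of strings, using tail."""
    # check=True raises CalledProcessError if tail exits with an error,
    # for example when the file does not exist.
    result = subprocess.run(
        ["tail", "-n", str(n), filename],
        stdout=subprocess.PIPE,
        check=True,
    )
    # tail outputs bytes; decode them and drop the line endings.
    return result.stdout.decode("utf-8").splitlines()
```

Passing the arguments as a list avoids going through a shell, so file names with spaces need no quoting.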
Or try a pure Python solution:
import sys

def last_lines(filename, lines=1):
    """Print the last line(s) of a text file.

    Argument filename is the name of the file to print.
    Argument lines is the number of lines to print, counting from the end.
    """
    block_size = 1024
    block = b''
    nl_count = 0
    start = 0
    # binary mode, so that seek() and tell() work on plain byte offsets
    with open(filename, 'rb') as fsock:
        # seek to the end and get the position
        fsock.seek(0, 2)
        curpos = fsock.tell()
        while curpos > 0:  # while not at the beginning of the file
            # seek back by block_size plus the length of the last block read
            curpos -= block_size + len(block)
            if curpos < 0:
                curpos = 0
            fsock.seek(curpos)
            # read to the end of the file
            block = fsock.read()
            nl_count = block.count(b'\n')
            # a newline at the very end closes the last line, so do not count it
            if block.endswith(b'\n'):
                nl_count -= 1
            # stop once enough lines have been read
            if nl_count >= lines:
                break
    # find the exact start position of the requested lines
    for n in range(nl_count - lines + 1):
        start = block.find(b'\n', start) + 1
    # print it out
    sys.stdout.write(block[start:].decode('utf-8', 'replace'))

if __name__ == '__main__':
    last_lines(sys.argv[0], 5)  # print the last 5 lines of THIS file