10

How can I implement something like the 'head' and 'tail' commands in Python, and read a text file backward line by line?

martineau
user739650
  • possible duplicate of [Read a file in reverse order using python](http://stackoverflow.com/questions/2301789/read-a-file-in-reverse-order-using-python) – Greg Hewgill May 05 '11 at 10:23
  • I need to read a big log file backward – user739650 May 05 '11 at 10:29
  • I'm guessing you're not familiar with [tac](http://www.gnu.org/software/coreutils/manual/html_node/tac-invocation.html) then, because your question would just be "Implement tac in python". – MattH May 05 '11 at 10:33
  • possible duplicate of [Get last n lines of a file with Python, similar to tail](http://stackoverflow.com/questions/136168/get-last-n-lines-of-a-file-with-python-similar-to-tail) – S.Lott May 05 '11 at 10:46

3 Answers

27

This is my personal file class (Python 2 only, since it subclasses the built-in file type) ;-)

from itertools import islice

class File(file):
    """ A helper class for file reading  """

    def __init__(self, *args, **kwargs):
        super(File, self).__init__(*args, **kwargs)
        self.BLOCKSIZE = 4096

    def head(self, lines_2find=1):
        self.seek(0)                            #Rewind file
        # islice stops cleanly when the file has fewer lines than requested
        return list(islice(self, lines_2find))

    def tail(self, lines_2find=1):  
        self.seek(0, 2)                         #Go to end of file
        bytes_in_file = self.tell()
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find + 1 > lines_found and
               bytes_in_file > total_bytes_scanned): 
            byte_block = min(
                self.BLOCKSIZE,
                bytes_in_file - total_bytes_scanned)
            self.seek( -(byte_block + total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(byte_block).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]

    def backward(self):
        self.seek(0, 2)                         #Go to end of file
        blocksize = self.BLOCKSIZE
        last_row = ''
        while self.tell() != 0:
            try:
                self.seek(-blocksize, 1)
            except IOError:
                blocksize = self.tell()
                self.seek(-blocksize, 1)
            block = self.read(blocksize)
            self.seek(-blocksize, 1)
            rows = block.split('\n')
            rows[-1] = rows[-1] + last_row
            while rows:
                last_row = rows.pop(-1)
                if rows and last_row:
                    yield last_row
        yield last_row

Example usage:

with File('file.name') as f:
    print f.head(5)
    print f.tail(5)
    for row in f.backward():
        print row
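
The class above depends on the Python 2 `file` built-in, so it will not run on Python 3. As a rough sketch of the same backward-reading idea for Python 3, the generator below reads fixed-size blocks from the end of the file in binary mode and yields complete lines from last to first. The name `read_lines_backward` and its parameters are hypothetical, and it assumes `\n` line endings and a UTF-8 file. Unlike `backward()` above, it keeps empty lines.

```python
import os


def read_lines_backward(path, blocksize=4096, encoding="utf-8"):
    """Yield the lines of a file from last to first, without trailing newlines.

    Hypothetical helper: assumes '\n' line endings and the given encoding.
    """
    with open(path, "rb") as f:          # binary mode allows arbitrary seeks
        f.seek(0, os.SEEK_END)
        position = f.tell()
        if position == 0:                # empty file: nothing to yield
            return
        pending = None                   # start of a line not yet complete
        while position > 0:
            size = min(blocksize, position)
            position -= size
            f.seek(position)
            block = f.read(size)
            if pending is None:
                # First block read (end of file): drop one trailing newline
                # so it does not produce a spurious empty last line.
                if block.endswith(b"\n"):
                    block = block[:-1]
                pending = b""
            rows = (block + pending).split(b"\n")
            # rows[0] may continue into the previous block, so hold it back.
            pending = rows[0]
            for row in reversed(rows[1:]):
                yield row.decode(encoding)
        # Start of file reached: whatever is pending is the first line.
        yield pending.decode(encoding)
```

Lines are decoded only once they are complete, so a multi-byte character split across two blocks is not a problem. Usage would look like `for row in read_lines_backward('file.name'): print(row)`.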
fdb
6

head is easy:

from itertools import islice
with open("file") as f:
    for line in islice(f, n):
        print(line.rstrip("\n"))  # each line already ends with a newline

tail is harder if you don't want to keep the whole file in memory. If the input is a file, you could start reading blocks beginning at the end of the file. The original tail also works if the input is a pipe, so a more general solution is to read and discard the whole input, except for the last few lines. An easy way to do this is collections.deque:

from collections import deque
with open("file") as f:
    for line in deque(f, maxlen=n):
        print(line.rstrip("\n"))  # each line already ends with a newline

In both these code snippets, n is the number of lines to print.
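
As a small sketch, the two snippets can be wrapped into functions that accept any iterable of lines, so they work on an open file, a pipe such as `sys.stdin`, or an in-memory buffer. The function names `head` and `tail` here are just illustrative, and the code is written for Python 3.

```python
import io
from collections import deque
from itertools import islice


def head(lines, n):
    """Return the first n items of an iterable of lines."""
    return list(islice(lines, n))


def tail(lines, n):
    """Return the last n items of an iterable of lines.

    The whole input is consumed, but at most n lines are held in memory.
    """
    return list(deque(lines, maxlen=n))


# Demonstration on an in-memory buffer standing in for a file or pipe.
sample = io.StringIO("a\nb\nc\nd\n")
print(head(sample, 2))   # ['a\n', 'b\n']
sample.seek(0)
print(tail(sample, 2))   # ['c\n', 'd\n']
```

The deque approach still reads every line of the input, so for a very large regular file a block-based read from the end will be faster; the benefit here is bounded memory and support for non-seekable input.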

Sven Marnach
0

Tail:

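import os

def tail(fname, lines):
    """Read last N lines from file fname."""
    BUFSIZ = 1024
    # Binary mode: Python 3 does not allow end-relative seeks on text-mode files.
    with open(fname, 'rb') as f:
        f.seek(0, os.SEEK_END)
        fsize = f.tell()
        block = -1
        data = b""
        done = False
        while not done:
            step = block * BUFSIZ
            if abs(step) >= fsize:
                # The window covers the whole file: read from the start and stop.
                f.seek(0)
                done = True
            else:
                f.seek(step, os.SEEK_END)
            data = f.read().strip()
            if data.count(b'\n') >= lines:
                break
            block -= 1
    # Decode after slicing, so a partial first line in the window is never decoded.
    return [line.decode() for line in data.splitlines()[-lines:]]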
Giampaolo Rodolà