210

I'm writing a log file viewer for a web application and for that I want to paginate through the lines of the log file. The items in the file are line based with the newest item at the bottom.

So I need a tail() method that can read n lines from the bottom and support an offset. This is what I came up with:

def tail(f, n, offset=0):
    """Reads a n lines from f with an offset of offset lines."""
    avg_line_length = 74
    to_read = n + offset
    while 1:
        try:
            f.seek(-(avg_line_length * to_read), 2)
        except IOError:
            # woops.  apparently file is smaller than what we want
            # to step back, go to the beginning instead
            f.seek(0)
        pos = f.tell()
        lines = f.read().splitlines()
        if len(lines) >= to_read or pos == 0:
            return lines[-to_read:offset and -offset or None]
        avg_line_length *= 1.3

Is this a reasonable approach? What is the recommended way to tail log files with offsets?

codeforester
Armin Ronacher
  • On my system (linux SLES 10), seeking relative to the end raises an IOError "can't do nonzero end-relative seeks". I like this solution but have modified it to get the file length (`seek(0,2)` then `tell()`), and use that value to seek relative to the beginning; a sketch of this workaround follows these comments. – Anne Feb 07 '12 at 17:19
  • 3
    Congrats - this question made it into the Kippo source code –  Feb 28 '14 at 10:15
  • The parameters of the `open` command used to generate the `f` file object should be specified, because depending if `f=open(..., 'rb')` or `f=open(..., 'rt')` the `f` must be processed differently – Dr Fabio Gori Feb 05 '20 at 09:37
  • I decided to write a 100% generalized solution to this so now you can access a gigantic text file like a list with arbitrary positive or negative slicing ex: [-2000:-1900] and so on https://github.com/SurpriseDog/readlines/blob/main/readlines.py – SurpriseDog Mar 18 '21 at 15:58
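A sketch of the workaround Anne describes above: measure the file length first, then seek relative to the beginning (assumes `f` is a seekable file object, with `avg_line_length` and `to_read` as in the question's code):

f.seek(0, 2)            # seeking to the very end is allowed even in text mode
file_size = f.tell()
# Seek relative to the beginning instead of the end,
# clamping at 0 for files smaller than the estimate.
f.seek(max(0, file_size - avg_line_length * to_read))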

36 Answers

132

This may be quicker than yours. Makes no assumptions about line length. Backs through the file one block at a time till it's found the right number of '\n' characters.

def tail( f, lines=20 ):
    total_lines_wanted = lines

    BLOCK_SIZE = 1024
    f.seek(0, 2)
    block_end_byte = f.tell()
    lines_to_go = total_lines_wanted
    block_number = -1
    blocks = [] # blocks of size BLOCK_SIZE, in reverse order starting
                # from the end of the file
    while lines_to_go > 0 and block_end_byte > 0:
        if (block_end_byte - BLOCK_SIZE > 0):
            # read the last block we haven't yet read
            f.seek(block_number*BLOCK_SIZE, 2)
            blocks.append(f.read(BLOCK_SIZE))
        else:
            # file too small, start from begining
            f.seek(0,0)
            # only read what was not read
            blocks.append(f.read(block_end_byte))
        lines_found = blocks[-1].count('\n')
        lines_to_go -= lines_found
        block_end_byte -= BLOCK_SIZE
        block_number -= 1
    all_read_text = ''.join(reversed(blocks))
    return '\n'.join(all_read_text.splitlines()[-total_lines_wanted:])

I don't like tricky assumptions about line length when -- as a practical matter -- you can never know things like that.

Generally, this will locate the last 20 lines on the first or second pass through the loop. If your 74 character thing is actually accurate, you make the block size 2048 and you'll tail 20 lines almost immediately.

Also, I don't burn a lot of brain calories trying to finesse alignment with physical OS blocks. Using these high-level I/O packages, I doubt you'll see any performance consequence of trying to align on OS block boundaries. If you use lower-level I/O, then you might see a speedup.
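If you do want to align reads with the file's preferred I/O block size, a minimal sketch (assuming `f` is an already-open file object; `st_blksize` is not available on every platform, hence the fallback):

import io
import os

# Use the file's self-reported preferred block size where available,
# falling back to Python's default buffer size (e.g. on Windows).
BLOCK_SIZE = getattr(os.fstat(f.fileno()), 'st_blksize', io.DEFAULT_BUFFER_SIZE)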


UPDATE

For Python 3.2 and up, do the same processing on bytes: in text files (those opened without a "b" in the mode string), only seeks relative to the beginning of the file are allowed (the exception being a seek to the very end of the file with seek(0, 2)). So open the file in binary mode, e.g. f = open('C:/.../../apache_logs.txt', 'rb'):

def tail(f, lines=20):
    total_lines_wanted = lines

    BLOCK_SIZE = 1024
    f.seek(0, 2)
    block_end_byte = f.tell()
    lines_to_go = total_lines_wanted
    block_number = -1
    blocks = []
    while lines_to_go > 0 and block_end_byte > 0:
        if (block_end_byte - BLOCK_SIZE > 0):
            f.seek(block_number*BLOCK_SIZE, 2)
            blocks.append(f.read(BLOCK_SIZE))
        else:
            f.seek(0,0)
            blocks.append(f.read(block_end_byte))
        lines_found = blocks[-1].count(b'\n')
        lines_to_go -= lines_found
        block_end_byte -= BLOCK_SIZE
        block_number -= 1
    all_read_text = b''.join(reversed(blocks))
    return b'\n'.join(all_read_text.splitlines()[-total_lines_wanted:])
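If you need text rather than bytes, decode the joined result afterwards; a small sketch, assuming the log is UTF-8 encoded:

with open('apache_logs.txt', 'rb') as f:
    print(tail(f, 20).decode('utf-8'))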
DARK_C0D3R
S.Lott
  • Nice. At least one poster that read the question and the code in there :) – Armin Ronacher Sep 25 '08 at 21:55
  • This works really well. I just pulled it into a script to read the last 4000 lines of a log file before I parse them. It works quickly and makes sense. Thanks! – Jeff Hellman Jul 16 '09 at 05:26
  • 15
    This fails on small logfiles -- IOError: invalid argument -- f.seek( block*1024, 2 ) – ohnoes Dec 04 '09 at 11:19
  • I re-edited this solution a while back, it is yet to be peer reviewed (again). The `data` lines need to be reversed before the join since the file is read backwards. – Pykler Oct 04 '11 at 18:21
  • A corner case: last line is larger than `BUFSIZ` and `window == 1`. Perhaps `linesFound` should be something like `data[-1].count('\n') - 1`? – sje397 Oct 13 '11 at 01:52
  • 1
    Very nice approach indeed. I used a slightly modified version of the code above and came up with this recipe: http://code.activestate.com/recipes/577968-log-watcher-tail-f-log/ – Giampaolo Rodolà Nov 29 '11 at 19:32
  • 6
    No longer works in python 3.2. I'm getting `io.UnsupportedOperation: can't do nonzero end-relative seeks` I can change the offset to 0, but that defeats the purpose of the function. – Logical Fallacy May 01 '12 at 21:27
  • 4
    @DavidEnglund Reason is [here](http://www.velocityreviews.com/forums/t748976-python-3-2-bug-reading-the-last-line-of-a-file.html). In brief: seeking relative to the end of the file is not allowed in text mode, presumably because the file contents have to be decoded, and, in general, seeking to an arbitrary position within a sequence of encoded bytes can have undefined results when you attempt to decode to Unicode starting from that position. The suggestion offered at the link is to try opening the file in binary mode and do the decoding yourself, catching the DecodeError exceptions. – max Sep 10 '12 at 19:23
  • 7
    DON'T USE THIS CODE. It corrupts lines in some border cases in python 2.7. The answer from @papercrane below fixes it. – xApple Apr 29 '13 at 10:19
  • I get `invalid literal for int() with base 12: "07b925503',"` for some of my lines. @papercrane's answer works well. – AliBZ Jan 23 '14 at 18:10
  • I just made a fairly drastic change. Partly, it's a style change, but more importantly, it fixes a bug where the output from this function was consistently wrong if the desired number of lines didn't fit in one 1024-byte block. Feel free to rollback the style changes if you don't approve, but you'll want to preserve the bugfix. – Mark Amery Aug 23 '14 at 20:07
  • Just thought I'd leave a quick note with benchmarking information. Seems this is even quicker than shelling out to tail - thanks! For details see: https://gitweb.torproject.org/stem.git/commit/?id=8736a7e – Damian Mar 24 '15 at 16:39
  • 1
    Side-note: You don't need to "burn a lot of brain calories trying to finesse alignment with physical OS blocks"; it's not hard to get the ideal block size on many OSes, and Python provides a reasonable default for systems (e.g. Windows) that don't. Just change the definition of `BLOCK_SIZE` to `BLOCK_SIZE = getattr(os.fstat(f.fileno()), 'st_blksize', io.DEFAULT_BUFFER_SIZE)`, and it'll use the file's self-reported preferred I/O blocksize, or Python's fallback default (which is currently 8 KB, still small enough that you wouldn't expect serious slowdowns if the lines are short). – ShadowRanger Oct 03 '18 at 17:40
100

Assuming a unix-like system, on Python 2 you can do:

import os
def tail(f, n, offset=0):
    # n and offset are ints, so build the command string explicitly
    stdin, stdout = os.popen2("tail -n " + str(n + offset) + " " + f)
    stdin.close()
    lines = stdout.readlines()
    stdout.close()
    # drop the last `offset` lines; slice to None when offset == 0
    return lines[:-offset or None]

For python 3 you may do:

import subprocess
def tail(f, n, offset=0):
    proc = subprocess.Popen(['tail', '-n', str(n + offset), f], stdout=subprocess.PIPE)
    lines = proc.stdout.readlines()
    # drop the last `offset` lines; slice to None when offset == 0
    return lines[:-offset or None]
Ivan Castellanos
Mark
  • 9
    Should be platform independent. Besides, if you read the question you will see that f is a file like object. – Armin Ronacher Sep 25 '08 at 21:57
  • 52
    the question doesn't say platform dependence is unacceptable. i fail to see why this deserves two downvotes when it provides a very unixy (may be what you're looking for... certainly was for me) way of doing exactly what the question asks. – Shabbyrobe Jun 03 '09 at 04:27
  • 3
    Thanks, I was thinking I had to solve this in pure Python but there's no reason not to use UNIX utilities when they are at hand, so I went with this. FWIW in modern Python, subprocess.check_output is likely preferable to os.popen2; it simplifies things a bit as it just returns the output as a string, and raises on a non-zero exit code (a sketch of this follows these comments). – mrooney Oct 30 '13 at 01:35
  • 3
    Although this is platform dependent, it is a *very* efficient way of doing what has been asked, as well as being an extremely fast way of doing it (You don't have to load the entire file into memory). @Shabbyrobe – earthmeLon Nov 21 '14 at 22:53
  • 3
    @Mark an update might be nice since popen2 is deprecated since pyton2.6 – ezdazuzena Mar 08 '16 at 12:00
  • 7
    You might want to precalculate the offset like :`offset_total = str(n+offset)` and replace this line `stdin,stdout = os.popen2("tail -n "+offset_total+" "+f)` to avoid `TypeErrors (cannot concatenate int+str)` – AddingColor Oct 08 '16 at 15:47
  • @FranciscoCouzo, this isn't really an issue. I don't see any user input, sql type injection problems. Anyone that could execute a python script would already have access to a command line where they could run that command directly. – Mark Oct 20 '16 at 20:38
  • 1
    @Mark sure, but anyone reading the answer could be running it on user input. – Francisco Oct 20 '16 at 20:43
  • 2
    python3: BEFORE --> `proc = subprocess.Popen(['tail', '-n', n + offset, f], stdout=subprocess.PIPE)` AFTER --> `proc = subprocess.Popen(['tail', '-n', '\"%s\"' % (n + offset), file_path], stdout=subprocess.PIPE)` to prevent Error – gies0r Jul 11 '19 at 20:10
  • 1
    I get: expected str, bytes or os.PathLike object, not int with `proc = subprocess.Popen(['tail', '-n', n + offset, f], stdout=subprocess.PIPE)` – Timo Jan 02 '21 at 10:55
  • Can you show how you call it. Still get the error from 2nd jan. I call it with fun('file.txt',10) – Timo Jan 28 '21 at 19:14
  • 1
    you need to cast n to str to prevent error from 2nd jan (see f'{n}'): `proc = subprocess.Popen(['tail', '-n', f'{n}', f], stdout=subprocess.PIPE)` – v0devil Aug 23 '21 at 13:51
  • +100. As for comments that this is for Linux, remind folks that Apple, Google (Chrome OS) and now even Microsoft are supporting Linux (WSL). Top 500 super computers also run Linux I heard. – WinEunuuchs2Unix Jun 26 '23 at 03:17
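A sketch of the subprocess.check_output variant mrooney mentions above (assumes a Unix-like system with tail on the PATH, and that f is a filename string as in this answer):

import subprocess

def tail(f, n, offset=0):
    # check_output returns stdout as bytes and raises
    # CalledProcessError on a non-zero exit code.
    out = subprocess.check_output(['tail', '-n', str(n + offset), f])
    lines = out.splitlines(True)    # keep line endings
    # drop the last `offset` lines; slice to None when offset == 0
    return lines[:-offset or None]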
46

Here is my answer. Pure python. Using timeit it seems pretty fast. Tailing 100 lines of a log file that has 100,000 lines:

>>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=10)
0.0014600753784179688
>>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=100)
0.00899195671081543
>>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=1000)
0.05842900276184082
>>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=10000)
0.5394978523254395
>>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=100000)
5.377126932144165

Here is the code:

import os


def tail(f, lines=1, _buffer=4098):
    """Tail a file and get X lines from the end"""
    # place holder for the lines found
    lines_found = []

    # block counter will be multiplied by buffer
    # to get the block size from the end
    block_counter = -1

    # loop until we find X lines
    while len(lines_found) < lines:
        try:
            f.seek(block_counter * _buffer, os.SEEK_END)
        except IOError:  # either file is too small, or too many lines requested
            f.seek(0)
            lines_found = f.readlines()
            break

        lines_found = f.readlines()

        # we found enough lines, get out
        # Removed this line because it was redundant the while will catch
        # it, I left it for history
        # if len(lines_found) > lines:
        #    break

        # decrement the block counter to get the
        # next X bytes
        block_counter -= 1

    return lines_found[-lines:]
glenbot
  • 3
    Elegant solution! Is the `if len(lines_found) > lines:` really necessary? Wouldn't the `loop` condition catch it as well? – Maximilian Peters Jul 23 '16 at 08:45
  • A question for my understanding: is `os.SEEK_END` used simply for clarity? As far as I have found, its value is constant (= 2). I was wondering about leaving it out to be able to leave out the `import os`. Thanks for the great solution! – n1k31t4 Oct 05 '17 at 10:51
  • 2
    @MaximilianPeters yes. It's not necessary. I commented it out. – glenbot Oct 06 '17 at 14:21
  • @DexterMorgan you can replace `os.SEEK_END` with its integer equivalent. It was mainly there for readability. – glenbot Oct 06 '17 at 14:22
  • 1
    I upvoted, but have a small nit. After the seek, the first line read may be incomplete, so to get N _complete_lines I changed the `while len(lines_found) < lines` to `while len(lines_found) <= lines` in my copy. Thanks! – Graham Klyne Aug 29 '18 at 11:11
  • 1
    Always seeking from the end is an error because it assumes that the end is the same for each loop iteration. Think log file that gets written to while this code running. – BlackJack Mar 03 '21 at 13:02
36

If reading the whole file is acceptable then use a deque.

from collections import deque
deque(f, maxlen=n)

Prior to 2.6, deques didn't have a maxlen option, but it's easy enough to implement.

import itertools
def maxque(items, size):
    items = iter(items)
    q = deque(itertools.islice(items, size))
    for item in items:
        del q[0]
        q.append(item)
    return q

If it's a requirement to read the file from the end, then use a galloping (a.k.a. exponential) search.

def tail(f, n):
    assert n >= 0
    pos, lines = n+1, []
    while len(lines) <= n:
        try:
            f.seek(-pos, 2)
        except IOError:
            f.seek(0)
            break
        finally:
            lines = list(f)
        pos *= 2
    return lines[-n:]
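For example, a usage sketch (open the file in binary mode on Python 3, since text-mode files reject the relative seeks this function relies on):

with open('log.txt', 'rb') as f:
    last_ten = tail(f, 10)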
A. Coady
  • Why does that bottom function work? `pos *= 2` seems completely arbitrary. What is its significance? – 2mac Dec 29 '14 at 19:06
  • 3
    @2mac [Exponential Search](https://en.wikipedia.org/wiki/Exponential_search). It reads from the end of file iteratively, doubling the amount read each time, until enough lines are found. – A. Coady Mar 16 '15 at 23:08
  • I think that the solution to read from the end will not support files encoded with UTF-8, since the character length is variable, and you could (likely will) land at some odd offset that cannot be interpreted correctly. – Mike Feb 17 '19 at 03:53
  • 1
    unfortunately your _galloping_ search solution doesn't work for python 3. As f.seek() doesn't take negative offset. I have updated your code make it work for python 3 [link](https://stackoverflow.com/a/57277212/9485283) – itsjwala Jul 30 '19 at 17:38
  • Here is from the docs what deque does: Once a bounded length deque is full, when new items are added, a corresponding number of items are discarded from the opposite end. If n=1, it reads the last (or only line) from a file. Why do you offer the tail method when deque does the same? – Timo Jan 02 '21 at 10:04
  • A one-liner for the last line with python 3: `with open('example', 'r') as f:d = deque(f, maxlen=1)` – Timo Jan 02 '21 at 10:23
  • @Timo: The `tail` function is, as the OP states, for when you *need* to read the file from the end. If you need to read the last line of a 50 GB file, sure, `deque(f, maxlen=1)` will work, eventually. On a spinning disk drive reading at 100 MB/sec with no contention and no fragmentation, it'll only take eight and a half minutes to finish (it can't magically skip all the other lines, it has to read them and discard them as it goes). Or you use the backwards galloping search and do it in . – ShadowRanger Jan 06 '21 at 20:17
28

S.Lott's answer above almost works for me but ends up giving me partial lines. It turns out that it corrupts data on block boundaries because data holds the read blocks in reversed order. When ''.join(data) is called, the blocks are in the wrong order. This fixes that.

def tail(f, window=20):
    """
    Returns the last `window` lines of file `f` as a list.
    f - a byte file-like object
    """
    if window == 0:
        return []
    BUFSIZ = 1024
    f.seek(0, 2)
    bytes = f.tell()
    size = window + 1
    block = -1
    data = []
    while size > 0 and bytes > 0:
        if bytes - BUFSIZ > 0:
            # Seek back one whole BUFSIZ
            f.seek(block * BUFSIZ, 2)
            # read BUFFER
            data.insert(0, f.read(BUFSIZ))
        else:
            # file too small, start from begining
            f.seek(0,0)
            # only read what was not read
            data.insert(0, f.read(bytes))
        linesFound = data[0].count('\n')
        size -= linesFound
        bytes -= BUFSIZ
        block -= 1
    return ''.join(data).splitlines()[-window:]
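A usage sketch (Python 2, matching the docstring's byte file-like object; the filename is hypothetical):

with open('app.log', 'rb') as f:
    for line in tail(f, 50):
        print line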
borgr
papercrane
24

The code I ended up using. I think this is the best so far:

def tail(f, n, offset=None):
    """Reads a n lines from f with an offset of offset lines.  The return
    value is a tuple in the form ``(lines, has_more)`` where `has_more` is
    an indicator that is `True` if there are more lines in the file.
    """
    avg_line_length = 74
    to_read = n + (offset or 0)

    while 1:
        try:
            f.seek(-(avg_line_length * to_read), 2)
        except IOError:
            # woops.  apparently file is smaller than what we want
            # to step back, go to the beginning instead
            f.seek(0)
        pos = f.tell()
        lines = f.read().splitlines()
        if len(lines) >= to_read or pos == 0:
            return lines[-to_read:offset and -offset or None], \
                   len(lines) > to_read or pos > 0
        avg_line_length *= 1.3
Armin Ronacher
14

Simple and fast solution with mmap:

import mmap
import os

def tail(filename, n):
    """Returns last n lines from the filename. No exception handling"""
    size = os.path.getsize(filename)
    with open(filename, "rb") as f:
        # for Windows the mmap parameters are different
        fm = mmap.mmap(f.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
        try:
            for i in xrange(size - 1, -1, -1):
                if fm[i] == '\n':
                    n -= 1
                    if n == -1:
                        break
            return fm[i + 1 if i else 0:].splitlines()
        finally:
            fm.close()
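The Windows difference noted in the comment can be sidestepped with the access parameter, which works on both Windows and Unix; a sketch of the one line that changes:

fm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)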
dimitri
  • 1
    This is probably the fastest answer when the input could be huge (or it would be, if it used the `.rfind` method to scan backwards for newlines, rather than performing byte at a time checks at the Python level; in CPython, replacing Python level code with C built-in calls usually wins by a lot). For smaller inputs, the `deque` with a `maxlen` is simpler and probably similarly fast. – ShadowRanger Nov 19 '15 at 18:41
6

An update of @papercrane's solution for Python 3. Open the file with open(filename, 'rb') and:

def tail(f, window=20):
    """Returns the last `window` lines of file `f` as a list.
    """
    if window == 0:
        return []

    BUFSIZ = 1024
    f.seek(0, 2)
    remaining_bytes = f.tell()
    size = window + 1
    block = -1
    data = []

    while size > 0 and remaining_bytes > 0:
        if remaining_bytes - BUFSIZ > 0:
            # Seek back one whole BUFSIZ
            f.seek(block * BUFSIZ, 2)
            # read BUFFER
            bunch = f.read(BUFSIZ)
        else:
            # file too small, start from beginning
            f.seek(0, 0)
            # only read what was not read
            bunch = f.read(remaining_bytes)

        bunch = bunch.decode('utf-8')
        data.insert(0, bunch)
        size -= bunch.count('\n')
        remaining_bytes -= BUFSIZ
        block -= 1

    return ''.join(data).splitlines()[-window:]
Emilio
  • You might want to add: `assert "b" in file.mode, "File mode must be bytes!"` to check if the file mode is actually bytes. – JulianWgs Aug 04 '21 at 22:06
6

The simplest way is to use deque:

from collections import deque

def tail(filename, n=10):
    with open(filename) as f:
        return deque(f, n)
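Note that this returns a deque rather than a list; wrap it if you need list semantics (the filename here is just a placeholder):

last_lines = list(tail("example.log", 10))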
Zhen Wang
  • 4
    This will iterate through the whole file. Keep this in mind if you are working with large files. – Austin Aug 21 '20 at 17:32
5

Posting an answer at the behest of commenters on my answer to a similar question where the same technique was used to mutate the last line of a file, not just get it.

For a file of significant size, mmap is the best way to do this. To improve on the existing mmap answer, this version is portable between Windows and Linux, and should run faster (though it won't work without some modifications on 32 bit Python with files in the GB range, see the other answer for hints on handling this, and for modifying to work on Python 2).

import io  # Gets consistent version of open for both Py2.7 and Py3.x
import itertools
import mmap

def skip_back_lines(mm, numlines, startidx):
    '''Factored out to simplify handling of n and offset'''
    for _ in itertools.repeat(None, numlines):
        startidx = mm.rfind(b'\n', 0, startidx)
        if startidx < 0:
            break
    return startidx

def tail(f, n, offset=0):
    # Reopen file in binary mode
    with io.open(f.name, 'rb') as binf, mmap.mmap(binf.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # len(mm) - 1 handles files ending w/newline by getting the prior line
        startofline = skip_back_lines(mm, offset, len(mm) - 1)
        if startofline < 0:
            return []  # Offset lines consumed whole file, nothing to return
            # If using a generator function (yield-ing, see below),
            # this should be a plain return, no empty list

        endoflines = startofline + 1  # Slice end to omit offset lines

        # Find start of lines to capture (add 1 to move from newline to beginning of following line)
        startofline = skip_back_lines(mm, n, startofline) + 1

        # Passing True to splitlines makes it return the list of lines without
        # removing the trailing newline (if any), so list mimics f.readlines()
        return mm[startofline:endoflines].splitlines(True)
        # If Windows style \r\n newlines need to be normalized to \n, and input
        # is ASCII compatible, can normalize newlines with:
        # return mm[startofline:endoflines].replace(os.linesep.encode('ascii'), b'\n').splitlines(True)

This assumes the number of lines tailed is small enough you can safely read them all into memory at once; you could also make this a generator function and manually read a line at a time by replacing the final line with:

        mm.seek(startofline)
        # Call mm.readline n times, or until EOF, whichever comes first
        # Python 3.2 and earlier:
        for line in itertools.islice(iter(mm.readline, b''), n):
            yield line

        # 3.3+:
        yield from itertools.islice(iter(mm.readline, b''), n)

Lastly, this reads in binary mode (necessary to use mmap), so it gives str lines (Py2) and bytes lines (Py3); if you want unicode (Py2) or str (Py3), the iterative approach could be tweaked to decode for you and/or fix newlines:

        lines = itertools.islice(iter(mm.readline, b''), n)
        if f.encoding:  # Decode if the passed file was opened with a specific encoding
            lines = (line.decode(f.encoding) for line in lines)
        if 'b' not in f.mode:  # Fix line breaks if passed file opened in text mode
            lines = (line.replace(os.linesep, '\n') for line in lines)
        # Python 3.2 and earlier:
        for line in lines:
            yield line
        # 3.3+:
        yield from lines

Note: I typed this all up on a machine where I lack access to Python to test. Please let me know if I typoed anything; this was similar enough to my other answer that I think it should work, but the tweaks (e.g. handling an offset) could lead to subtle errors. Please let me know in the comments if there are any mistakes.
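A minimal usage sketch (hypothetical filename; since the function reopens f.name in binary mode itself, the mode you opened f with doesn't matter):

with open('app.log') as f:
    lines = tail(f, 20, offset=5)   # last 20 lines, skipping the final 5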

ShadowRanger
4

An even cleaner Python 3-compatible version that doesn't insert but appends and reverses:

def tail(f, window=1):
    """
    Returns the last `window` lines of file `f` as a list of bytes.
    """
    if window == 0:
        return b''
    BUFSIZE = 1024
    f.seek(0, 2)
    end = f.tell()
    nlines = window + 1
    data = []
    while nlines > 0 and end > 0:
        i = max(0, end - BUFSIZE)
        nread = min(end, BUFSIZE)

        f.seek(i)
        chunk = f.read(nread)
        data.append(chunk)
        nlines -= chunk.count(b'\n')
        end -= nread
    return b'\n'.join(b''.join(reversed(data)).splitlines()[-window:])

use it like this:

with open(path, 'rb') as f:
    last_lines = tail(f, 3).decode('utf-8')
hrehfeld
  • Not too shabby – but I would in general advise not to add an answer to a 10-year old question with plenty of answers. But help me out: what is specific to Python 3 in your code? – Jongware Jan 04 '18 at 01:58
  • The other answers were not exactly working out well :-) py3: see https://stackoverflow.com/questions/136168/get-last-n-lines-of-a-file-with-python-similar-to-tail/48087596#comment16595577_136368 – hrehfeld Jan 04 '18 at 21:38
3

Simple:

with open("test.txt") as f:
    data = f.readlines()
tail = data[-2:]
print(''.join(tail))
2

I found the Popen above to be the best solution. It's quick and dirty and it works. For Python 2.6 on a Unix machine I used the following:

import subprocess

def GetLastNLines(n, fileName):
    """
    Name:               GetLastNLines
    Description:        Gets last n lines using Unix tail
    Output:             returns last n lines of a file
    Keyword argument:
    n -- number of last lines to return
    fileName -- name of the file you need to tail into
    """
    p = subprocess.Popen(['tail', '-n', str(n), fileName], stdout=subprocess.PIPE)
    soutput, sinput = p.communicate()
    return soutput

soutput will contain the last n lines of the file. To iterate through soutput line by line, do:

for line in GetLastNLines(50,'myfile.log').split('\n'):
    print line
Dr Fabio Gori
Marko
2

There are some existing implementations of tail on pypi which you can install using pip:

  • mtFileUtil
  • multitail
  • log4tailer
  • ...

Depending on your situation, there may be advantages to using one of these existing tools.

Travis Bear
  • Are you aware of any module that works on Windows? I tried `tailhead`, `tailer` but they didn't work. Also tried `mtFileUtil`. It was initially throwing error because `print` statements were not having parenthesis (I am on Python 3.6). I added those in `reverse.py` and the error messages were gone but when my script calls the module (`mtFileUtil.tail(open(logfile_path), 5)`), it doesn't print anything. – Technext Sep 19 '18 at 11:58
2

Based on S.Lott's top-voted answer (Sep 25 '08 at 21:43), but fixed for small files.

def tail(the_file, lines_2find=20):  
    the_file.seek(0, 2)                         #go to end of file
    bytes_in_file = the_file.tell()             
    lines_found, total_bytes_scanned = 0, 0
    while lines_2find+1 > lines_found and bytes_in_file > total_bytes_scanned: 
        byte_block = min(1024, bytes_in_file-total_bytes_scanned)
        the_file.seek(-(byte_block+total_bytes_scanned), 2)
        total_bytes_scanned += byte_block
        lines_found += the_file.read(1024).count('\n')
    the_file.seek(-total_bytes_scanned, 2)
    line_list = list(the_file.readlines())
    # we read at least lines_2find+1 line breaks from the bottom, block by
    # block for speed; the extra one ensures we don't return a half line
    return line_list[-lines_2find:]

Hope this is useful.

Eyecue
1

For efficiency with very large files (common in logfile situations where you may want to use tail), you generally want to avoid reading the whole file (even if you do it without reading the whole file into memory at once). However, you do need to somehow work out the offset in lines rather than characters. One possibility is reading backwards with seek() character by character, but this is very slow. Instead, it's better to process in larger blocks.

I've a utility function I wrote a while ago to read files backwards that can be used here.

import os, itertools

def rblocks(f, blocksize=4096):
    """Read file as series of blocks from end of file to start.

    The data itself is in normal order, only the order of the blocks is reversed.
    ie. "hello world" -> ["ld","wor", "lo ", "hel"]
    Note that the file must be opened in binary mode.
    """
    if 'b' not in f.mode.lower():
        raise Exception("File must be opened using binary mode.")
    size = os.stat(f.name).st_size
    fullblocks, lastblock = divmod(size, blocksize)

    # The first(end of file) block will be short, since this leaves 
    # the rest aligned on a blocksize boundary.  This may be more 
    # efficient than having the last (first in file) block be short
    f.seek(-lastblock,2)
    yield f.read(lastblock)

    for i in range(fullblocks-1,-1, -1):
        f.seek(i * blocksize)
        yield f.read(blocksize)

def tail(f, nlines):
    buf = ''
    result = []
    for block in rblocks(f):
        buf = block + buf
        lines = buf.splitlines()

        # Return all lines except the first (since may be partial)
        if lines:
            result.extend(lines[1:]) # First line may not be complete
            if(len(result) >= nlines):
                return result[-nlines:]

            buf = lines[0]

    return ([buf]+result)[-nlines:]


f=open('file_to_tail.txt','rb')
for line in tail(f, 20):
    print line

[Edit] Added more specific version (avoids need to reverse twice)

Brian
  • A quick tests shows that this performs a lot worse than my version from above. Probably because of your buffering. – Armin Ronacher Sep 25 '08 at 22:00
  • I suspect it's because I'm doing multiple seeks backwards, so aren't getting as good use of the readahead buffer. However, I think it may do better when your guess at the line length isn't accurate (eg. very large lines), as it avoids having to re-read data in this case. – Brian Sep 25 '08 at 22:23
1

Here is a pretty simple implementation:

with open('/etc/passwd', 'r') as f:
  try:
    f.seek(0,2)
    s = ''
    while s.count('\n') < 11:
      cur = f.tell()
      f.seek((cur - 10))
      s = f.read(10) + s
      f.seek((cur - 10))
    print s
  except Exception as e:
    f.readlines()
GL2014
  • Great example! Could you please explain the use of try before the `f.seek`? Why not before the `with open`? Also, why in the `except` you do a `f.readlines()`?? –  Jul 07 '17 at 15:33
  • Honestly, the try should probably go first.. I don't remember having a reason for not catching the open() other than on a healthy standard Linux system, /etc/passwd should always be readable. try, then with is the more common order. – GL2014 Jul 13 '17 at 20:11
1

You can go to the end of your file with f.seek(0, 2) and then read off lines one by one with the following replacement for readline():

def readline_backwards(self, f):
    backline = ''
    last = ''
    while not last == '\n':
        backline = last + backline
        if f.tell() <= 0:
            return backline
        f.seek(-1, 1)
        last = f.read(1)
        f.seek(-1, 1)
    backline = last
    last = ''
    while not last == '\n':
        backline = last + backline
        if f.tell() <= 0:
            return backline
        f.seek(-1, 1)
        last = f.read(1)
        f.seek(-1, 1)
    f.seek(1, 1)
    return backline
rabbit
1

Based on Eyecue's answer (Jun 10 '10 at 21:28): this class adds head() and tail() methods to the file object.

class File(file):
    def head(self, lines_2find=1):
        self.seek(0)                            #Rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):  
        self.seek(0, 2)                         #go to end of file
        bytes_in_file = self.tell()             
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find+1 > lines_found and
               bytes_in_file > total_bytes_scanned): 
            byte_block = min(1024, bytes_in_file-total_bytes_scanned)
            self.seek(-(byte_block+total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(1024).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]

Usage:

f = File('path/to/file', 'r')
f.head(3)
f.tail(3)
fdb
1

There is a very useful module that can do this:

from file_read_backwards import FileReadBackwards

with FileReadBackwards("/tmp/file", encoding="utf-8") as frb:
    # getting lines line by line starting from the last line up
    for l in frb:
        print(l)
Quinten C
1

Several of these solutions have issues if the file doesn't end in \n or in ensuring the complete first line is read.

def tail(file, n=1, bs=1024):
    f = open(file)
    f.seek(-1,2)
    l = 1-f.read(1).count('\n') # If file doesn't end in \n, count it anyway.
    B = f.tell()
    while n >= l and B > 0:
            block = min(bs, B)
            B -= block
            f.seek(B, 0)
            l += f.read(block).count('\n')
    f.seek(B, 0)
    l = min(l,n) # discard first (incomplete) line if l > n
    lines = f.readlines()[-l:]
    f.close()
    return lines
1

Update of the answer given by A. Coady.

Works with Python 3.

This uses exponential search, buffers only N lines from the back, and is very efficient.

import time
import os
import sys

def tail(f, n):
    assert n >= 0
    pos, lines = n+1, []

    # set file pointer to end

    f.seek(0, os.SEEK_END)

    isFileSmall = False

    while len(lines) <= n:
        try:
            f.seek(f.tell() - pos, os.SEEK_SET)
        except ValueError as e:
            # lines greater than file seeking size
            # seek to start
            f.seek(0,os.SEEK_SET)
            isFileSmall = True
        except IOError:
            print("Some problem reading/seeking the file")
            sys.exit(-1)
        finally:
            lines = f.readlines()
            if isFileSmall:
                break

        pos *= 2


    return lines[-n:]




with open("stream_logs.txt") as f:
    while(True):
        time.sleep(0.5)
        print(tail(f,2))

itsjwala
1

Deques (2.6+)

If you know that the file is going to be small, a simple deque will do just fine.

from collections import deque

def tail(f, n):
    return deque(f, n)

Quote from docs.python.org:

If maxlen is not specified or is None, deques may grow to an arbitrary length. Otherwise, the deque is bounded to the specified maximum length. Once a bounded length deque is full, when new items are added, a corresponding number of items are discarded from the opposite end. Bounded length deques provide functionality similar to the tail filter in Unix. They are also useful for tracking transactions and other pools of data where only the most recent activity is of interest.
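For example, a quick illustration of the bounded behaviour:

>>> from collections import deque
>>> deque(['a', 'b', 'c', 'd'], maxlen=2)
deque(['c', 'd'], maxlen=2)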

Galloping search (2.7+)

When the size of the file is unspecified, consider searching the file from the end.

Galloping, or exponential, search minimizes the number of read calls by multiplying the number of bytes to search by two each iteration.

This snippet handles edge cases well, apart from multi-byte delimiters and files opened in text mode (see "Edge cases" for an example that can handle those). It stores the segments in a memory-efficient deque until joining them just before returning them as a single bytes object, taking care to only read data once.

from collections import deque
from os import SEEK_CUR, SEEK_END

def tail(f, n, d = b'\n'):
    u"Read `n` segments (lines) from the end of file `f`, separated by `d`."
    a = deque()
    o = 1
    try:
        # Seek to end of file, exclude first byte from check for newline.
        f.seek(-1, SEEK_END)
        s = f.read(1)
        c = 0
        # Read more segments until enough newline characters has been read.
        while c < n:
            n -= c                     # Subtract newline count from remaining.
            a.appendleft(s)            # Insert segment at the beginning.
            f.seek(-o * 3, SEEK_CUR)   # Seek past the read bytes, plus 2x that.
            o *= 2                     # Multiply step- and readsize by two.
            s = f.read(o)              # Read new segment from file.
            c = s.count(d)             # Count the number of newline characters.
    except OSError:
        # Reached beginning of file, read start of file > start of last segment.
        p = max(0, f.tell() - o)
        f.seek(0)
        s = f.read(p)
        c = s.count(d)
    if c >= n:
        # Strip data, up to the start of the first line, from the last segment.
        i = s.rfind(d)
        while i != -1 and n > 1:
            i = s.rfind(d, None, i)
            n -= 1
        s = s[i+1:]
    a.appendleft(s)
    return b"".join(a)

Usage:

f.write(b'Third\nSecond\nLast'); f.seek(0)
assert tail(f, 2, b'\n') == b"Second\nLast\n"
f.write(b'\n\n'); f.seek(0)
assert tail(f, 1, b'\n') == b"\n"
f.write(b'X\n'); f.seek(0)
assert tail(f, 1, b'\n') == b"X\n"
f.write(b''); f.seek(0)
assert tail(f, 1, b'\n') == b""

Edge cases (2.7+)

The simplest approach, apart from reading the whole file, is to step over data from the end of the file and check each read byte, or block of bytes, against a delimiter value/character.

It is not as fast as the galloping search function above, but it is much easier to write a function this way that can handle edge cases like UTF-16/32 encoded files and files where other multi-byte line separators are used.

This example can, apart from that, also handle files opened in text mode (but you should still consider re-opening them in byte mode, as relative seek calls are more efficient).

from os import SEEK_CUR, SEEK_END, SEEK_SET

def _tail__bytes(f, n, sep, size, step):
    # Point cursor to the end of the file.
    f.seek(0, SEEK_END)
    # Halt when 'sep' occurs enough times.
    while n > 0:
        # Seek past the byte just read, or last byte if none has been read.
        f.seek(-size-step, SEEK_CUR)
        # Read one byte/char/block, then step again, until 'sep' occurs.
        while f.read(size) != sep:
            f.seek(-size-step, SEEK_CUR)
        n -= 1

def _tail__text(f, n, sep, size, step):
    # Text mode, same principle but without the use of relative offsets.
    o = f.seek(0, SEEK_END)
    o = f.seek(o-size-step)
    while n > 0:
        o = f.seek(o-step)
        while f.read(step) != sep:
            o = f.seek(o-step)
        n -= 1

def tail(f, n, sep, fixed = False):
    """tail(f: io.BaseIO, n: int, sep: bytes, fixed: bool = False) -> bytes|str

    Return the last `n` segments of file `f`, separated by `sep`.

    Set `fixed` to True when parsing UTF-32 or UTF-16 encoded data (don't forget
    to pass the correct delimiter) in files opened in byte mode.
    """
    size = len(sep)
    step = len(sep) if (fixed is True) else (fixed or 1)
    if not size:
        raise ValueError("Zero-length separator.")
    try:
        if 'b' in f.mode:
            # Process file opened in byte mode.
            _tail__bytes(f, n, sep, size, step)
        else:
            # Process file opened in text mode.
            _tail__text(f, n, sep, size, step)
    except (OSError, ValueError): 
        # Beginning of file reached.
        f.seek(0, SEEK_SET)
    return f.read()

Usage:

f.write("X\nY\nZ\n").encode('utf32'); f.seek(0)
assert tail(f, 1, "\n".encode('utf32')[4:], fixed = True) == b"Z\n"
f.write("X\nY\nZ\n").encode('utf16'); f.seek(0)
assert tail(f, 1, "\n".encode('utf16')[2:], fixed = True) == b"Z\n"
f.write(b'X<br>Y</br>'); f.seek(0)
assert readlast(f, 1, b'<br>') == b"Y</br>"
f.write("X\nY\n"); f.seek(0)
assert readlast(f, 1, "\n") == "Y\n"

Examples were tested against files of varying lengths, empty files, files of various sizes that consist of only newlines and so on before being posted. Ignores trailing newline character.

Trasp
0

I had to read a specific value from the last line of a file, and stumbled upon this thread. Rather than reinventing the wheel in Python, I ended up with a tiny shell script, saved as /usr/local/bin/get_last_netp:

#! /bin/bash
tail -n1 /home/leif/projects/transfer/export.log | awk '{print $14}'

And in the Python program:

from subprocess import check_output

last_netp = int(check_output("/usr/local/bin/get_last_netp"))
Leifbk
0

Not the first example using a deque, but a simpler one. This one is general: it works on any iterable object, not just a file.

#!/usr/bin/env python
import sys
import collections
def tail(iterable, N):
    deq = collections.deque()
    for thing in iterable:
        if len(deq) >= N:
            deq.popleft()
        deq.append(thing)
    for thing in deq:
        yield thing
if __name__ == '__main__':
    for line in tail(sys.stdin,10):
        sys.stdout.write(line)
Hal Canary
0
This is my version of tailf

import sys, time, os

filename = 'path to file'

try:
    with open(filename) as f:
        size = os.path.getsize(filename)
        if size < 1024:
            s = size
        else:
            s = 999
        f.seek(-s, 2)
        l = f.read()
        print l
        while True:
            line = f.readline()
            if not line:
                time.sleep(1)
                continue
            print line
except IOError:
    pass
Raj
0
import time

attempts = 600
wait_sec = 5
fname = "YOUR_PATH"

with open(fname, "r") as f:
    where = f.tell()
    for i in range(attempts):
        line = f.readline()
        if not line:
            time.sleep(wait_sec)
            f.seek(where)
        else:
            print line, # already has newline
moylop260
0
import itertools
fname = 'log.txt'
offset = 5
n = 10
with open(fname) as f:
    n_last_lines = list(reversed([x for x in itertools.islice(f, None)][-(offset+1):-(offset+n+1):-1]))
Yannis
0
abc = "2018-06-16 04:45:18.68"
filename = "abc.txt"
with open(filename) as myFile:
    for num, line in enumerate(myFile, 1):
        if abc in line:
            lastline = num
print "last occurance of work at file is in "+str(lastline) 
0

Another Solution

if your txt file looks like this (one word per line):

mouse
snake
cat
lizard
wolf
dog

you could reverse this file by simply using array indexing in Python:

contents=[]
def tail(contents,n):
    with open('file.txt') as file:
        for i in file.readlines():
            contents.append(i)

    for i in contents[:n:-1]:
        print(i)

tail(contents,-5)

result:

dog
wolf
lizard
cat

Blaine McMahon
0

Well! I had a similar problem, though I only required the last line, so I came up with my own solution:

import os

def get_last_line(filepath):
    try:
        with open(filepath,'rb') as f:
            f.seek(-1,os.SEEK_END)
            text = [f.read(1)]
            while text[-1] != '\n'.encode('utf-8') or len(text)==1:
                f.seek(-2, os.SEEK_CUR)
                text.append(f.read(1))
    except Exception as e:
        pass
    return ''.join([t.decode('utf-8') for t in text[::-1]]).strip()

This function returns the last line of a file.
I have a 1.27 GB log file and it took very little time to find the last line (not even half a second).

rish_hyun
0

Two solutions based on counting '\n' from the file end: tail1 uses a memory map, tail2 does not. Speed is similar; both are fast, but the mmap version is faster. Both functions return the last n lines (from the (n+1)-th '\n' from the end to EOF) as a string.

import mmap
def tail1(fn, n=5, encoding='utf8'):
    with open(fn) as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        nn = len(mm)
        for i in range(n+1):
            nn = mm.rfind(b'\n',0,nn)
            if nn < 0: break
        return mm[nn:].decode(encoding=encoding).strip()


def tail2(fn, n=5, encoding='utf8'):
    with open(fn,'rb') as f:
        for i in range(f.seek(0, 2), 0, -1):
            _ = f.seek(i)
            if f.read(1) == b'\n': n -= 1
            if n < 0: break
        return f.read().decode(encoding=encoding).strip()
Jacek Błocki
-1

On second thought, this is probably just as fast as anything here.

def tail( f, window=20 ):
    lines= ['']*window
    count= 0
    for l in f:
        lines[count%window]= l
        count += 1
    print lines[count%window:], lines[:count%window]

It's a lot simpler. And it does seem to rip along at a good pace.

S.Lott
  • Because nearly everything here doesn't work with log files with more than 30 MB or so without loading the same amount of memory into the RAM ;) Your first version is a lot better, but for the test files here it performs slightly worse than mine and it doesn't work with different newline characters. – Armin Ronacher Sep 25 '08 at 22:06
  • 3
    I was wrong. Version 1 took 0.00248908996582 for 10 tails through the dictionary. Version 2 took 1.2963051796 for 10 tails through the dictionary. I'd almost vote myself down. – S.Lott Sep 25 '08 at 22:06
  • "doesn't work with different newline characters." Replace datacount('\n') with len(data.splitlines()) if it matters. – S.Lott Sep 25 '08 at 22:15
-1

I found what is probably the easiest way to get the first or last N lines of a file.

Last N lines of a file (for example, N=10):

file = open("xyz.txt", 'r')
liner = file.readlines()
for ran in range(len(liner)-N, len(liner)):
    print liner[ran]

First N lines of a file (for example, N=10):

file = open("xyz.txt", 'r')
liner = file.readlines()
for ran in range(0, N):
    print liner[ran]
-2

It's so simple:

def tail(fname, nl):
    with open(fname) as f:
        data = f.readlines()  # readlines returns a list
        print(''.join(data[-nl:]))
Med sadek
  • 1
  • 1
-5

Although this isn't really on the efficient side with big files, this code is pretty straight-forward:

  1. It reads the file object, f.
  2. It splits the string returned using newlines, \n.
  3. It gets the list's last n elements, using the negative sign to stand for indexes counted from the end, and the : to get a subarray.

    def tail(f,n):
        return "\n".join(f.read().split("\n")[-n:])
    
WorkingRobot
  • 423
  • 8
  • 19
  • The person who downvoted my answer, could you please explain why? – WorkingRobot Jun 10 '16 at 22:28
  • 4
    the very first moment you use `f.read()` and don't seek on the file handle, you are putting ALL of your file in memory. Buffering the whole file (and not seeking) is WRONG, so your answer doesn't really add anything new to the problem, just another way to fill up your memory. Now, try to use your code with a 10gb file then look what happens. Using itertools is another way to try out, but both **seek** and **tail** will do the trick. You don't need to put all your lines into memory to process them, you can process them in chunks. I hope you understand. – m3nda Dec 27 '16 at 13:28
  • This function wasn't meant to be a finalized function. It was merely a worst case scenario. – WorkingRobot Dec 27 '16 at 14:23