27

I need to emulate "tail -f" in python, but I don't want to use time.sleep in the reading loop. I want something more elegant like some kind of blocking read, or select.select with timeout, but python 2.6 "select" documentation specifically says: "it cannot be used on regular files to determine whether a file has grown since it was last read." Any other way? In a few days if no solution is given I will read tail's C source code to try to figure it out. I hope they don't use sleep, hehe Thanks.

MarioR

10 Answers

34

(Update) Either use file system monitoring tools

Or use a single sleep call (which I would consider much more elegant).

import time

def follow(thefile):
    thefile.seek(0, 2)      # Go to the end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)    # Nothing new yet: sleep briefly, then retry
            continue
        yield line

logfile = open("access-log")
loglines = follow(logfile)
for line in loglines:
    print line,    # trailing comma: the line already ends with a newline
Tzury Bar Yochay
  • Heh, I was about to post almost exactly the same code (although not as a generator, which is much more elegant), +1 – dbr Sep 25 '09 at 14:03
  • Why not `if line: yield line time.sleep(0.1)`? It may cause extra sleeps, but it's not really a big deal (in my opinion). – Chris Lutz Sep 26 '09 at 23:10
  • 1
    if it causes extra sleeps, why yes then? – Tzury Bar Yochay Sep 27 '09 at 03:33
  • 6
    @ChrisLutz the faster your file grows, the bigger deal the extra sleeps become. At 10 lines/sec you're falling hopelessly behind, and even a modestly busy web server can generate hundreds of lines per second. – sh-beta Sep 24 '11 at 23:28
11

To minimize the sleep issues, I modified Tzury Bar Yochay's solution: it now polls quickly while there is activity and, while the file stays idle, gradually lengthens the polling interval up to a maximum of one second.

import time

def follow(thefile):
    thefile.seek(0,2)      # Go to the end of the file
    sleep = 0.00001
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(sleep)    # Sleep briefly
            if sleep < 1.0:
                sleep += 0.00001
            continue
        sleep = 0.00001
        yield line

logfile = open("/var/log/system.log")
loglines = follow(logfile)
for line in loglines:
    print line,
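
A variation on the same idea, sketched here for Python 3: with the fixed 0.00001-second increment above, it takes about 100,000 consecutive idle polls to reach the one-second cap, so doubling the delay instead reaches it much sooner. The parameter names `min_sleep`, `max_sleep` and `factor` are assumptions chosen for this illustration, not part of the original answer.

```python
import time


def follow(thefile, min_sleep=0.001, max_sleep=1.0, factor=2.0):
    """Yield lines appended to thefile, backing off exponentially while idle."""
    thefile.seek(0, 2)  # Go to the end of the file
    delay = min_sleep
    while True:
        line = thefile.readline()
        if line:
            delay = min_sleep  # Activity: go back to fast polling
            yield line
            continue
        time.sleep(delay)
        # No new data: lengthen the next sleep, but never beyond max_sleep
        delay = min(delay * factor, max_sleep)
```

With these defaults the delay doubles from 1 ms to the 1 s cap after roughly one second of inactivity. Like the original, this still polls; it only changes how quickly the polling slows down.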
James Reynolds
10

When reading from a file, your only choice is sleep (see the source code). If you read from a pipe, you can simply read since the read will block until there is data ready.

The reason for this is that the OS doesn't support the notion "wait for someone to write to a file". Only recently, some filesystems added an API where you can listen for changes made to a file but tail is too old to use this API and it's also not available everywhere.
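
The difference between a regular file and a pipe can be seen in a small Python 3 sketch. It assumes a POSIX system (on Windows, `select` only accepts sockets), and the function names are made up for this illustration. The first function shows that `select` reports a regular file as readable even at end-of-file, so it cannot signal growth; the second shows a pipe read blocking until a writer delivers data.

```python
import os
import select
import tempfile
import threading
import time


def regular_file_is_always_ready():
    """select() reports a regular file as readable even at end-of-file."""
    with tempfile.TemporaryFile() as f:
        f.write(b"old data\n")
        f.flush()
        f.read()  # make sure the position is at end-of-file
        readable, _, _ = select.select([f], [], [], 0)
        return bool(readable)


def read_from_pipe_blocks(delay=0.2):
    """os.read() on an empty pipe waits until a writer supplies data."""
    read_fd, write_fd = os.pipe()

    def writer():
        time.sleep(delay)
        os.write(write_fd, b"new line\n")
        os.close(write_fd)

    thread = threading.Thread(target=writer)
    thread.start()
    start = time.monotonic()
    data = os.read(read_fd, 1024)  # blocks here; no polling loop needed
    waited = time.monotonic() - start
    thread.join()
    os.close(read_fd)
    return data, waited
```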

Aaron Digulla
3

The simplest C implementation of tail -f for Linux is this:

#include <unistd.h>
#include <sys/inotify.h>

int main() {
    int inotify_fd = inotify_init();
    inotify_add_watch(inotify_fd, "/tmp/f", IN_MODIFY);
    struct inotify_event event;
    while (1) {
        read(inotify_fd, &event, sizeof(event));
        /* file has changed: open, stat, read new data */
    }
}

This is just a minimal example that's obviously lacking error checking and won't notice when the file is deleted/moved, but it should give a good idea about what the Python implementation should look like.

Here's a proper Python implementation that uses the built-in ctypes to talk to inotify in the way outlined above.

""" simple python implementation of tail -f, utilizing inotify. """

import ctypes
from errno import errorcode
import os
from struct import Struct

# constants from <sys/inotify.h>
IN_MODIFY = 2
IN_DELETE_SELF = 1024
IN_MOVE_SELF = 2048

def follow(filename, blocksize=8192):
    """
    Monitors the file, and yields bytes objects.

    Terminates when the file is deleted or moved.
    """
    with INotify() as inotify:
        # return when we encounter one of these events.
        stop_mask = IN_DELETE_SELF | IN_MOVE_SELF

        inotify.add_watch(filename, IN_MODIFY | stop_mask)

        # we have returned this many bytes from the file.
        filepos = 0
        while True:
            with open(filename, "rb") as fileobj:
                fileobj.seek(filepos)
                while True:
                    data = fileobj.read(blocksize)
                    if not data:
                        break
                    filepos += len(data)
                    yield data

            # wait for next inotify event
            _, mask, _, _ = inotify.next_event()
            if mask & stop_mask:
                break

# use_errno=True lets ctypes.get_errno() report the error of a failed call
LIBC = ctypes.CDLL("libc.so.6", use_errno=True)


class INotify:
    """ Ultra-lightweight inotify class. """
    def __init__(self):
        self.fd = LIBC.inotify_init()
        if self.fd < 0:
            # libc returns -1 on failure; the actual error code is in errno
            raise OSError("could not init inotify: " + errorcode[ctypes.get_errno()])
        self.event_struct = Struct("iIII")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, exc_tb):
        self.close()

    def close(self):
        """ Frees the associated resources. """
        os.close(self.fd)

    def next_event(self):
        """
        Waits for the next event, and returns a tuple of
        watch id, mask, cookie, name (bytes).
        """
        raw = os.read(self.fd, self.event_struct.size)
        watch_id, mask, cookie, name_size = self.event_struct.unpack(raw)
        if name_size:
            name = os.read(self.fd, name_size)
        else:
            name = b""

        return watch_id, mask, cookie, name

    def add_watch(self, filename, mask):
        """
        Adds a watch for filename, with the given mask.
        Returns the watch id.
        """
        if not isinstance(filename, bytes):
            raise TypeError("filename must be bytes")
        watch_id = LIBC.inotify_add_watch(self.fd, filename, mask)
        if watch_id < 0:
            # same convention: -1 on failure, error code in errno
            raise OSError("could not add watch: " + errorcode[ctypes.get_errno()])
        return watch_id


def main():
    """ CLI """
    from argparse import ArgumentParser
    cli = ArgumentParser()
    cli.add_argument("filename")
    args = cli.parse_args()
    import sys
    for data in follow(args.filename.encode()):
        sys.stdout.buffer.write(data)
        sys.stdout.buffer.flush()

if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        print("")

Note that there are various inotify adapters for Python, such as inotify, pyinotify and python-inotify. Those would basically do the work of the INotify class.

mic_e
0

IMO you should use sleep: it works on all platforms and the code will be simple.

Otherwise you can use platform-specific APIs that tell you when a file changes, e.g. on Windows use FindFirstChangeNotification on the folder and watch for FILE_NOTIFY_CHANGE_LAST_WRITE events.

On Linux, I think you can use inotify.

On Mac OS X, use FSEvents.

Anurag Uniyal
0

You can see here how to do something like "tail -f" using inotify:

This is an example to show how to use the inotify module; it could be very useful unchanged, though.

A Watcher instance lets you define callbacks for any event that occurs on any file or directory and its subdirectories.

The inotify module is from Recipe 576375

chrisjlee
dugres
  • 2
    Your answer should include the code. There's no guarantee that link will still work in the future. – sh-beta Sep 24 '11 at 23:31
0

Most implementations I've seen use readlines() / sleep(). A solution based on inotify or similar might be faster but consider this:

  • once libinotify tells you a file has changed you would end up using readlines() anyway

  • calling readlines() against a file which hasn't changed, which is what you would end up doing without libinotify, is already a pretty fast operation:

    giampaolo@ubuntu:~$ python -m timeit -s "f = open('foo.py', 'r'); f.read()" -c "f.readlines()"
    1000000 loops, best of 3: 0.41 usec per loop

Having said this, considering that any solution similar to libinotify has portability issues, I might reconsider using readlines() / sleep(). See: http://code.activestate.com/recipes/577968-log-watcher-tail-f-log/

Community
Giampaolo Rodolà
-1

There's an awesome library called sh that can tail a file, blocking the thread until a new line arrives.

import sh

for line in sh.tail('-f', '/your_file_path', _iter=True):
    print(line)
Kxrr
-2

If you can use GLib on all platforms, you should use glib.io_add_watch; then you can use a normal GLib mainloop and process events as they happen, without any polling behavior.

http://library.gnome.org/devel/pygobject/stable/glib-functions.html#function-glib--io-add-watch

u0b34a0f6ae
  • This will work for reading to the end of file once, but not for reading extra data as it's written. – daf Apr 12 '12 at 14:55
-3

Why don't you just use subprocess.call on tail itself?

import subprocess

subprocess.call(['tail', '-f', filename])

Edit: Fixed to eliminate extra shell process.

Edit2: Fixed to eliminate the deprecated os.popen and thus the need to interpolate parameters, escape spaces and other stuff, and then run a shell process.
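
If the Python code needs to consume the lines itself rather than let tail print them, one option is to start tail with `subprocess.Popen` and read its output from a pipe, where the read blocks until tail prints more. This is a minimal Python 3 sketch that assumes a `tail` binary is on the PATH; the function name `tail_follow` and its `from_start` parameter are invented for this illustration.

```python
import subprocess


def tail_follow(filename, from_start=False):
    """Yield lines (as str) printed by `tail -f filename`."""
    # "-n +1" prints the whole file first; "-n 0" prints only new lines
    start = "+1" if from_start else "0"
    proc = subprocess.Popen(
        ["tail", "-n", start, "-f", filename],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode bytes to str
    )
    try:
        for line in proc.stdout:  # blocks until tail prints another line
            yield line
    finally:
        # Runs when the generator is closed or garbage-collected
        proc.terminate()
        proc.stdout.close()
        proc.wait()
```

Calling `close()` on the generator stops the tail child process. No shell is involved, because the command is passed as a list.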

nosklo
alex tingle
  • 2
    -1: popen is the wrong way - it invokes a new shell process needlessly just to run the tail program. – nosklo Sep 25 '09 at 11:26
  • nosklo: You are right. I've fixed it to use exec. This way you get the benefit of the shell for command line parsing, but don't have the overhead of the extra process. – alex tingle Sep 25 '09 at 12:29
  • 4
    The shell command line parsing is not a benefit. You're interpolating the tail command, the parameter and file name into a single string, and then running a shell process to separate them again. And by doing that you also need to quote shell special characters like spaces yourself (you're doing it using single quotes). Isn't that extra work for nothing? What if the filename itself has quotes on it? You'll have to backslash-escape? Isn't it better to just do `subprocess.call(['tail', '-f', filename])` ?? No shell, no joining of parameters so the shell can split them, no quoting of characters. – nosklo Sep 25 '09 at 13:15
  • You are absolutely right, subprocess.call is certainly better. – alex tingle Sep 25 '09 at 14:21
  • `subprocess.Popen` does not necessarily create a shell -- if passed an array rather than a string and not given `shell=True`, it performs a direct exec of the invoked process just as `subprocess.call` does here. – Charles Duffy Sep 26 '09 at 23:28
  • The question states that there's a "read loop", but this code doesn't capture the tail process's output and it'll just get spewed to stdout where the python code can't do anything with it. – markshep Sep 14 '14 at 14:03
  • Apart from the `tail` command possibly being missing and your use of `call` instead of `Popen`, the idea is clever. With `Popen` you can redirect the output to a pipe and read from it; the read should block and wait for more input. – skyking Jun 25 '15 at 08:58