What is the pythonic way of watching the tail end of a growing file for the occurrence of certain keywords?

In shell I might say:

tail -f "$file" | grep "$string" | while read hit; do
    #stuff
done
dbr
pra
    Somewhat similar: http://stackoverflow.com/questions/136168/tail-a-file-with-python – Mark Nov 09 '09 at 21:08
    Those two questions seem identical, but this one is about constantly monitoring a file for new lines, whereas the other question is about reading the last x lines – dbr Nov 09 '09 at 21:40

10 Answers

Well, the simplest way would be to constantly read from the file, check what's new and test for hits.

import time

def watch(fn, words):
    with open(fn, 'r') as fp:
        while True:
            new = fp.readline()
            # Once all lines are read this just returns ''
            # until the file changes and a new line appears

            if new:
                for word in words:
                    if word in new:
                        yield (word, new)
            else:
                time.sleep(0.5)

fn = 'test.py'
words = ['word']
for hit_word, hit_sentence in watch(fn, words):
    print("Found %r in line: %r" % (hit_word, hit_sentence))

This solution with readline works if you know your data will appear in lines.

If the data is some sort of stream you need a buffer, larger than the largest word you're looking for, and fill it first. It gets a bit more complicated that way...
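A minimal sketch of that buffered approach (names are my own; it keeps an overlap of len(word) - 1 characters so a match spanning two reads isn't missed, and only reports whether each refilled window contained a match):

```python
def watch_stream(fp, word, bufsize=4096):
    # Keep the last len(word) - 1 characters of the previous read so a
    # word straddling two reads is still found.
    overlap = len(word) - 1
    buf = ""
    while True:
        chunk = fp.read(bufsize)
        if not chunk:
            break
        buf = (buf[-overlap:] if overlap else "") + chunk
        if word in buf:
            yield word  # at least one hit in this window
```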

Jochen Ritzel
import time

def tail(f):
    f.seek(0, 2)  # jump to the end of the file

    while True:
        line = f.readline()

        if not line:
            time.sleep(0.1)
            continue

        yield line

def process_matches(matchtext):
    while True:
        line = (yield)
        if matchtext in line:
            do_something_useful()  # email alert, etc.


list_of_matches = ['ERROR', 'CRITICAL']
matches = [process_matches(string_match) for string_match in list_of_matches]

for m in matches:  # prime the coroutines
    next(m)

auditlog = tail(open(log_file_to_monitor))
for line in auditlog:
    for m in matches:
        m.send(line)
I use this to monitor log files. In the full implementation, I keep list_of_matches in a configuration file so it can be used for multiple purposes. On my list of enhancements is support for regex instead of a simple 'in' match.
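That regex enhancement might look like this (a sketch with hypothetical names; it collects hits in a list instead of calling do_something_useful() so the effect is visible):

```python
import re

def process_matches_re(pattern):
    """Coroutine: test each line sent in against a compiled regex."""
    rx = re.compile(pattern)
    hits = []  # stand-in for do_something_useful()
    while True:
        line = (yield hits)
        if rx.search(line):
            hits.append(line)

m = process_matches_re(r'ERROR|CRITICAL')
next(m)  # prime the coroutine
m.send("2024-01-01 ERROR disk full")
m.send("2024-01-01 INFO all good")
```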

user166278

EDIT: as the comment below notes, O_NONBLOCK doesn't work for files on disk. This will still help if anyone else comes along looking to tail data coming from a socket or named pipe or another process, but it doesn't answer the actual question that was asked. Original answer remains below for posterity. (Calling out to tail and grep will work, but is a non-answer of sorts anyway.)

Either open the file with O_NONBLOCK, use select to poll for read availability, then read the new data and filter lines with the string methods... or just use the subprocess module and let tail and grep do the work for you, just as you would in the shell.
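The subprocess route might be sketched like this (my own wiring of tail and grep, mirroring the shell pipeline from the question; Unix only):

```python
import subprocess

def tail_grep(path, pattern):
    """Yield matching lines from a growing file via `tail -f | grep`."""
    tail = subprocess.Popen(["tail", "-f", path], stdout=subprocess.PIPE)
    grep = subprocess.Popen(["grep", "--line-buffered", pattern],
                            stdin=tail.stdout, stdout=subprocess.PIPE,
                            universal_newlines=True)
    tail.stdout.close()  # so tail sees SIGPIPE if grep exits
    for line in grep.stdout:
        yield line
```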

Walter Mundt
    Non-blocking IO is not supported for files on most (if not all) OS. That includes Linux and FreeBSD. If it were, combination of non-blocking IO and poll/select/whatever would be the same as blocking reading, which doesn't do what the OP needs. – WGH May 06 '14 at 22:09
  • @WGH: Good point! I think I learned that after having written this, but it's important to note for anyone who comes along to this question later on, so I've prefaced my answer with a disclaimer to that effect. – Walter Mundt May 29 '14 at 19:57

You can use select to poll for new contents in a file.

import os
import select

def tail(filename, bufsize=1024):
    fds = [os.open(filename, os.O_RDONLY)]
    while True:
        reads, _, _ = select.select(fds, [], [])
        if reads:
            yield os.read(reads[0], bufsize)
Corey Porter
    You cannot use select to poll files for new data - only sockets. See: http://docs.python.org/library/select.html, first paragraph last sentence: It cannot be used on regular files to determine whether a file has grown since it was last read. – synthesizerpatel Dec 28 '11 at 16:28

Looks like there's a package for that: https://github.com/kasun/python-tail

tobych

You can use pytailf: a simple Python tail -f wrapper.

from tailf import tailf

for line in tailf("myfile.log"):
    print(line)
kommradHomer
    Dont waste your time with tailf, it's calling "/usr/bin/tail" internally, which is surely not what you want. – Basil Musa Aug 20 '17 at 10:40
    "*`# Mwa-ha-ha, this is easiest way. Hardly portable to windowz, but who cares? TAILF_COMMAND = ['/usr/bin/tail', '-F', '-n']`*" – Bergi Feb 15 '20 at 15:54

If you can't constrain the problem to a line-based read, you need to resort to blocks.

This should work:

import sys

needle = "needle"

blocks = []

inf = sys.stdin

if len(sys.argv) == 2:
    inf = open(sys.argv[1])

while True:
    block = inf.read()
    blocks.append(block)
    if len(blocks) >= 2:
        data = "".join((blocks[-2], blocks[-1]))
    else:
        data = blocks[-1]

    # attention: this needs to be changed if you are interested
    # in *all* matches separately, not just in whether there was
    # any match at all
    if needle in data:
        print("found")
        blocks = []
    blocks[:-2] = []  # keep only the last two blocks

    if block == "":
        break

The challenge lies in ensuring that you match the needle even when it is split across a block boundary.

deets
  • Good catch; I swim in log files all day, sometimes I forget that not all data comes in lines. – pra Nov 11 '09 at 15:14

To my knowledge there's no equivalent to "tail" in the Python standard library. A solution would be to use seek()/tell() (to find the file size) and read() to work out the ending lines.

This blog post (not by me) has the function written out, looks appropriate to me! http://www.manugarg.com/2007/04/real-tailing-in-python.html
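The approach in that post is roughly: seek backwards from the end in fixed-size chunks until enough newlines have been collected. A sketch of the idea (my own code, not the post's):

```python
import os

def last_lines(path, n=10, chunk=1024):
    """Read the last n lines of a file by seeking backwards in chunks."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()        # file size
        data = b""
        # Pull in chunks from the end until we have at least n newlines
        # (or we have read the whole file).
        while pos > 0 and data.count(b"\n") <= n:
            step = min(chunk, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
        return [l.decode() for l in data.splitlines()[-n:]]
```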


If you just need a dead simple Python 3 solution for processing the lines of a text file as they're written, and you don't need Windows support, this worked well for me:

import subprocess

def tailf(filename):
    # yields lines from a file, starting from the beginning
    command = "tail -n +1 -F " + filename
    p = subprocess.Popen(command.split(), stdout=subprocess.PIPE,
                         universal_newlines=True)
    for line in p.stdout:
        yield line

for line in tailf("logfile"):
    pass  # do stuff

It blocks waiting for new lines to be written, so this isn't suitable for asynchronous use without some modifications.

James

You can use collections.deque to implement tail.

From http://docs.python.org/library/collections.html#deque-recipes ...

from collections import deque

def tail(filename, n=10):
    'Return the last n lines of a file'
    return deque(open(filename), n)

Of course, this reads the entire file contents, but it's a neat and terse way of implementing tail.

FogleBird