This is probably a bit of a silly exercise for me, but it raises a bunch of interesting questions. I have a directory of logfiles from my chat client, and I want to be notified using notify-osd every time one of them changes.

The script that I wrote basically uses os.popen to run the Linux tail command on each of the files to get its last line, and then checks each line against a dictionary of what the lines were the last time it ran. If a line has changed, it uses pynotify to send me a notification.

This script actually worked perfectly, except that it used a huge amount of CPU (probably because it ran tail about 16 times on every pass through the loop, on files that were mounted over sshfs).

It seems like something like this would be a great solution, but I don't see how to implement that for more than one file.

Here is the script that I wrote. Pardon my lack of comments and poor style.

Edit: To clarify, this is all Linux on a desktop.

keevie

3 Answers

Not even looking at your source code, there are two ways you could easily do this more efficiently and handle multiple files.

  1. Don't bother running tail unless you have to. Simply os.stat all of the files and record the last modified time. If the last modified time is different, then raise a notification.

  2. Use pyinotify to call out to Linux's inotify facility; this will have the kernel do option 1 for you and call back to you when any files in your directory change. Then translate the callback into your osd notification.
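Option 1 can be sketched roughly as follows. This is only a minimal illustration, not the asker's script: the helper names `snapshot_mtimes`, `changed_paths` and `poll`, the `on_change` callback and the one-second default interval are all assumptions made for the example.

```python
import os
import time


def snapshot_mtimes(paths):
    """Map each path to its last-modified time as reported by os.stat."""
    return dict((path, os.stat(path).st_mtime) for path in paths)


def changed_paths(previous, current):
    """Return the paths whose mtime differs between two snapshots."""
    return sorted(p for p, mtime in current.items() if previous.get(p) != mtime)


def poll(paths, on_change, interval=1.0):
    """Call on_change(path) whenever a file's mtime changes.

    Hypothetical driver loop; it runs until interrupted.
    """
    previous = snapshot_mtimes(paths)
    while True:
        time.sleep(interval)
        current = snapshot_mtimes(paths)
        for path in changed_paths(previous, current):
            on_change(path)
        previous = current
```

The `on_change` callback is where the notification would be raised. Each pass costs one `stat` call per file instead of one forked process per file.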

Now, there might be some trickiness depending on how many notifications you want when there are multiple messages and whether you care about missing a notification for a message.

An approach that preserves the use of tail would be to instead use tail -f. Open all of the files with tail -f and then use the select module to have the OS tell you when there's additional input on one of the file descriptors open for tail -f. Your main loop would call select and then iterate over each of the readable descriptors to generate notifications. (You could probably do this without using tail and just calling readline() when it's readable.)
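A rough sketch of the `tail -f` plus `select` idea is below. It assumes a `tail` that accepts `-n 0 -f` (as GNU tail does); the names `split_lines` and `follow` and the `on_line` callback are made up for the example. It reads the pipes with `os.read` rather than `readline()` so that buffered data cannot hide behind `select`.

```python
import os
import select
import subprocess


def split_lines(pending, data):
    """Join leftover bytes with new data; return (complete lines, new leftover)."""
    lines = (pending + data).split(b'\n')
    # the last element is either b'' or an incomplete line: keep it for later
    remainder = lines.pop()
    return lines, remainder


def follow(paths, on_line):
    """Run one `tail -f` per path and report new lines via on_line(path, line)."""
    procs = {}
    for path in paths:
        # -n 0 starts at the end of the file, so only new lines are reported
        proc = subprocess.Popen(['tail', '-n', '0', '-f', path],
                                stdout=subprocess.PIPE)
        procs[proc.stdout.fileno()] = (path, proc)
    pending = dict((fd, b'') for fd in procs)
    try:
        while procs:
            # block until at least one tail process has produced output
            readable, _, _ = select.select(list(procs), [], [])
            for fd in readable:
                path, proc = procs[fd]
                data = os.read(fd, 4096)
                if not data:  # end of file on the pipe: this tail has exited
                    proc.wait()
                    del procs[fd]
                    continue
                lines, pending[fd] = split_lines(pending[fd], data)
                for line in lines:
                    on_line(path, line.decode('utf-8', 'replace'))
    finally:
        for path, proc in procs.values():
            proc.terminate()
```

This still forks one process per file, but only once at startup, and the loop sleeps inside `select` until there is something to read.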

Other areas of improvement in your script:

  • Use os.listdir and native Python filtering (say, using list comprehensions) instead of a popen with a bunch of grep filters.
  • Update the list of buffers to scan periodically instead of only doing it at program boot.
  • Use subprocess.Popen instead of os.popen.
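For the first point, something along these lines would replace the popen-and-grep pipeline. The original script's filters aren't shown here, so the `.log` suffix and the function name `list_logfiles` are placeholders for whatever the real criteria are.

```python
import os


def list_logfiles(directory, suffix='.log'):
    """Return sorted full paths of regular files in directory ending with suffix."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(suffix)
        and os.path.isfile(os.path.join(directory, name))
    )
```

Calling this again every so often would also cover the second point, picking up buffers that appear after the program starts.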
Emil Sit
  • Thanks a lot. I think I implemented the first option correctly--it works and is using a lot less cpu. (It was the only one I really understood.) [My Improved Code](http://pastie.org/1814191) – keevie Apr 20 '11 at 04:17
  • You're basically trying to find the most efficient mechanism to identify when files have changed and do something about it. (See my Quora answer on [How tail -f is implemented](http://www.quora.com/How-is-tail-f-implemented) for example.) Calling `os.stat` is cheaper than forking a process. Using inotify is even better. – Emil Sit Apr 20 '11 at 15:40

If you're already using the pyinotify module, it's easy to do this in pure Python (i.e. no need to spawn a separate process to tail each file).

Here is an example that is event-driven by inotify, and should use very little CPU. When IN_MODIFY occurs for a given path, we read all available data from the file handle and output any complete lines found, buffering the incomplete line until more data is available:

import os
import sys
import pynotify
import pyinotify

class Watcher(pyinotify.ProcessEvent):

    def __init__(self, paths):
        self._manager = pyinotify.WatchManager()
        self._notify = pyinotify.Notifier(self._manager, self)
        self._paths = {}
        for path in paths:
            self._manager.add_watch(path, pyinotify.IN_MODIFY)
            fh = open(path, 'rb')
            fh.seek(0, os.SEEK_END)
            self._paths[os.path.realpath(path)] = [fh, '']

    def run(self):
        while True:
            self._notify.process_events()
            if self._notify.check_events():
                self._notify.read_events()

    def process_default(self, evt):
        path = evt.pathname
        fh, buf = self._paths[path]
        # prepend the incomplete line buffered from the previous event
        lines = (buf + fh.read()).split('\n')
        # the last element is '' if the data ended with a newline, otherwise
        # it is an incomplete line; either way, hold it back for next time
        self._paths[path][1] = lines.pop()
        if not lines:
            return

        # display a notification
        notice = pynotify.Notification('%s changed' % path, '\n'.join(lines))
        notice.show()

        # and output to stdout
        for line in lines:
            sys.stdout.write(path + ': ' + line + '\n')
        sys.stdout.flush()

pynotify.init('watcher')
paths = sys.argv[1:]
Watcher(paths).run()

Usage:

% python watcher.py [path1 path2 ... pathN]
samplebias
  • He's using pynotify, not pyinotify. But, nice! You could pop the last incomplete line off into `buf` and then drop the use of `limit`, I think. – Emil Sit Apr 20 '11 at 04:05
  • This looks like a better solution, but it's quite a bit over my head I'm afraid. Thanks a lot anyway though! – keevie Apr 20 '11 at 04:18
  • Updated it to display a notification via pynotify. – samplebias Apr 20 '11 at 04:21
  • While I like this solution, it lacks handling logrotate properly. Consider adding something along those lines. fail2ban has a great example for that here: https://github.com/fail2ban/fail2ban/blob/master/server/filterpyinotify.py – Vajk Hermecz Mar 11 '14 at 11:55

A simple pure-Python solution (not the best, but it doesn't fork; it prints 4 empty lines after an idle period and marks the source of each chunk whenever the source changes):

#!/usr/bin/env python

from __future__ import with_statement

'''
Implement multi-file tail
'''

import os
import sys
import time


def print_file_from(filename, pos):
    with open(filename, 'rb') as fh:
        fh.seek(pos)
        while True:
            chunk = fh.read(8192)
            if not chunk:
                break
            sys.stdout.write(chunk)


def _fstat(filename):
    # (size, mtime): a change in either one means the file was written to
    st_results = os.stat(filename)
    return (st_results.st_size, st_results.st_mtime)


def _print_if_needed(filename, last_stats, no_fn, last_fn):
    changed = False
    # compare the current (size, mtime) with what was recorded last time
    tup = _fstat(filename)
    if last_stats[filename] != tup:
        changed = True
        if not no_fn and last_fn != filename:
            print '\n<%s>' % filename
        print_file_from(filename, last_stats[filename][0])
        last_stats[filename] = tup
    return changed


def multi_tail(filenames, stdout=sys.stdout, interval=1, idle=10, no_fn=False):
    S = lambda (st_size, st_mtime): (max(0, st_size - 124), st_mtime)
    last_stats = dict((fn, S(_fstat(fn))) for fn in filenames)
    last_fn = None
    last_print = 0
    while 1:
        # print last_stats
        changed = False
        for filename in filenames:
            if _print_if_needed(filename, last_stats, no_fn, last_fn):
                changed = True
                last_fn = filename
        if changed:
            if idle > 0:
                last_print = time.time()
        else:
            if idle > 0 and last_print is not None:
                if time.time() - last_print >= idle:
                    last_print = None
                    print '\n' * 4
            time.sleep(interval)

if '__main__' == __name__:
    from optparse import OptionParser
    op = OptionParser()
    op.add_option('-F', '--no-fn', help="don't print filename when changes",
        default=False, action='store_true')
    op.add_option('-i', '--idle', help='idle time, in seconds (0 turns off)',
        type='int', default=10)
    op.add_option('--interval', help='check interval, in seconds', type='int',
        default=1)
    opts, args = op.parse_args()
    try:
        multi_tail(args, interval=opts.interval, idle=opts.idle,
            no_fn=opts.no_fn)
    except KeyboardInterrupt:
        pass
gthomas