I am trying to set up a mail log parser that pulls specific lines out into another file, which then gets rsync'd to a remote server. The problem is that when rsync reads the file being written, the parser seems to stop functioning. I believe this is because the parser emulates `tail -f`, since maillog is being written to continuously.

So: how do I allow rsync to touch the file I'm writing with this code (result_file), while still allowing the code to follow the end of the maillog looking for new lines?

#! /usr/bin/python

import time, re, sys

result_file = open('/var/log/mrp_mail_parsed.log', 'a+')


def tail(logfile):
    logfile.seek(0,2)
    while True:
        line = logfile.readline()
        if not line:
            time.sleep(0.1)
            continue
        yield line

if __name__ == '__main__':
    logfile = open('/var/log/maillog', 'r')
    logline = tail(logfile)
    for line in logline:
        match = re.search(r'.+postfix-mrp.+', line)
        if match:
            result_file.write(line)
            result_file.flush()
  • `tail` isn't a bash function, it's a separate tool, and that's not at all how `tail -f` actually works. So, if your assumptions of what should happen were based on those two facts, there's a very good chance your assumptions are wrong. – abarnert May 23 '13 at 23:15
  • tail -f follows the end of the file. What I'm doing above is following the end of the file. – Kris Griebe May 23 '13 at 23:21
  • Related: http://stackoverflow.com/questions/12523044/how-can-i-tail-a-log-file-in-python – Bjoern Rennhak May 23 '13 at 23:21
  • @KrisGriebe: Yes, but that's not _how_ tail follows the end of a file. It's like saying that `tr a a – abarnert May 23 '13 at 23:24
  • Meanwhile, if you really think `rsync` is relevant, you will need to show us your `rsync` command. Could `rsync` be actually writing, or even replacing, the file? Yes, it definitely can. The only reason we have to believe that it isn't doing that is that you described it as "rsync reads the file". – abarnert May 23 '13 at 23:27
  • (I seemingly fail at inputting multi-line code in comments.) The bits of the rsync command that matter are: `run_rsync() { RSYNC_BIN="/usr/bin/rsync" RSYNC_OPTIONS="-ai --inplace" SOURCE="/var/log/mrp_mail_parsed.log" DEST="user@server:/opt/mounts/feeds/maillogs" ${RSYNC_BIN} ${RSYNC_OPTIONS} ${SOURCE} ${DEST} rm -rf $LOCKFILE rm -rf $COUNTFILE }` – Kris Griebe May 23 '13 at 23:28
  • Does your Python script deal with `$LOCKFILE` or `$COUNTFILE` in any way? (Also, you don't need `rm -r` unless you're removing a directory). – alexis May 24 '13 at 13:33
  • You can't input multi-line code in comments; SO doesn't support that. But you can edit your question. – abarnert May 24 '13 at 17:35
  • Meanwhile, why are you using `rsync -a` and `rm -r` for single files? The fact that you're doing so implies that you haven't chosen the options by figuring out what you want and then reading the manpage to see how to do it, but rather copied and pasted the commands from somewhere without understanding exactly what they do. – abarnert May 24 '13 at 17:35

2 Answers

I don't know who's writing the file, or how, so I can't be sure, but I'd give better than even odds that your problem is this:

If the file isn't being appended to in-place, but is instead being rewritten, your code will stop tracking the file. To test this:

import sys
import time

def tail(logfile):
    logfile.seek(0,2)
    while True:
        line = logfile.readline()
        if not line:
            time.sleep(0.1)
            continue
        yield line

with open(sys.argv[1]) as f:
    for line in tail(f):
        print(line.rstrip())

Now:

$ touch foo
$ python tailf.py foo &
$ echo "hey" >> foo
hey
$ echo "hey" > foo

To see what's happening better, try checking the inode and size via stat. As soon as the path refers to a different file than the one your script has open, your script is now watching a file that nobody else will ever touch again.
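The same check can be done from inside the script. This is a minimal sketch, assuming a POSIX filesystem where device and inode numbers identify a file; the helper name `is_same_file` is hypothetical and not part of the original code:

```python
import os

def is_same_file(open_file, path):
    """Return True if `path` still refers to the file that `open_file` has open."""
    try:
        on_disk = os.stat(path)
    except FileNotFoundError:
        # The path no longer exists, so it cannot be the file we hold open.
        return False
    opened = os.fstat(open_file.fileno())
    # Same device and same inode means the path still names our open file.
    return (opened.st_dev, opened.st_ino) == (on_disk.st_dev, on_disk.st_ino)
```

If this returns False inside the `tail` loop, the path has been pointed at a different file, and the loop would need to reopen the path to see anything new.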

It's also possible that someone is truncating and rewriting the file in-place. This won't change the inode, but it will still mean that you won't read anything, because you're trying to read from a position past the end of the file.
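Truncation can be detected in a similar way. The sketch below assumes the file was opened in binary mode (so that `tell()` is a plain byte offset); `was_truncated` is a hypothetical helper name:

```python
import os

def was_truncated(open_file):
    """Return True if the file is now shorter than our current read position."""
    current_size = os.fstat(open_file.fileno()).st_size
    return current_size < open_file.tell()
```

When this returns True, seeking back to the start (`open_file.seek(0)`) lets the reader pick up the rewritten contents. Note that it cannot catch a file that was truncated and has already grown back past the old position.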

I have no idea whether the file being rsync'd is causing this, or whether that's just a coincidence. Without knowing what rsync command you're running, or seeing whether the file is being replaced or the file is being truncated and rewritten when that command runs, all we can do is guess.

abarnert

I don't believe rsync is causing your problems: A separate process reading the file shouldn't affect the writer. You can easily test this by pausing rsync.

I'm guessing the problem is with Python's handling of file reads when you hit end of file. A crude way that's guaranteed to work is to remember the offset at the last EOF (using tell()). For each new read, reopen the file and seek to the remembered offset.
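A rough sketch of that reopen-and-seek approach, assuming Python 3 and a log whose lines end in `\n`. The names `read_new_lines` and `poll_interval` are made up for illustration:

```python
import time

def read_new_lines(path, offset):
    """Reopen `path`, read complete lines from `offset`, return (lines, new_offset)."""
    lines = []
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # EOF, or a line that is still being written
            lines.append(line.decode(errors="replace"))
            offset = f.tell()  # remember the position after the last full line
    return lines, offset

def tail(path, poll_interval=0.1):
    """Yield lines appended to `path` after the generator is first advanced."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # start at the current end of the file
        offset = f.tell()
    while True:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            yield line
        if not lines:
            time.sleep(poll_interval)
```

Because the file is reopened by path on every poll, this also picks up a file that was replaced, though the remembered offset may then point to the wrong place in the new file.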

alexis
  • The reading of the maillog functions without any issues until I attempt to rsync the result_file. – Kris Griebe May 23 '13 at 23:23
  • Then I must admit I was wrong. (Still, do try the solution I suggest). I see you added your rsync command in a comment; you can get around the multi-line restriction by including it in the question-- that's where it belongs anyway. – alexis May 24 '13 at 13:29