
I have a process that is logging messages to a file.

I want to implement another process (in Python) that parses these logs (as they are written to the file), filters the lines that I'm interested in and then performs certain actions based on the state of the first process.

I was wondering before I go ahead and write something on my own if there is a library in Python that does something like this.

Also, ideas on how to implement something like this in Python would be appreciated.

Soumya Simanta
  • Library that does what? Filters lines and performs actions? That's a very general type of task. ETA: Oh, I understand, you mean keep track of new lines that are written. – David Robinson Aug 10 '12 at 20:41
  • it might be possible to `p = subprocess.Popen(['tail -f', file_name], stdout=subprocess.PIPE)` on your log file and then use `p.stdout.readline()` repeatedly. Just an idea – Ryan Haining Aug 10 '12 at 20:43
  • Keep in mind that using `tail -F` won't work on all systems. That being said, it would make for a pretty easy implementation. – D.A Aug 10 '12 at 20:46
  • If you don't need the logfile for any other purpose, you might replace it with a named pipe (see `mkpipe` man page). Then you'd start the main process and your python tool, they'd connect to the two ends of the pipe, and anything the main process writes to it ends up in your python input stream. – MvG Aug 10 '12 at 20:51
  • @MvG - I don't control the main process so I don't think I can use named pipe. – Soumya Simanta Aug 10 '12 at 20:53
  • @David Robinson - A library that does one or more of the things that I want. – Soumya Simanta Aug 10 '12 at 20:54
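As a side note on the `subprocess` idea from the comments: the command and its flag have to be separate list elements, otherwise `Popen` looks for an executable literally named `tail -f`. A minimal sketch of that approach (assuming a Unix system with `tail` available; the file path and function name are illustrative):

```python
import subprocess

def tail_lines(path):
    """Yield complete lines from `tail -F path` as they are appended.

    Note: 'tail' and '-F' must be separate argv elements; a single
    'tail -f' string would be treated as the executable name itself.
    """
    proc = subprocess.Popen(['tail', '-F', '-n', '+1', path],
                            stdout=subprocess.PIPE)
    try:
        for raw in iter(proc.stdout.readline, b''):
            yield raw.decode()
    finally:
        proc.terminate()
```

`-F` (follow by name, retry) survives log rotation better than `-f`, but as noted above it is not available on every system.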

3 Answers


C programs usually have to seek to the current position to clear any “end of file” flags. But as @9000 correctly pointed out, Python apparently takes care of this, so you can read from the same file repeatedly even after it has reached end of file.

You might have to take care of incomplete lines, though. If your application writes its log in pieces, then you want to make sure that you handle whole lines, and not those pieces. The following code will accomplish that:

import time

f = open('some.log', 'r')
while True:
    line = ''
    while len(line) == 0 or line[-1] != '\n':
        tail = f.readline()
        if tail == '':
            time.sleep(0.1)          # avoid busy waiting
            # f.seek(0, io.SEEK_CUR) # appears to be unnecessary
            continue
        line += tail
    process(line)
MvG
  • Looks like the program won't exit even after `some.log` finishes writing. – atline Jun 03 '21 at 03:37
  • @atline: yes, the code is running an infinite loop. There is no reasonable way to detect that the log writer is done, although in practice it might write a specific log line last, and you could wait for that to occur and break the loop. – MvG Jun 03 '21 at 05:08
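The sentinel idea from the comment above can be sketched as follows (the `SHUTDOWN` marker is hypothetical; it only works if the log writer really emits such a final line):

```python
import time

SENTINEL = 'SHUTDOWN'  # hypothetical final line the writer emits

def follow_until(f, sentinel=SENTINEL):
    """Yield complete lines from an open file until the sentinel appears."""
    line = ''
    while True:
        tail = f.readline()
        if tail == '':
            time.sleep(0.1)   # avoid busy waiting
            continue
        line += tail
        if line.endswith('\n'):
            if line.strip() == sentinel:
                return        # writer signalled it is done
            yield line
            line = ''
```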

No need to run tail -f. Plain Python files should work:

import time

with open('/tmp/track-this') as f:
    while True:
        line = f.readline()
        if line:
            print(line, end='')
        else:
            time.sleep(0.5)   # no new data yet; avoid busy waiting

This thing works almost exactly like tail -f. Check it by running in another terminal:

echo "more" >> /tmp/track-this
# alt-tab here to the terminal with Python and see 'more' printed
echo "even more" >> /tmp/track-this

Don't forget to create /tmp/track-this before you run the Python snippet.

Parsing and taking appropriate actions are up to you. Long-running actions should probably be handled in separate threads or processes.

The stop condition is also up to you, but a plain ^C works.
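One way to keep slow actions from stalling the tailing loop is a worker thread fed through a queue. A sketch using only the standard library (`handle` and `interesting` are illustrative names, not part of any answer above):

```python
import queue
import threading

tasks = queue.Queue()

def worker():
    # Runs the slow action for each matching line, off the main loop.
    while True:
        line = tasks.get()
        if line is None:          # shutdown marker
            break
        handle(line)              # hypothetical slow action
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

# In the tailing loop, instead of acting inline:
#     if interesting(line):
#         tasks.put(line)
```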

9000
  • Hi @9000, in my case I'm parsing `access.log` with the given code, but when the Java process moves the current `access.log` to `access.log.timestamp` and creates a new `access.log`, the given Python tailer does not follow the new `access.log`. Can you help me overcome this? – dm90 Mar 31 '16 at 07:39
  • The tailing process has to reopen the file when the log-writing process reopens the file. Sometimes a log-rotating process notifies the log-writing process to switch the logs (see `logrotate`); if so, it could notify the log-tailing process at the same moment. If the log-writing process switches the logs all by itself, there seems to be no easy signal to receive. Either the log-writing process should notify you, or you should periodically remember the file ctime and read position, close the file, look for a new one, and either switch to the new or reopen and `seek` to the previous position. – 9000 Mar 31 '16 at 15:11
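Putting the comment's periodic check into code, one way to notice rotation is to compare the device/inode of the open file with whatever currently sits at the path. This is a POSIX-specific sketch (the function name and the loop fragment are illustrative):

```python
import os

def rotated(f, path):
    """Return True if `path` now points at a different file than `f`
    (i.e. the log was rotated and recreated).  POSIX-specific: compares
    device and inode numbers."""
    try:
        on_disk = os.stat(path)
    except FileNotFoundError:
        return True  # old file moved away, new one not created yet
    open_file = os.fstat(f.fileno())
    return (on_disk.st_dev, on_disk.st_ino) != \
           (open_file.st_dev, open_file.st_ino)

# In the tailing loop, after a stretch of empty reads:
#     if rotated(f, 'access.log'):
#         f.close()
#         f = open('access.log')   # start following the new file
```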

Thanks everyone for the answers. I also found this: http://www.dabeaz.com/generators/follow.py
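A generator-style follow that also handles the partial-line issue raised in the comments might look like this (a sketch in the spirit of the link, not the code at that URL):

```python
import time

def follow(f):
    """Generator yielding only complete lines appended to `f`,
    buffering partial writes until the trailing newline arrives."""
    buffered = ''
    while True:
        chunk = f.readline()
        if chunk == '':
            time.sleep(0.1)       # avoid busy waiting
            continue
        buffered += chunk
        if buffered.endswith('\n'):
            yield buffered
            buffered = ''
```

Usage would then be `for line in follow(open('some.log')): process(line)`.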

Soumya Simanta
  • That code is still susceptible to lines being split if the process writes them in several chunks. So you might want to combine the answers. The generator-style approach certainly looks nice. – MvG Aug 11 '12 at 12:02