73

I'm currently writing a program in Python on a Linux system. The objective is to read a log file and execute a bash command upon finding a particular string. The log file is being constantly written to by another program.

My question: If I open the file using the open() method will my Python file object be updated as the actual file gets written to by the other program or will I have to reopen the file at timed intervals?

UPDATE: Thanks for the answers so far. I perhaps should have mentioned that the file is being written to by a Java EE app, so I have no control over when data gets written to it. I currently have a program that reopens the file every 10 seconds and tries to read from the byte position it last read up to. For the moment it just prints out the string that's returned. I was hoping the file would not need to be reopened, and that the read command would somehow have access to the data written to the file by the Java app.

#!/usr/bin/python
import time

fileBytePos = 0
while True:
    inFile = open('./server.log', 'r')
    inFile.seek(fileBytePos)     # resume from where the last read stopped
    data = inFile.read()
    print(data)
    fileBytePos = inFile.tell()  # remember the offset for the next pass
    print(fileBytePos)
    inFile.close()
    time.sleep(10)

Thanks for the tips on pyinotify and generators. I'm going to have a look at these for a nicer solution.

JimS

9 Answers

125

I would recommend looking at David Beazley's Generator Tricks for Python, especially Part 5: Processing Infinite Data. It will handle the Python equivalent of a tail -f logfile command in real-time.

# follow.py
#
# Follow a file like tail -f.

import time

def follow(thefile):
    thefile.seek(0, 2)            # go to the end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)       # no new data yet; wait briefly and retry
            continue
        yield line

if __name__ == '__main__':
    logfile = open("run/foo/access-log", "r")
    loglines = follow(logfile)
    for line in loglines:
        print(line, end='')
Jeff Bauer
    I would upvote if the answer contained a code example in terms of the OP's code. – chtenb May 01 '14 at 20:25
  • @Chiel92: added code sample from David Beazley's site – Jeff Bauer May 02 '14 at 14:07
  • This answer is imho wrong: if the writer writes a line in two separate chunks, readline will return twice, but you really only want a single line. – Fabian Jul 18 '14 at 13:25
  • Rotating logs end up getting the file renamed, and then this code will wait forever on the old logfile. I had this issue and solved it this way, which also solves the `time.sleep(0.1)` issue https://stackoverflow.com/a/44411621/277267 – Daniel F Jun 07 '17 at 11:47
  • Does this work async? I mean, doesn't it block the process it runs from? – Uwe Pfeifer Aug 11 '18 at 18:16
  • I tried this implementation with asyncio and my generator was way too fast and got chunks 90% of the time rather than the whole line :( – Kruupös Oct 16 '18 at 14:37
  • How to use this same function in an HttpStreaminResponse, i want to display the same in the browser – Shihabudheen K M Oct 31 '18 at 09:36
  • what does `thefile.seek(0,2)` do? – Rylan Schaeffer Jun 18 '19 at 03:36
  • @RylanSchaeffer `0` is the offset, `2` means seek relative to the file's end. – Jeff Bauer Jun 18 '19 at 14:25
  • I can't get this program to pick up updates in the log file. For example: 1. point the Python program at 'run/foo/access-log' on disk, 2. start the program, 3. open run/foo/access-log in an editor and add new lines at the end of the file. Actual behavior: no new lines are printed. Does this mean the program is wrong, or can I not test it this way? – Daniel Nelson Jan 10 '22 at 15:13
  • It worked as expected, but if I clear my logs and try again it gives no output; it only produces output once the line count exceeds what it was before clearing. Is there any way to reset the yield position back to the start, since I am going to flush logs daily? – insoftservice Jun 18 '22 at 13:42
27

"An interactive session is worth 1000 words"

>>> f1 = open("bla.txt", "wt")
>>> f2 = open("bla.txt", "rt")
>>> f1.write("bleh")
4
>>> f2.read()
''
>>> f1.flush()
>>> f2.read()
'bleh'
>>> f1.write("blargh")
6
>>> f1.flush()
>>> f2.read()
'blargh'

In other words - yes, a single "open" will do.
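To tie that back to the question's goal (run a bash command when a string appears), here is a minimal sketch using a single open(). The `watch` helper, its parameters, and the return-after-first-hit behavior are my own illustration, not part of the answer:

```python
import subprocess
import time

def watch(path, trigger, command, poll=0.5):
    """Poll one open handle; run `command` when a line contains `trigger`."""
    with open(path, "r") as f:          # opened once, never reopened
        while True:
            line = f.readline()
            if not line:                # at EOF for now; wait for more data
                time.sleep(poll)
                continue
            if trigger in line:
                subprocess.run(command, shell=True)
                return line             # stop after the first hit, for this sketch
```

In a real watcher you would likely `f.seek(0, 2)` first to skip existing content, and loop forever instead of returning.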

jsbueno
13

Here is a slightly modified version of Jeff Bauer's answer that survives the file being replaced under it: it reopens the file when the inode changes. Very useful if your file is being rotated by logrotate.

import os
import time

def follow(name):
    current = open(name, "r")
    curino = os.fstat(current.fileno()).st_ino
    while True:
        while True:
            line = current.readline()
            if not line:
                break
            yield line

        try:
            if os.stat(name).st_ino != curino:
                # the file was replaced (e.g. by logrotate): reopen it
                new = open(name, "r")
                current.close()
                current = new
                curino = os.fstat(current.fileno()).st_ino
                continue
        except IOError:
            pass
        time.sleep(1)


if __name__ == '__main__':
    fname = "test.log"
    for l in follow(fname):
        print("LINE: {}".format(l))
3

Since you're targeting a Linux system, you can use pyinotify to notify you when the file changes.

There's also this trick, which may work fine for you. It uses file.seek to do what tail -f does.
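As a rough sketch of what the pyinotify approach looks like (hedged: the handler class, the watched path, and the callback body are my own illustration, not from this answer):

```python
import pyinotify

class LogHandler(pyinotify.ProcessEvent):
    def process_IN_MODIFY(self, event):
        # called whenever the watched file is written to
        print("%s was modified" % event.pathname)

wm = pyinotify.WatchManager()
wm.add_watch('./server.log', pyinotify.IN_MODIFY)
notifier = pyinotify.Notifier(wm, LogHandler())
notifier.loop()   # blocks, dispatching events to the handler
```

Inside the callback you would read the newly appended data and check it for your trigger string.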

nmichaels
  • Links are prone to breaking and there are no code examples. This answer doesn't provide much and the useful parts are volatile. – Zim Aug 18 '22 at 18:24
1

I am no expert here but I think you will have to use some kind of observer pattern to passively watch the file and then fire off an event that reopens the file when a change occurs. As for how to actually implement this, I have no idea.

I do not think that open() will open the file in realtime as you suggest.
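A minimal polling flavor of that observer idea is to watch the file's stat() metadata and fire when it changes. The helper name, signature, and timeout are my own sketch, assuming plain stdlib polling:

```python
import os
import time

def wait_for_change(path, poll=0.1, timeout=5.0):
    """Block until `path`'s mtime or size changes; return the new stat, or None on timeout."""
    st = os.stat(path)
    last = (st.st_mtime_ns, st.st_size)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        time.sleep(poll)
        st = os.stat(path)
        if (st.st_mtime_ns, st.st_size) != last:
            return st    # something changed; the caller can now (re)read the file
    return None
```

The event fired from here could then reopen, or simply re-read, the file, as this answer suggests.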

Adam Pointer
1

If you have the code reading the file running in a loop:

import time

f = open('/tmp/workfile', 'r')
while True:
    line = f.readline()
    if not line:
        time.sleep(0.5)   # avoid a busy loop while waiting for new data
        continue
    if line.find("ONE") != -1:
        print("Got it")

and you are writing to that same file (in append mode) from another program, then as soon as "ONE" is appended to the file you will get the print, and you can take whatever action you want. In short, you don't have to reopen the file at regular intervals.

>>> f = open('/tmp/workfile', 'a')
>>> f.write("One\n")
4
>>> f.close()
>>> f = open('/tmp/workfile', 'a')
>>> f.write("ONE\n")
4
>>> f.close()
w00t
  • This answer is also wrong: the write could get split up into 'ON' and 'E\n', which would result in two lines where neither matches. – Fabian Jul 18 '14 at 13:27
0

I have a similar use case, and I wrote the following snippet for it. While some may argue that this is not the ideal way to do it, it gets the job done and is easy enough to understand.

import time


def reading_log_files(filename):
    with open(filename, "r") as f:
        data = f.read().splitlines()
    return data


def log_generator(filename, period=1):
    data = reading_log_files(filename)
    while True:
        time.sleep(period)
        new_data = reading_log_files(filename)
        yield new_data[len(data):]
        data = new_data


if __name__ == '__main__':
    x = log_generator("/path/to/log/file.log")  # replace with your log file
    for lines in x:
        print(lines)
        # lines will be a list of new lines added at the end

Hope you find this useful

noob_coder
0

It depends on what exactly you want to do with the file. There are two potential use-cases with this:

  1. Reading appended contents from a continuously updated file such as a log file.
  2. Reading contents from a file which is overwritten continuously (such as the network statistics file in *nix systems)

As other people have elaborately answered how to address scenario #1, I would like to help those who need scenario #2. Basically, you need to reset the file pointer to 0 using seek(0) (or whichever position you want to read from) before each subsequent read() call.

Your code can look somewhat like the below function.

import time

def generate_network_statistics(iface='wlan0'):
    base = '/sys/class/net/' + iface + '/statistics/'
    with open(base + 'rx_bytes', 'r') as rx, \
         open(base + 'tx_bytes', 'r') as tx, \
         open('/proc/uptime', 'r') as uptime:
        while True:
            receive = int(rx.read())
            rx.seek(0)
            transmit = int(tx.read())
            tx.seek(0)
            # /proc/uptime holds two floats; the first is the uptime in seconds
            uptime_seconds = float(uptime.read().split()[0])
            uptime.seek(0)
            print("Uptime: %.0fs, Receive: %i, Transmit: %i" % (uptime_seconds, receive, transmit))
            time.sleep(1)
Dheeraj Pb
0

Keep the file handle open even if an empty string is returned at the end of the file, and try again to read it after some sleep time.

import time

syslog = '/var/log/syslog'
sleep_time_in_seconds = 1

try:
    with open(syslog, 'r', errors='ignore') as f:
        while True:
            for line in f:        # consumes any new lines since the last pass
                print(line.strip())
                # do whatever you want to do with the line
            time.sleep(sleep_time_in_seconds)
except IOError as e:
    print('Cannot open the file {}. Error: {}'.format(syslog, e))