
I have a log file and I want to check, every n seconds (or whenever it has been modified), whether new data has been appended to it. I count the lines; if the previous count is less than the new one, I start parsing the new data at index = the previous line count.

import time

def parse_file(path, index_to_start_parsing):
    # Parse the file starting at the given line index
    ...

file_path = r"my_log_path"
previous_lines_count = 0
check_seconds = 5

while True:
    time.sleep(check_seconds)
    with open(file_path) as f:
        current_lines_count = sum(1 for _ in f)

    if current_lines_count > previous_lines_count:
        data = parse_file(file_path, previous_lines_count)
        previous_lines_count = current_lines_count

It works, but I'm looking for a more optimized method. How can I check whether the file has changed (I read about watchdog), and how can I parse only the newly appended data more efficiently?

EDIT:

I use os.stat('somefile.txt').st_size to check whether the file has changed.
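That check can be sketched as a small polling helper (a sketch of my own, not code from the question; the name `wait_for_change` is hypothetical, and it watches st_mtime as well as st_size, since an in-place edit can change the timestamp without changing the size):

    import os
    import time

    def wait_for_change(path, interval=5.0):
        """Block until path's size or mtime differs from its value at call time.

        Returns the new size so the caller can decide how much to read."""
        stat = os.stat(path)
        last = (stat.st_size, stat.st_mtime)
        while True:
            time.sleep(interval)
            stat = os.stat(path)
            if (stat.st_size, stat.st_mtime) != last:
                return stat.st_size

Note that pure polling like this can still miss an append-then-truncate that lands between two checks; for event-driven notification, watchdog (mentioned in the question) is the usual choice.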

1 Answer


Use f.tell() to discover the current offset of the file pointer when you're done reading in your initial pass. On future passes, use f.seek() to advance the file pointer to that previous point in the file, and continue reading as normal.

offset = 0
while True:
    f = open(file)
    f.seek(offset)

    for line in f:
        # process...
        pass

    offset = f.tell()
    f.close()
    # wait until next iteration
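The same seek/tell pattern can be packaged as a generator that yields only the newly appended lines, and that also survives log rotation (the function name `follow` and the truncation check are my additions, not part of the answer; the offset/size comparison assumes plain text files where f.tell() matches the byte position):

    import os
    import time

    def follow(path, interval=1.0):
        """Yield lines appended to path, polling every interval seconds.

        If the file shrinks (e.g. it was rotated or truncated), restart
        from the beginning rather than seeking past the end."""
        offset = 0
        while True:
            size = os.stat(path).st_size
            if size < offset:       # file was truncated/rotated: start over
                offset = 0
            if size > offset:       # new data appended since last read
                with open(path) as f:
                    f.seek(offset)
                    for line in f:
                        yield line
                    offset = f.tell()
            time.sleep(interval)

Because the stat call is cheap, this avoids rereading (or even reopening) the file on iterations where nothing changed.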

This is an old C trick.


Checking if the file has been modified is usually as easy as looking at the file size and seeing if it's the same as the last time you looked.
