What I am trying to do is build a logreader in Python.
This logreader should run in parallel to the application that writes logs into a logfile. As soon as a new log line appears, the logreader should print it out.
How the logs are read and what the application does is not necessarily relevant here. This question is about how to call the logreader so that it prints all new logs as soon as they appear, while the application that writes the logs keeps running.
Setting:
- Python file "a_main_orchestrator.py": Orchestrates the file and function calls of "b_logwriter.py" and "c_logreader.py"
- Python file "b_logwriter.py": Writes logs into the logfile "logfile.log", one new line every few seconds
- Python file "c_logreader.py": Should run in parallel to "b_logwriter.py". As soon as a new line appears in "logfile.log", it should print that line to standard output with "print".
"a_main_orchestrator.py" calls "b_logwriter.py", which writes the logs, and should at the same time call "c_logreader.py" to read the logs in parallel and print them out.
Example:
This example uses multiprocessing. All the parts and functions work, but only one after the other: it writes all the logs, waits until the writer finishes, and only then starts reading the logs - no parallelism.
"a_main_orchestrator.py":
import time
from multiprocessing import Process

if __name__ == '__main__':
    from b_logwriter import logwriting
    from c_logreader import logreading

    file = open("logfile.log", "w")
    time.sleep(1)
    p1 = Process(target=logwriting(file))
    p1.start()
    p2 = Process(target=logreading)
    p2.start()
    p1.join()
    p2.join()
"b_logwriter.py":
import time

def logwriting(file):
    print("[LOGWRITER]: wait three seconds before start")
    time.sleep(3)
    file.write("first new line")
    file.write("\n")
    print("[LOGWRITER]: first line printed")
    print("[LOGWRITER]: wait three seconds again")
    time.sleep(3)
    file.write("second new line here")
    file.write("\n")
    print("[LOGWRITER]: second line printed")
    print("[LOGWRITER]: wait three seconds again again")
    time.sleep(3)
    file.write("third new line is also here")
    file.write("\n")
    print("[LOGWRITER]: third line printed")
    time.sleep(3)
    file.write("ENDLOG")
    print("[LOGWRITER]: ENDLOG printed")
    file.close()
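One more thing I am not sure about: as far as I know, writes to a file are buffered, so the reader might only see the new lines once the file is flushed or closed. If that is the case, I would probably have to flush after every line, roughly like this (just a sketch, not part of my current code):

    file.write("first new line")
    file.write("\n")
    file.flush()  # push the line out of the write buffer so the reader can see it right away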
"c_logreader.py":
import time
import sys

def follow(thefile):
    thefile.seek(0, 2)  # jump to the end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)  # no new line yet, wait a bit and try again
            continue
        yield line

def logreading():
    try:
        print("[LOGREADER]: Start reading")
        logfile = open("logfile.log", "r")
        loglines = follow(logfile)
        for line in loglines:
            print("[LOGPRINT]: " + line.rstrip())
            sys.stdout.flush()
            if "ENDLOG" in line:
                print("[LOGREADER]: log reading finished - exiting Python Logreader")
                sys.exit()
    except IOError as e:
        print("I/O error({0}): {1}".format(e.errno, e.strerror))
        time.sleep(1)
        sys.stdout.flush()
Hurdles:
- No real parallelism: With multiprocessing (as in the example above) the logreader only starts after the logwriter has completed its work. This is a problem, because the logs should be printed out as soon as they appear in the file. I want "livelogs". Maybe I just used it wrong - see the sketch after this list for what I think it should look like.
- "Losing logs": When I start the logreader as a subprocess via CMD, the logs get "lost": its prints never show up in my standard output, because they happen in a separate process.
Sorry for the long text, and thanks for reading and for any hints! :-)