
I used Tornado to build my web service, and I use the logging module to get Tornado's logger; everything seemed to work. But the service is multi-process, and when I checked the log today I found that some of the information was lost. So I want to ask: if I open a different log file for each process, will this problem be solved?

Or is there any other solution that can be applied to a multi-process server when writing the log?

Kwoks
  • Possible duplicate of [How should I log while using multiprocessing in Python?](https://stackoverflow.com/questions/641420/how-should-i-log-while-using-multiprocessing-in-python) – shmee Oct 16 '18 at 09:15

1 Answer


Here is the code I use in my solution:

https://gist.github.com/hcl14/259432dd648180bf2af672c26d9df9fc

It runs Tornado servers with Flask apps concurrently and logs to the screen, to a common log file (stdout.log), and to process-specific log files. See the comments in the code for how everything should be imported and organized.

To run everything, put all three files together, create a logs subfolder, then run mainprogram.py and test it by sending different POST requests. Afterwards, you can check the files in the logs folder to verify that everything was logged correctly.
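For example, from a Unix-like shell (file names as in the gist):

$ mkdir logs
$ python mainprogram.py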

mainprogram.py is the main file, where the logger is initialized and stored in the global_vars module. This module must be imported everywhere, and the process-specific logger is derived from it; this is simply a convenient way to store variables in global scope across modules. When a new process is forked, it overwrites the logger there with a process-specific logger, so all modules used by that process will write into the corresponding log:

separate_logging.py (renamed to mainprogram.py in the gist for convenience):

import logging
import os
import multiprocessing

from tornado.wsgi import WSGIContainer
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop
from tornado.options import define, options

# a way to pass variables into separate modules, 
# process-specific because of 'fork' mode
# (each process will have its own version of this module)
import global_vars



logPath = os.environ.get('LOGFOLDER','logs')
fileName = os.environ.get('LOGFILE', "stdout.log")

address = os.environ.get('ADDRESS','0.0.0.0')
port = os.environ.get('PORT','8888')

NUM_PROCESSES = os.cpu_count()

# initializes the main logger to be used across all modules

logFormatter = logging.Formatter("%(asctime)s [%(processName)-12.12s] [%(threadName)-12.12s] [%(levelname)-5.5s] [%(filename)s:%(lineno)d] %(message)s")
rootLogger = logging.getLogger(__name__)

# first handler is general log
fileHandler = logging.FileHandler("{0}/{1}".format(logPath, fileName))
fileHandler.setFormatter(logFormatter)
rootLogger.addHandler(fileHandler)
# second handler is logging to console
consoleHandler = logging.StreamHandler()
consoleHandler.setFormatter(logFormatter)
rootLogger.addHandler(consoleHandler) 

rootLogger.setLevel("DEBUG") # log everything
rootLogger.propagate = False

# until process branches, it uses rootLogger
global_vars.multiprocess_globals["logger"] = rootLogger



# third handler is process-specific log, initialized in processes
def init_logger2(secondary_logfile, rootLogger):
    fileHandler1 = logging.FileHandler("{0}/{1}".format(logPath, 'process_'+str(secondary_logfile)+'.log'))
    fileHandler1.setFormatter(logFormatter)
    rootLogger.addHandler(fileHandler1)

    return rootLogger



# external modules import goes here!
# otherwise they will not find any logger!
from flask_app import create_app




# ---------------

# process function
def run(process_id):

    # initialize process-specific logger
    processLogger = init_logger2(process_id, rootLogger)
    global_vars.multiprocess_globals["logger"] = processLogger

    # here you can run tornado app:

    try:
        app = create_app()  # pass interests to flask app        
        ioloop = IOLoop()
        http_server_api = HTTPServer(WSGIContainer(app))

        # reuse_port allows multiple servers co-exist
        # as separate processes
        http_server_api.bind(address=address, port=port, reuse_port=True) 
        http_server_api.start()

        processLogger.info("Process %s started %s:%s" % (process_id, address,
                                            port))

        ioloop.start()
    except Exception as e:
        processLogger.error(e)





# start processes (tornado servers)
if __name__ == '__main__':

    processes = []
    # note: range(1, NUM_PROCESSES) starts NUM_PROCESSES-1 workers;
    # use range(NUM_PROCESSES) to occupy every core
    for i in range(1,NUM_PROCESSES):
        p = multiprocessing.Process(target=run, args=(str(i),))
        p.daemon = False # if we want to spawn child processes
        #p.daemon = True # if we want to gracefully stop program
        processes.append(p)


    # Run processes:

    for p in processes:
        p.start()

    # block program from exiting
    for p in processes:
        p.join() 
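One caveat: this pattern relies on multiprocessing's 'fork' start method, which is the default on Linux. On macOS (Python 3.8+) and Windows the default is 'spawn', in which case each child re-imports the main module instead of inheriting the already-initialized logger in global_vars. If that matters for you, the start method can be forced explicitly (Unix only); a one-line sketch:

multiprocessing.set_start_method('fork')  # call once in the main program, before creating any Process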

global_vars.py:

# global variables to be used across processes.
# Separate file is needed to make globals accessible from different submodules.

# Global variables which need to exist in the scope of each process.
# Variables are added into this dictionary during process initialization.
multiprocess_globals = {}

flask_app.py: just a Flask app, which might use deeper modules. In that case, those modules should also import the logger the same way flask_app does (see the deeper_module.py sketch below):

# create flask app to be run by tornado process

# process-specific globals
import global_vars 

from flask import Flask, request, Response, json, abort, jsonify
import json as json2

# from deeper_module import do_something

app = Flask(__name__)
app.config.from_object(__name__)

# get process-specific logger
# do such import in any submodule !
# as global_vars is changed on process fork!
rootLogger = global_vars.multiprocess_globals["logger"]
logger = rootLogger.getChild(__name__)


def create_app():

    app = Flask(__name__)

    @app.route('/my_url1', methods=['POST'])
    def my_url1():

        body = json.loads(request.data)

        # debug to process-specific logger
        logger.debug(body)

        # do_something()

        response = app.response_class(
            response=json.dumps({'response':'good'}),
            status=200,
            mimetype='application/json'
        )
        return response

    # another functions

    return app
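For completeness, a deeper module would follow the same pattern. Here is a minimal sketch; deeper_module and do_something are the hypothetical names from the commented-out lines above:

deeper_module.py:

# hypothetical deeper module matching the commented-out import in flask_app.py

# process-specific globals; read the logger at import time,
# i.e. after the fork has replaced it in global_vars
import global_vars

rootLogger = global_vars.multiprocess_globals["logger"]
logger = rootLogger.getChild(__name__)


def do_something():
    # this ends up in the same process-specific log as flask_app's messages
    logger.debug("do_something() was called")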

All modules that use the process-specific logger must be imported after global_vars is initialized in the main program. After starting the program, you will see:

2018-10-16 12:50:20,988 [Process-1   ] [MainThread  ] [INFO ] [separate_logging.py:86] Process 1 started 0.0.0.0:8888
2018-10-16 12:50:20,989 [Process-2   ] [MainThread  ] [INFO ] [separate_logging.py:86] Process 2 started 0.0.0.0:8888
2018-10-16 12:50:20,990 [Process-3   ] [MainThread  ] [INFO ] [separate_logging.py:86] Process 3 started 0.0.0.0:8888
2018-10-16 12:50:20,991 [Process-4   ] [MainThread  ] [INFO ] [separate_logging.py:86] Process 4 started 0.0.0.0:8888
2018-10-16 12:50:20,991 [Process-6   ] [MainThread  ] [INFO ] [separate_logging.py:86] Process 6 started 0.0.0.0:8888
2018-10-16 12:50:20,992 [Process-7   ] [MainThread  ] [INFO ] [separate_logging.py:86] Process 7 started 0.0.0.0:8888
2018-10-16 12:50:20,993 [Process-5   ] [MainThread  ] [INFO ] [separate_logging.py:86] Process 5 started 0.0.0.0:8888

Then you can fire a POST request, which will be processed by one of the started processes:

$ curl -H "Content-Type: application/json" -X POST -d '{"bla-bla":"bla"}' 127.0.0.1:8888/my_url1

The result is:

2018-10-16 12:51:40,040 [Process-5 ] [MainThread ] [DEBUG] [flask_app.py:31] {'bla-bla': 'bla'}

You can check that this line appears both in the general log logs/stdout.log and in logs/process_5.log.
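For example, a quick way to verify (the process number will vary from run to run):

$ grep 'bla-bla' logs/stdout.log logs/process_5.log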

Slowpoke
  • Great solution. But in my example I use `tornado.httpserver.HTTPServer` and `server.start(0)` to run the server with multiple processes, and the project is already online, so I want to change it at the least cost. If I run the processes one by one, I think giving each process its own unique log file can also solve the problem. But anyway, your solution taught me a lot. – Kwoks Oct 22 '18 at 03:41
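For reference, a minimal sketch of the approach mentioned in this comment: attaching a per-process file handler after `server.start(0)`, using `tornado.process.task_id()` to name the file. The handler class, logger name ("app"), URL, and port are illustrative placeholders, not taken from the original project, and the logs subfolder is assumed to exist:

import logging

import tornado.ioloop
import tornado.process
import tornado.web
from tornado.httpserver import HTTPServer


class PingHandler(tornado.web.RequestHandler):
    def get(self):
        logging.getLogger("app").info("handled a request")
        self.write("ok")


if __name__ == "__main__":
    app = tornado.web.Application([(r"/ping", PingHandler)])
    server = HTTPServer(app)
    server.bind(8888)
    server.start(0)  # pre-forks one child process per CPU core (Unix only)

    # everything below runs in each child; task_id() is 0..N-1 after the fork
    task_id = tornado.process.task_id()
    logger = logging.getLogger("app")
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler("logs/process_%s.log" % task_id)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s [%(process)d] [%(levelname)s] %(message)s"))
    logger.addHandler(handler)

    tornado.ioloop.IOLoop.current().start()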