I'm trying to run a Tornado REST API that returns data from a very large DataFrame.
Because the data changes frequently, I'd like to refresh the DataFrame periodically in the background without making the REST endpoints unresponsive.
The code below blocks every time update_df is called, and REST requests time out (or simply wait 60 seconds, which is how long the DataFrame update takes).
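A minimal sketch of the usual way around this kind of blocking, under stated assumptions: Tornado 5+ runs on the asyncio event loop, so a synchronous 60-second call made from a callback on that loop stops it from serving anything else. Handing the slow load to a worker thread with `run_in_executor` and then rebinding a shared reference keeps the loop free. `slow_load`, `refresh`, `heartbeat` and `state` are hypothetical names: a short `time.sleep` stands in for the real 60-second load, and a dict stands in for the DataFrame. Tornado's `IOLoop.run_in_executor` offers the same kind of call.

```python
import asyncio
import time

# Hypothetical shared state: handlers would read state["df"]; the updater swaps it.
state = {"df": {"version": 0}}


def slow_load(version):
    """Stand-in for the slow, blocking DataFrame load (sleeps 0.5 s, not 60 s)."""
    time.sleep(0.5)
    return {"version": version}


async def refresh(version):
    loop = asyncio.get_running_loop()
    # Run the blocking load in the default thread pool; the event loop keeps
    # running other tasks (e.g. request handlers) while this await is pending.
    new_df = await loop.run_in_executor(None, slow_load, version)
    # This assignment runs on the loop thread after the load has finished, so
    # readers on the loop see either the old object or the new one.
    state["df"] = new_df


async def heartbeat(ticks):
    """Simulates request handling: records a timestamp every 50 ms."""
    for _ in range(10):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def main():
    start = time.monotonic()
    ticks = []
    await asyncio.gather(refresh(1), heartbeat(ticks))
    return start, ticks


if __name__ == "__main__":
    start, ticks = asyncio.run(main())
    print("first tick after", round(ticks[0] - start, 3), "s; df =", state["df"])
```

If `slow_load` were called directly inside `refresh` instead, the first heartbeat tick would only appear after the whole load finished, which is the symptom described above.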
I have tried running update_df with multiprocessing, and making a Singleton to store the DataFrame (the one main uses), both instantiating the Singleton object inside the update_df thread AND passing the actual Singleton object into it, but the df inside the Tornado classes (i.e. DFHandler) never picks up the new data.
I have also tried the multiprocessing solutions from these posts. Some of them did update the df variable but always blocked Tornado's responsiveness; others didn't work at all, either raising errors or failing to update the variable.
- Make Singleton class in Multiprocessing
- Share list between process in python server
- Shared variable in python's multiprocessing
- Python multiprocessing without blocking parent process
- How do I stop Tornado web server?
I've tried running the Tornado server in its own process using multiprocessing, with a while loop in main that loads the updated DataFrame and then stops and restarts the Tornado process after each update, but every attempt ended with a socket open error when the Tornado server process was started again.
- how to to terminate process using python's multiprocessing
- How do I close all sockets opened by Tornado?
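A socket error on restart is consistent with the old listening socket still being open (for example, held by a process that was not fully terminated) when the new server calls `listen(8888)`. A minimal, Tornado-free sketch with the standard `socket` module, showing that a second bind to a port with an active listener fails and succeeds once the first listener is closed. It binds to 127.0.0.1 on an OS-chosen free port rather than 8888; `listen_on` and `second_bind_fails` are hypothetical names. In Tornado, `HTTPServer.stop()` stops listening and closes the server's listening sockets.

```python
import socket


def listen_on(port):
    """Opens a listening TCP socket on 127.0.0.1:port (port 0 = pick a free one)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", port))
        s.listen()
    except OSError:
        s.close()  # don't leak the socket if bind/listen fails
        raise
    return s


def second_bind_fails():
    first = listen_on(0)
    port = first.getsockname()[1]
    try:
        listen_on(port).close()  # old listener still open: expected to fail
        failed = False
    except OSError:
        failed = True
    finally:
        first.close()  # release the port, as stopping the old server would
    # Once the old listener is closed, the same port can be bound again.
    listen_on(port).close()
    return failed


if __name__ == "__main__":
    print("second bind failed while first was open:", second_bind_fails())
```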
I've tried asyncio, but because Tornado itself runs on asyncio's event loop, it fails with the error "asyncio.run() cannot be called from a running event loop".
- Non-blocking I/O with asyncio
- "AttributeError: module 'asyncio' has no attribute 'coroutine'." in Python 3.11.0
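The `asyncio.run()` error arises because an event loop is already running (Tornado's), and `asyncio.run()` refuses to start another loop from inside a running one. Inside a running loop, work is scheduled on that same loop instead, for example with `asyncio.create_task()` or by awaiting it; and since `@asyncio.coroutine` was removed in Python 3.11, coroutines are written with `async def`. A minimal sketch; `job` and `inside_running_loop` are hypothetical names.

```python
import asyncio


async def job():
    """A coroutine written with async def (replaces @asyncio.coroutine)."""
    await asyncio.sleep(0.01)
    return "done"


async def inside_running_loop():
    # This is what fails when called from code already running on a loop,
    # such as a Tornado handler or callback.
    coro = job()
    try:
        asyncio.run(coro)
        error = None
    except RuntimeError as exc:
        error = str(exc)
    finally:
        coro.close()  # the coroutine never started; close it to avoid a warning

    # Instead, schedule the coroutine on the loop that is already running.
    task = asyncio.create_task(job())
    result = await task
    return error, result


if __name__ == "__main__":
    print(asyncio.run(inside_running_loop()))
```

This alone does not fix the blocking: a coroutine that calls a slow synchronous function still blocks the loop unless that call is moved to an executor.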
Any suggestions?
import json

import tornado.httpserver
import tornado.ioloop
import tornado.web

df = <preload_inital_df>  # very slow (60 seconds)


def update_df():
    global df  # without this, the assignment below only creates a local variable
    df = <RE_load_changed_df>  # very slow (60 seconds); runs on the IOLoop and blocks it


class RootPageHandler(tornado.web.RequestHandler):
    def initialize(self, *args, **kwargs):
        pass

    def get(self):
        self.write("Hello World")


class DFHandler(tornado.web.RequestHandler):
    def initialize(self, df):
        self.df = df  # the object passed in when Application() was built

    def get(self):
        self.write(df.to_json())  # reads the module-level df, not self.df


class Application(tornado.web.Application):
    def __init__(self):
        handlers = [
            ('/', RootPageHandler, {}),
            ('/df', DFHandler, {'df': df}),
        ]
        settings = dict(template_path='/templates',
                        static_path='/static',
                        debug=True)
        tornado.web.Application.__init__(self, handlers, **settings)


# Run the instance
application = Application()
http_server = tornado.httpserver.HTTPServer(application)
http_server.listen(8888)

# Periodic callback to reload the DataFrame
some_time_period = 600000  # milliseconds: once every 600 seconds (10 minutes)
tornado.ioloop.PeriodicCallback(update_df, some_time_period).start()
tornado.ioloop.IOLoop.current().start()
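Separately from the blocking, two name-binding effects decide whether a handler sees reloaded data. In a function like `update_df`, `df = ...` creates a local variable unless the function declares `global df`. And `initialize` receives the `{'df': df}` dict built once in `Application.__init__`, so `self.df` keeps pointing at the DataFrame that existed at that moment even after the global is rebound. A minimal, Tornado-free sketch of both effects, with a hypothetical `DataHolder` whose attribute is swapped so that every reader holding the holder sees the latest object; dicts stand in for DataFrames.

```python
class DataHolder:
    """Hypothetical shared container: readers keep the holder, not the data."""

    def __init__(self, data):
        self.data = data


df = {"version": 0}
holder = DataHolder(df)

# Like the handler kwargs in the question: captured once, at construction time.
handler_kwargs = {"df": df, "holder": holder}


def update_df_local():
    df = {"version": 1}  # bug: creates a local name; the global is untouched
    return df


def update_df_global():
    global df
    df = {"version": 2}  # rebinds the module-level name...
    holder.data = df     # ...and publishes the new object through the holder


update_df_local()
after_local = df["version"]

update_df_global()
after_global = df["version"]
captured = handler_kwargs["df"]["version"]  # still the construction-time object
via_holder = handler_kwargs["holder"].data["version"]

print(after_local, after_global, captured, via_holder)  # prints: 0 2 0 2
```

Applied to the question's code, one option is to pass such a holder in the handler kwargs and read `self.holder.data` in `get`, combined with running the slow reload off the IOLoop.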