
I'm trying to run a Tornado REST service that returns data from a very large DataFrame.

Because the data changes frequently, I'd like to update the DataFrame frequently in the background without blocking the REST responsiveness.

This code blocks every time update_df is called, and REST requests time out (or simply wait 60 seconds, the duration of the DF update).

I have tried running update_df in a multiprocessing process and making a Singleton to store the DF (which main uses), both instantiating the singleton object inside the thread and passing the actual singleton object into the update_df thread - but the df inside the Tornado handler classes (i.e. DFHandler) never gets updated with the new data.
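A stripped-down illustration of what I believe goes wrong with the multiprocessing attempt (hypothetical names, not my real code): each process gets its own copy of the object, so a change made in the child never reaches the parent.

```python
import multiprocessing

shared = {"value": 0}

def reassign(d):
    # this changes only the child process's copy of the object
    d["value"] = 42

if __name__ == "__main__":
    p = multiprocessing.Process(target=reassign, args=(shared,))
    p.start()
    p.join()
    print(shared["value"])  # prints 0: the parent's dict is untouched
```

This is why updating the df in a separate process leaves the df seen by the Tornado handlers unchanged.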

I have tried the multiprocessing solutions from these posts. Some updated the df variable but still blocked Tornado's responsiveness; others either raised errors or failed to update the variable.

I've tried running the Tornado server in a multiprocessing thread, with a while loop in main that loads the updated DF and then stops and restarts the Tornado thread once the DF is updated - but every attempt at that ended with a socket-already-open error when the Tornado server thread was started again.

I've tried asyncio, but it fails because Tornado itself is already running an asyncio event loop, so it raises the error "asyncio.run() cannot be called from a running event loop".
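For reference, this is roughly the pattern that reproduces the error (a minimal sketch, not my exact code):

```python
import asyncio

errors = []

async def main():
    try:
        # calling asyncio.run() inside an already-running loop always raises
        asyncio.run(asyncio.sleep(0))
    except RuntimeError as exc:
        errors.append(str(exc))

asyncio.run(main())
print(errors[0])  # asyncio.run() cannot be called from a running event loop
```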

Any suggestions?

import json
import tornado.web
import tornado.ioloop
import tornado.httpserver

df = <preload_initial_df>  # very slow (60 seconds)

def update_df():
    global df  # without this, the assignment below creates a local variable
    df = <RE_load_changed_df>  # very slow (60 seconds)

class RootPageHandler(tornado.web.RequestHandler):
    def initialize(self, *args, **kwargs):
        pass
    def get(self):
        self.write("Hello World")

class DFHandler(tornado.web.RequestHandler):
    def initialize(self, df):
        self.df = df

    def get(self):
        self.write(df.to_json())

class Application(tornado.web.Application):
    def __init__(self):
        handlers = [
            ('/',             RootPageHandler, {}),
            ('/df',           DFHandler,       {'df': df}),
            ]

        settings = dict(template_path='/templates',
                        static_path='/static', 
                        debug=True)

        tornado.web.Application.__init__(self, handlers, **settings)

# Run the instance
application = Application()
http_server = tornado.httpserver.HTTPServer(application)
http_server.listen(8888)

# Callback function to update configs
some_time_period = 600000 # milliseconds: once every 600 seconds
tornado.ioloop.PeriodicCallback(update_df, some_time_period).start()
tornado.ioloop.IOLoop.instance().start()
  • The code you showed doesn't use 'asyncio', so it will block. Try 'asyncio', and then use the function 'run_in_executor' to offload the work to a thread or process. Also, if it is a big DataFrame, won't 'to_json' itself take a long time? It is a blocking function. – lanhao945 Feb 24 '23 at 02:11
  • Hi @lanhao945. I did try using asyncio, but it fails because Tornado itself uses asyncio and causes it to raise an `asyncio.run() cannot be called from a running event loop` error – RightmireM Feb 24 '23 at 13:43
  • Show a minimal code example of how you used asyncio. asyncio has APIs that don't need to start a new loop inside an already-running event loop; you may have tried the wrong one. – lanhao945 Feb 27 '23 at 00:51
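A minimal sketch of the run_in_executor pattern the comment describes, assuming the slow reload is an ordinary blocking function (slow_reload here is a stand-in for the real 60-second reload):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
df = None

def slow_reload():
    # stand-in for the real 60-second DataFrame reload
    return {"reloaded": True}

async def update_df():
    global df
    loop = asyncio.get_running_loop()
    # the blocking reload runs on a worker thread;
    # the event loop stays free to serve requests meanwhile
    df = await loop.run_in_executor(executor, slow_reload)

asyncio.run(update_df())
print(df)  # {'reloaded': True}
```

Tornado's IOLoop also exposes a run_in_executor method (since Tornado 5.0) that works the same way from inside a handler or callback.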
