I'm working with cherrypy in a server that implements a RESTful like API. The responses imply some heavy computation that takes about 2 seconds for request. To do this computations, some data is used that is updated three times a day.
The data is updated in the background (takes about half hour), and once it is updated, the references of the new data are passed to the functions that respond the requests. This takes just a milisecond.
What I need is to be sure that each request is answered either with the old data or with the new data, but none request processing can take place while the data references are being changed. Ideally, I would like to find a way of buffering incoming request while the data references are changed, and also to ensure that the references are changed after all in-process requests finished.
My current (not) working minimal example is as follows:
import time
import cherrypy
from cherrypy.process import plugins
theData = 0
def processData():
"""Backround task works for half hour three times a day,
and when finishes it publish it in the engine buffer."""
global theData # using global variables to simplify the example
theData += 1
cherrypy.engine.publish("doChangeData", theData)
class DataPublisher(object):
def __init__(self):
self.data = 'initData'
cherrypy.engine.subscribe('doChangeData', self.changeData)
def changeData(self, newData):
cherrypy.engine.log("Changing data, buffering should start!")
self.data = newData
time.sleep(1) #exageration of the 1 milisec of the references update to visualize the problem
cherrypy.engine.log("Continue serving buffered and new requests.")
@cherrypy.expose
def index(self):
result = "I get "+str(self.data)
cherrypy.engine.log(result)
time.sleep(3)
return result
if __name__ == '__main__':
conf = {
'/': { 'server.socket_host': '127.0.0.1',
'server.socket_port': 8080}
}
cherrypy.config.update(conf)
btask = plugins.BackgroundTask(5, processData) #5 secs for the example
btask.start()
cherrypy.quickstart(DataPublisher())
If I run this script, and also open a browser, put localhost:8080 and refresh the page a lot, I get:
...
[17/Sep/2015:21:32:41] ENGINE Changing data, buffering should start!
127.0.0.1 - - [17/Sep/2015:21:32:41] "GET / HTTP/1.1" 200 7 "...
[17/Sep/2015:21:32:42] ENGINE I get 3
[17/Sep/2015:21:32:42] ENGINE Continue serving buffered and new requests.
127.0.0.1 - - [17/Sep/2015:21:24:44] "GET / HTTP/1.1" 200 7 "...
...
Which means that some requests processing started before and ends after the data references start or end to being changed. I want to avoid both cases. Something like:
...
127.0.0.1 - - [17/Sep/2015:21:32:41] "GET / HTTP/1.1" 200 7 "...
[17/Sep/2015:21:32:41] ENGINE Changing data, buffering should start!
[17/Sep/2015:21:32:42] ENGINE Continue serving buffered and new requests.
[17/Sep/2015:21:32:42] ENGINE I get 3
127.0.0.1 - - [17/Sep/2015:21:24:44] "GET / HTTP/1.1" 200 7 "...
...
I searched documentation and the web and find these references that do not completely cover this case:
http://www.defuze.org/archives/198-managing-your-process-with-the-cherrypy-bus.html
How to execute asynchronous post-processing in CherryPy?
http://tools.cherrypy.org/wiki/BackgroundTaskQueue
Cherrypy : which solutions for pages with large processing time
How to stop request processing in Cherrypy?
Update (with a simple solution):
After giving more thought, I think that the question is misleading since it includes some implementation requirements in the question itself, namely: to stop processing and start buffering. While for the problem the requirement can be simplified to: be sure that each request is processed either with the old data or with the new data.
For the later, it is enough to store a temporal local reference of the used data. This reference can be used in all the request processing, and it will be no problem if another thread changes self.data
. For python objects, the garbage collector will take care of the old data.
Specifically, it is enough to change the index function by:
@cherrypy.expose
def index(self):
tempData = self.data
result = "I started with %s"%str(tempData)
time.sleep(3) # Heavy use of tempData
result += " that changed to %s"%str(self.data)
result += " but I am still using %s"%str(tempData)
cherrypy.engine.log(result)
return result
And as a result we will see:
[21/Sep/2015:10:06:00] ENGINE I started with 1 that changed to 2 but I am still using 1
I still want to keep the original (more restrictive) question and cyraxjoe answer too, since I find those solutions very useful.