
Edit to clarify my question: I want to attach a Python service to uWSGI using this feature (I can't understand the examples), and I also want to be able to communicate results between the service and the Flask app. Below I present some context and my first thought on the communication matter, hoping for advice or perhaps another approach to take.


I have an already developed Python application that uses multiprocessing.Pool to run on-demand tasks. The main reason for using the pool of workers is that I need to share several objects between them.

On top of that, I want to have a Flask application that triggers tasks from its endpoints.

I've read several questions here on SO looking for possible drawbacks of using Flask with Python's multiprocessing module. I'm still a bit confused, but this answer summarizes well both the downsides of starting a multiprocessing.Pool directly from Flask and what my options are.

This answer shows a uWSGI feature to manage daemons/services. I want to follow this approach so I can use my already developed Python application as a service of the Flask app.

One of my main problems is that I look at the examples and still do not know what I need to do next. In other words, how would I start the Python app from there?
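For what it's worth, here is a minimal sketch of how the attach-daemon options from the linked docs might be wired up, assuming the long-running multiprocessing app has an entry point at a hypothetical `daemon.py`; all paths and names here are illustrative:

```ini
[uwsgi]
; the Flask app served by uWSGI (module:callable)
module = flaskapp:app
master = true
processes = 4

; plain attach-daemon just runs the command and respawns it if it dies:
; attach-daemon = python3 /path/to/daemon.py

; smart-attach-daemon additionally tracks the service through a pidfile,
; which the daemon itself is responsible for writing:
smart-attach-daemon = /tmp/daemon.pid python3 /path/to/daemon.py
```

With either option, uWSGI starts the daemon alongside the Flask workers when it boots, so the service's lifetime is tied to the server's.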

Another problem is the communication between the Flask app and the daemon process/service. My first thought is to use Flask-SocketIO to communicate, but then, if my server stops, I need to deal with the connection... Is this a good way to communicate between a server and a service? What are other possible solutions?
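In case it helps, sockets via Flask-SocketIO are not the only option: the stdlib's multiprocessing.connection gives a simple request/reply channel between the Flask process and the daemon over a Unix socket. A minimal sketch, with an illustrative socket path and message format (the daemon loop runs in a thread here only to keep the example self-contained; in reality it would live in the attached daemon process):

```python
import os
import tempfile
import threading
import time
from multiprocessing.connection import Client, Listener

# Illustrative Unix-socket path and auth key; choose stable values in production.
ADDRESS = os.path.join(tempfile.mkdtemp(), "daemon.sock")
AUTHKEY = b"change-me"

def daemon_loop():
    # Daemon side: accept a connection, read one task, reply with a result.
    # A real service would loop forever and hand tasks to its Pool.
    with Listener(ADDRESS, authkey=AUTHKEY) as listener:
        with listener.accept() as conn:
            task = conn.recv()
            conn.send({"status": "done", "task": task})

def submit(task, retries=50):
    # Flask side: connect, send the task, block for the result.
    # Retry briefly in case the daemon is still starting up.
    for _ in range(retries):
        try:
            with Client(ADDRESS, authkey=AUTHKEY) as conn:
                conn.send(task)
                return conn.recv()
        except (FileNotFoundError, ConnectionRefusedError):
            time.sleep(0.1)
    raise RuntimeError("daemon not reachable")

# Self-contained demo: run the daemon side in a background thread.
threading.Thread(target=daemon_loop, daemon=True).start()
reply = submit({"doc": "classify me"})
```

Because the client reconnects per request, a restarted daemon picks up where it left off without the connection-management burden of a persistent socket.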


Note: I'm well aware of Celery, and I intend to use it in the near future. In fact, I have an already developed node.js app, in which users perform actions that should trigger specific tasks from the (also) already developed Python application. The thing is, I need a production-ready version as soon as possible, and instead of modifying the Python application that uses multiprocessing, I thought it would be faster to create a simple Flask server to communicate with node.js through HTTP. This way I would only need to implement a Flask app that instantiates the Python app.


Edit:

Why do I need to share objects?

Simply because creating the objects in question takes too long. Creation takes an acceptable amount of time if done once, but since I'm expecting (maybe) hundreds to thousands of simultaneous requests, having to load every object again is something I want to avoid.

One of the objects is a scikit-learn classifier model, persisted in a pickle file, which takes 3 seconds to load. Each user can create several "job spots", each of which will take over 2k documents to be classified; each document will be uploaded at an unknown point in time, so I need to keep this model loaded in memory (loading it again for every task is not acceptable).

This is one example of a single task.


Edit 2: I've asked some questions related to this project before:

As stated, but to clarify: I think the best solution would be to use Celery, but in order to quickly have a production-ready solution, I'm trying to use this uWSGI attach-daemon feature.

leoschet
  • You forgot to tell us what the actual problem is you are trying to solve. – Klaus D. Jun 27 '18 at 23:52
  • I actually did; it is mostly in the 2nd to 6th paragraphs. I presented a full view of the problem in the "Note" section. As stated, I want to build a flask application on top of my python multiprocessing program (2nd paragraph); the possible approaches I found are in the provided SO link (3rd paragraph). I specified the approach I want to follow (4th paragraph) and presented the main problems blocking development (5th and 6th paragraphs) – leoschet Jun 28 '18 at 00:14
  • To be explicit, I want to **start a python daemon/service using this [uWSGI feature](http://uwsgi-docs.readthedocs.io/en/latest/AttachingDaemons.html#managing-external-daemons-services)** as well as **communicate results between the service and the flask application**. If that's not what you meant, please elaborate – leoschet Jun 28 '18 at 00:18
  • Yes and no. What they are possibly getting at is that you have outlined a solution for some problem without really explaining in detail why you need it. You are then asking how to get your solution working. The reason for going more into the actual problem, or the reason why you need it, is that someone may suggest a completely different way of doing it which may be simpler or more appropriate. It is not uncommon to get this response, as many questions on SO are like this: help me with this specific solution, when there were often better solutions to begin with. – Graham Dumpleton Jun 28 '18 at 01:17
  • Please expand on "The main reason for using the pool of workers is that I need to share several objects between them." Workers are distinct processes. What are you sharing (and why)? – Dave W. Smith Jun 28 '18 at 01:52
  • @GrahamDumpleton I'm asking how to start a daemon process/service with uWSGI, specifically using the feature at this [link](http://uwsgi-docs.readthedocs.io/en/latest/AttachingDaemons.html#managing-external-daemons-services); as said, I read the examples but do not understand them. I also ask about the communication between the daemon service and the uWSGI (flask) app. I provided what I think is a solution for the latter and asked if there is a better way of doing it. Please read my question; in the note section, I said that I do in fact have a node and a python application already – leoschet Jun 28 '18 at 02:31
  • already implemented and want to communicate between them. I said I believe Celery would be the best approach, but want to use flask + uWSGI + a python daemon process/service to run the native multiprocessing module. – leoschet Jun 28 '18 at 02:33
  • @DaveW.Smith I edited the question with the why. – leoschet Jun 28 '18 at 02:35
  • @GrahamDumpleton, I added a link of a previous question on which I talk more about the project in question – leoschet Jun 28 '18 at 02:35
  • I really think I'm being overly redundant in explaining what I asked, but please, if you still need some details, ask me! I also want to thank you in advance! – leoschet Jun 28 '18 at 02:51
  • 1
    The 'why' you have added is great. Addressing that overall goal, have you ever looked at Dask (http://dask.pydata.org/en/latest/)? It is a way of distributing large data sets across processes and then being able to run jobs across it. It works with packages like numpy/pandas/scikit etc. In other words, one way of solving the issue is to use a mini dask cluster, even if multiple worker processes on same node, and then use dask client API to submit tasks to the workers. – Graham Dumpleton Jun 28 '18 at 03:37
  • Hmm, sounds interesting; it would solve the python side of the project. I was intending to only use a task queue after having the first version in production; the task queue service would substitute the HTTP communication, and for that I would need a language-agnostic task queue like Celery, to directly connect node and python. I didn't search the web for how to share objects with celery workers, but there should be a way – leoschet Jun 28 '18 at 12:13
  • @GrahamDumpleton so, there is no way to take advantage of my already coded python + multiprocessing (native module) solution? I mean, no matter the solution, I will need to redo the multiprocessing part? – leoschet Jun 28 '18 at 12:15

1 Answer


I can see the temptation to hang on to multiprocessing.Pool. I'm using it in production as part of a pipeline. But Celery (which I'm also using in production) is much better suited to what you're trying to do, which is distribute work across cores to a resource that's expensive to set up. Have N cores? Start N Celery workers, each of which can load (or maybe lazy-load) the expensive model as a global. When a request comes in to the app, launch a task (e.g., task = predict.delay(args)), wait for it to complete (e.g., result = task.get()), and return a response. You're trading a little time spent learning Celery for not having to write a bunch of coordination code.

Dave W. Smith