I've got the following problem:

I have two different classes; let's call them the interface and the worker. The interface is supposed to accept requests from outside and multiplex them to several workers.

Contrary to almost every example I have found, I have several peculiarities:

  • The workers are not supposed to be recreated for every request.
  • The workers are different; a request for workers[0] cannot be answered by workers[1]. This multiplexing is done in interface.
  • I have a number of function-like calls which are difficult to model via events or simple queues.
  • There are a few different requests, which would make one queue per request difficult.

For example, assume that each worker is storing a single integer number (let's say the number of calls this worker received). In non-parallel processing, I'd use something like this:

class interface(object):
    workers = None  # set somewhere else

    def get_worker_calls(self, worker_id):
        return self.workers[worker_id].get_calls()

class worker(object):
    calls = 0

    def get_calls(self):
        self.calls += 1
        return self.calls


This, obviously, doesn't work. What does?

Or, maybe more relevantly, I don't have experience with multiprocessing. Is there a design paradigm I'm missing that would easily solve the above?

Thanks!


For reference, I have considered several approaches, and I was unable to find a good one:

  • Use one request queue and one answer queue. I've discarded this idea since it would either block interface for the answer time of the current worker (making it scale badly), or would require me to send extra routing information around.
  • Use of one request queue. Each message contains a pipe to return the answer to that request. After working around the fact that pipes cannot be sent via pipes, I ran into problems with the pipe closing unless I sent both ends over the connection.
  • Use of one request queue. Each message contains a queue to return the answer to that request. This fails since I cannot send queues via queues, and the reduction trick that helped with pipes doesn't work here.
  • The above also applies to the respective Manager-generated objects.
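For illustration, the queue-via-queue failure from the list above is easy to reproduce; a `multiprocessing.Queue` refuses to be pickled at all (this is just a minimal reproduction, not one of the attempts themselves):

```python
import multiprocessing
import pickle

# A multiprocessing.Queue explicitly refuses pickling: it may only be
# shared with a child process by inheritance, i.e. by passing it as an
# argument when the Process is created.
q = multiprocessing.Queue()
try:
    pickle.dumps(q)
except RuntimeError as exc:
    print(exc)  # "Queue objects should only be shared between processes through inheritance"
```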
e14159

1 Answer

Multiprocessing means you have two or more separate processes running. There is no way for one process to access another's memory directly (as you can with multithreading).

Your best shot is to use some kind of external queue mechanism; you can start with Celery or RQ. RQ is simpler, but Celery has built-in monitoring.

But you have to know that multiprocessing will work only if Celery/RQ are able to "pack" (pickle) the needed functions/classes and send them to the other process. Therefore you have to use module-level functions (defined at the top of the file, not nested inside another function or class).
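A quick way to see this constraint (a standalone sketch; the function names here are made up for illustration): module-level functions are pickled by reference and survive the round trip, while locally defined functions cannot be pickled at all, which is why task libraries insist on top-level definitions.

```python
import pickle

def top_level(x):
    # Module-level function: pickled by reference, so it can be
    # shipped to another process.
    return 2 * x

def outer():
    def nested(x):  # local function: cannot be pickled
        return 2 * x
    return nested

# Module-level functions round-trip through pickle fine...
restored = pickle.loads(pickle.dumps(top_level))
print(restored(21))  # 42

# ...but nested/local functions do not, which is why Celery/RQ
# require tasks to be defined at module top level.
try:
    pickle.dumps(outer())
except (pickle.PicklingError, AttributeError) as exc:
    print("cannot pickle:", exc)
```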

You can always implement it yourself; Redis is very simple, and ZeroMQ and RabbitMQ are also good.

The Beaver library is a good example of how to deal with multiprocessing in Python using a ZeroMQ queue.

Dawid Gosławski