Python zeromq pipeline pattern

Question

I am trying to implement a pipelining pattern with Python and ZeroMQ, therefore I created three components: producer, consumer and merger.

So far, I am able to send my tasks, consume them (calculate the square of a number), and then get my results.

My question is about a list of tasks I plan to submit as a batch: I want them to be completed in a distributed pool of resources and the results merged, similar to multiprocessing apply/apply_async.

In native python, I typically use

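    from multiprocessing import Pool

    def f(n):
        return n * n

    if __name__ == "__main__":
        pool = Pool()
        _r = []
        for i in range(1000):
            # submit every task without waiting for it to finish
            _r.append(pool.apply_async(f, (i,)))
        # get results (each .get() blocks until its task is done)
        results = [r.get() for r in _r]
        pool.close()
        pool.join()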

I am trying to emulate this with a ZeroMQ pattern.

I am basing my producer, consumer and results components on this example. I also plan to have several servers as consumers, since it's a long-running task, therefore I have looked at the load-balancing pattern and the daunting cluster pattern. Is this necessary with ZeroMQ?

Answer (edited May 23 '17 at 10:26)

{... the original post has since changed both the wording of the question (dropping the initial request for PARALLEL processing) and its target. This answer was started before those updates, when the O/P contained the question ". . . how do I know my first batch results are ready?"}

How do I know my first ... is the easier part:

The result_collector end of the processing pipeline can either .poll() its PULL receiver, registered with a zmq.Poller instance, to test for the first appearance of a zmq.POLLIN event.
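A minimal sketch of the zmq.Poller approach, assuming pyzmq. The helper name wait_for_first_result is hypothetical, and the demo wires a PUSH/PULL pair over an inproc:// endpoint only so that it runs in a single process.

```python
import zmq

def wait_for_first_result(receiver, timeout_ms):
    """Return the first message on a PULL socket, or None if none arrives in time."""
    poller = zmq.Poller()
    poller.register(receiver, zmq.POLLIN)
    events = dict(poller.poll(timeout_ms))   # {socket: event_mask}, empty on timeout
    if events.get(receiver, 0) & zmq.POLLIN:
        return receiver.recv()               # a message is queued, so this will not block
    return None

if __name__ == "__main__":
    ctx = zmq.Context()
    sink = ctx.socket(zmq.PULL)               # the result_collector end
    sink.bind("inproc://first-result-demo")
    worker = ctx.socket(zmq.PUSH)             # stands in for a consumer
    worker.connect("inproc://first-result-demo")

    print(wait_for_first_result(sink, 100))   # nothing sent yet
    worker.send(b"first result")
    print(wait_for_first_result(sink, 1000))  # the message sent above

    worker.close(linger=0)
    sink.close(linger=0)
    ctx.term()
```

The timeout lets the collector do other work between checks instead of blocking indefinitely on .recv().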

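Or, it may get its hands a bit dirty and handle the same PULL-engine "sniffing" with a construct like this (receiver being the PULL socket):

    try:
        msg = receiver.recv( zmq.NOBLOCK )
    except zmq.ZMQError:        # zmq.Again (EAGAIN): no message yet
        pass                    # noMsgYetHandler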
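Extending the same non-blocking construct, a collector could drain everything that has already arrived and then return to other work. A minimal sketch, assuming pyzmq; drain_ready is a hypothetical helper, and it catches zmq.Again, the ZMQError subclass pyzmq raises for EAGAIN.

```python
import zmq

def drain_ready(receiver):
    """Return every message already queued on `receiver`, never blocking."""
    ready = []
    while True:
        try:
            ready.append(receiver.recv(zmq.NOBLOCK))
        except zmq.Again:              # EAGAIN: nothing (more) queued right now
            return ready

if __name__ == "__main__":
    ctx = zmq.Context()
    sink = ctx.socket(zmq.PULL)
    sink.bind("inproc://drain-demo")
    worker = ctx.socket(zmq.PUSH)
    worker.connect("inproc://drain-demo")

    for n in range(3):
        worker.send_string(str(n * n))
    sink.poll(1000)                    # wait (max 1 s) until the first result is visible
    print(drain_ready(sink))           # the three results sent above
    print(drain_ready(sink))           # nothing new has arrived

    worker.close(linger=0)
    sink.close(linger=0)
    ctx.term()
```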

Parallel? In case you indeed require a PARALLEL ... it is VERY HARD

In the recent decade, many texts have used the word parallel. Still, there is a very limited number of cases where truly PARALLEL code execution is possible.

This post may help with a fast disambiguation, and another link there helps to strictly separate the cases, i.e. under what conditions a true PARALLEL software system design has to be implemented.

In case your Project fits into [ 1.Definition ] of a true PARALLEL system, your ZeroMQ efforts will also have to provide robust, minimum-latency signalling and ultra-low-latency control-plane facilities for true-parallel system execution. This is doable, no question, however your Project's man*days consumption, know-how and QM/QA requirements will have exploded into an investment / V@R scale several orders of magnitude larger.

CONCURRENT solution fits just-right?

If a CONCURRENT system operations design is enough (be it a blocking-permitted or a high-level, top-down non-blocking system architecture), ZeroMQ gives you many direct tools for designing smart services, round-robin or priority-queue-flagged load-balancers (which, by the way, are inherently non-parallel, just concurrent) and many handy, ready-to-use features under this "relaxed" system design paradigm.
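As a minimal sketch of such a concurrent (not truly parallel) design, the apply_async batch from the question might map onto a PUSH/PULL pipeline like the one below. It assumes pyzmq and, to stay self-contained, runs the consumers as threads in one process over inproc:// endpoints; with several servers the workers would instead be separate processes connecting over tcp://. The names map_async_zmq and worker, and the [index, value] JSON message layout, are hypothetical choices for illustration.

```python
import threading
import zmq

def f(n):
    return n * n

def worker(ctx, stop):
    """Consumer: PULL a task, compute f(n), PUSH [index, result] to the merger."""
    tasks = ctx.socket(zmq.PULL)
    tasks.connect("inproc://tasks")
    results = ctx.socket(zmq.PUSH)
    results.connect("inproc://results")
    while not stop.is_set():
        if tasks.poll(timeout=100):      # wait up to 100 ms, then re-check `stop`
            i, n = tasks.recv_json()
            results.send_json([i, f(n)])
    tasks.close(linger=0)
    results.close(linger=0)

def map_async_zmq(inputs, n_workers=4):
    """Hypothetical helper: submit a batch, gather results in submission order."""
    inputs = list(inputs)
    ctx = zmq.Context()
    producer = ctx.socket(zmq.PUSH)      # round-robins tasks over connected workers
    producer.bind("inproc://tasks")
    merger = ctx.socket(zmq.PULL)        # fair-queues results from all workers
    merger.bind("inproc://results")

    stop = threading.Event()
    threads = [threading.Thread(target=worker, args=(ctx, stop))
               for _ in range(n_workers)]
    for t in threads:
        t.start()

    for i, n in enumerate(inputs):       # submit the whole batch, like apply_async
        producer.send_json([i, n])

    out = [None] * len(inputs)
    for _ in inputs:                     # exactly one result per task, any order
        i, value = merger.recv_json()
        out[i] = value                   # restore submission order

    stop.set()                           # batch complete: let the workers exit
    for t in threads:
        t.join()
    producer.close(linger=0)
    merger.close(linger=0)
    ctx.term()
    return out

if __name__ == "__main__":
    print(map_async_zmq(range(10)))
```

The merger knows the batch is complete once it holds one result per submitted index, which plays the role of the r.get() loop in the multiprocessing version. Note that tasks sent before all workers have connected may all land on the first worker to connect, so this sketch spreads the load but does not guarantee an even split.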
