
This question is related to Python multiprocessing. I am asking for a suitable interprocess-communication data structure for my specific scenario:

My scenario

I have one producer and one consumer.

  1. The producer produces a single, fairly small pandas DataFrame every 10 seconds or so, then puts it on a `multiprocessing.Queue`.
  2. The consumer is a GUI polling that `multiprocessing.Queue` every 100 ms. It is VERY CRITICAL that the consumer catches every single DataFrame the producer produces. (A minimal sketch of this setup follows the list.)
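
To make that concrete, here is a minimal sketch of the setup, assuming a placeholder payload and a plain loop standing in for the GUI's 100 ms timer (intervals are shortened so it runs quickly):

```python
import multiprocessing as mp
import queue  # only for the Empty exception raised by get_nowait()
import time

import pandas as pd

def producer(q):
    """Produce a small DataFrame roughly every 10 s (shortened here)."""
    for i in range(3):
        q.put(pd.DataFrame({"value": [i]}))  # placeholder payload
        time.sleep(1)  # the real producer would sleep ~10 s

def poll_once(q):
    """What the GUI's 100 ms timer callback would do."""
    try:
        df = q.get_nowait()   # non-blocking, so the GUI stays responsive
    except queue.Empty:
        return                # nothing new yet; try again on the next tick
    print(df)                 # handle the fresh DataFrame here

if __name__ == "__main__":
    q = mp.Queue()
    mp.Process(target=producer, args=(q,), daemon=True).start()
    for _ in range(50):       # stand-in for the GUI event loop
        poll_once(q)
        time.sleep(0.1)
```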

My thinking

`multiprocessing.Queue` is serving the purpose (I think), and it is amazingly simple to use! (Praise the green slithereen lord!) But I am clearly not utilizing the queue's full potential with only one producer, one consumer, and a maximum of one item on the queue. That leads me to believe there is something simpler than a queue. I tried to search for it, but I got overwhelmed by the options listed in the Python 3.5 documentation: 18. Interprocess Communication and Networking. I also suspect there may be a way that does not involve an interprocess-communication data structure at all.
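
For reference, the simplest two-endpoint primitive from that documentation chapter is `multiprocessing.Pipe`. A hedged sketch of how it could replace the queue here (the payload and message count are made up):

```python
import multiprocessing as mp

def producer(conn):
    """Send a few picklable objects (e.g. DataFrames) down the pipe."""
    for i in range(3):
        conn.send({"tick": i})
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = mp.Pipe()
    mp.Process(target=producer, args=(child_conn,)).start()
    received = 0
    while received < 3:
        if parent_conn.poll(0.1):    # wait up to 100 ms, like the GUI poll
            print(parent_conn.recv())
            received += 1
```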

Please Note

  1. Performance is not very important.
  2. I will stick with multiprocessing for now, instead of multithreading.

My Question

Should I be content with the queue, or is there a more recommended way? I am not a professional programmer, so I insist on doing things the tried-and-tested way. I also welcome any suggestions for alternative ways of approaching my problem.

Thanks

eliu
  • " python.multiprocess.queue is serving the purpose well, and amazingly simple to use!" - "Performance is not very important". It seems you already are pretty happy with it – Pedru Apr 20 '16 at 15:25
  • so there is a "better performance" way than a queue? I was happy with walking too, until I learnt that you could run, even fly. – eliu Apr 20 '16 at 15:28
  • This type of question might get a better response from [programmers.se]. – Kateract Apr 20 '16 at 15:37
  • @Kateract when referring to other sites, it is often helpful to point out that [cross-posting is frowned upon](http://meta.stackexchange.com/tags/cross-posting/info) – gnat Apr 20 '16 at 15:39
  • @gnat Thanks for the catch. – Kateract Apr 20 '16 at 15:43
  • I see, thanks for the info. But my question should be a simple one; there's no way it is conceptual. I just hoped someone with good knowledge of multiprocessing would come in and say "yep, queue is good, but look at this link too if you want to go at it" or "no, queue is too big for this, try that link". Again, I am up for trying different approaches, but I want to know the most recommended ones. – eliu Apr 20 '16 at 15:44
  • @Robert Harvey thank you for the edit, I found it quite awkward as well. – eliu Apr 20 '16 at 16:07
  • why not zeromq: http://stackoverflow.com/questions/6920858/interprocess-communication-in-python – DevLounge Apr 20 '16 at 19:33
  • @Apero: you think too highly of me, I haven't figured out how to do multiprocessing with Python's existing tools yet. My biggest problem is that I don't know all the tools available in Python for multiprocessing. I learnt about queues because the GUI tutorials I followed used them. – eliu Apr 20 '16 at 19:41
  • also have a look at concurrent.futures.ThreadPoolExecutor and ProcessPoolExecutor – DevLounge Apr 20 '16 at 19:43
  • @Apero: Thanks, those should keep me busy for a while. Such... exotic names on those packages – eliu Apr 20 '16 at 19:51
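
For readers unfamiliar with those names, a tiny illustrative sketch of `ProcessPoolExecutor` (the `square` function is just a made-up example task):

```python
from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=1) as pool:
        future = pool.submit(square, 7)   # runs in a separate process
        print(future.result())            # -> 49, blocks until done
```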

1 Answer


To me, the most important thing you mentioned is this:

It is VERY CRITICAL that the consumer catches every single DataFrame the producer produces.

So, let's suppose you used a shared variable to store the DataFrame. The producer would set it to the produced value, and the consumer would just read it. That would work fine, I guess.

But what would happen if the consumer somehow got blocked for more than one producing cycle? Then an old value would be overwritten before being read. And that's why I think a (thread-safe) queue is the way to go almost "by definition".
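
A toy illustration of that argument (not from the answer itself): with a single shared slot, a second write clobbers an unread value, while a queue keeps both:

```python
import multiprocessing as mp

if __name__ == "__main__":
    slot = mp.Value("i", 0)   # a single shared integer "variable"
    q = mp.Queue()

    # Two production cycles pass before the consumer gets to read:
    slot.value = 1
    q.put(1)
    slot.value = 2
    q.put(2)

    print(slot.value)         # -> 2: the value 1 was silently overwritten
    print(q.get(), q.get())   # -> 1 2: the queue preserved both values
```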

Besides, beware of premature optimization. If it works for your case, excellent. If some day, for some other case, performance becomes a problem, only then should you spend the extra effort, IMO.

heltonbiker
  • `a (thread-safe) queue is the way to go almost "by definition".` <-- thanks, I was hoping that was the answer as well. Just one follow-up question: where is, and who is in charge of, this all-powerful queue? When I run my code, I see two "python.exe"s in my task manager: one for the main GUI consumer, another for the producer. I would imagine that a queue should be a fairly complicated process itself, independent of the consumer and producer. – eliu Apr 20 '16 at 20:23
  • Well, my answer was more related to multithreading, but you said you were using multiprocessing, so I guess that has something to do with the extra process. And, with respect to partitioning, as far as I know, dividing your work between threads running in the same process is more lightweight than having more than one process communicating via inter-process communication. And about "who is in charge", the answer is: whatever the module implementers chose to encapsulate in the `multiprocessing.Queue` class - and you can always take a look at the source, I guess. – heltonbiker Apr 21 '16 at 14:56
  • Regarding the last comment, mind that `multiprocessing.Queue` is a specialized queue, not a regular `queue.Queue`, although they implement the same behaviour (with a different implementation). – heltonbiker Apr 21 '16 at 14:57
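
A rough way to check the "who is in charge" question for yourself (a sketch based on CPython's implementation, where the queue is backed by a pipe plus a feeder thread rather than a separate broker process):

```python
import multiprocessing as mp

if __name__ == "__main__":
    before = len(mp.active_children())
    q = mp.Queue()                 # backed by a pipe plus a feeder thread,
    q.put("hello")                 # not by a separate broker process
    after = len(mp.active_children())
    print(before, after, q.get())  # -> 0 0 hello: no extra process appears
```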