
I want to know the best practices followed to share a queue (resource) between two processes in Python. Here is what each process is doing:

Process_1: continuously gets data (in json format) from a streaming api

Process_2: is a daemon (similar to Sander Marechal's code) which commits data (one at a time) into a database

So, Process_1 (the Producer) puts a unit of data onto this shared resource, and Process_2 (the Consumer) polls this shared resource for new units of data and, if there are any, stores them in a DB.

There are some options which came to my mind:

  • Using pickle (drawback: extra overhead of pickling and de-pickling)
  • Passing data via stdout of Process_1 to stdin of Process_2 (drawback: none, but not sure how to implement this with a daemon)
  • Using the pool object in the multiprocessing library (drawback: not sure how to code this as one process is a daemon)

I would like to know the optimal solution practiced in this regard, with some code :). Thanks.

ajmartin

1 Answer


multiprocessing.Pool isn't what you want in this case - it is useful for having multiple units of work done 'in the background' (concurrently), not so much for managing a shared resource. Since you appear to have the format of the communication worked out, and it flows in only one direction, what you want is a multiprocessing.Queue - the documentation has a good example of how to use it. Have Process_1 put data into the Queue as needed, and Process_2 call q.get() in an infinite loop. This will cause the Consumer to block when there is nothing to do, rather than busy-waiting as you suggest (which would waste processor cycles).

The remaining issue is shutting down the daemon cleanly - probably the best way is to have the Producer put a sentinel value at the end of the queue, so that the Consumer deals with all pending requests before exiting. Other alternatives include things like trying to forcibly kill the process when the child exits, but this is error-prone.
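A minimal sketch of that pattern - the data being produced, the loop count, and commit_to_db are all placeholders; in your case the Producer would read from the streaming API and the Consumer would do the actual DB insert:

```python
import multiprocessing

SENTINEL = None  # value the Producer puts last to tell the Consumer to stop


def producer(q):
    # Stand-in for Process_1: would normally read from the streaming API.
    for i in range(5):
        q.put({"record": i})   # put each json unit onto the shared queue
    q.put(SENTINEL)            # signal that no more data is coming


def consumer(q):
    # Stand-in for Process_2: q.get() blocks until data arrives (no busy-waiting).
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        # commit_to_db(item)   # hypothetical: replace with your DB insert
        print("committing", item)


if __name__ == "__main__":
    q = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=producer, args=(q,))
    p2 = multiprocessing.Process(target=consumer, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
```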

Note that this assumes the Producer spawns the Consumer (or vice versa). If the Consumer is a long-running daemon that can deal with multiple relatively short-lived Producers, the situation becomes quite a bit harder - to my knowledge there is no cross-platform, high-level IPC module for that case. The most portable (and generally easiest) way to handle it may be to use the filesystem as a queue: have a spool folder somewhere that the Producers write a text file into for each request, and let the Consumer process these at its leisure. This isn't without its own issues, though: you need to ensure that the Consumer doesn't try to open a half-written request file, that the Producers don't step on each other's toes, and that the Producers and the Consumer agree on the ordering of requests. A rough sketch is below.
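Something like the following could work as a starting point - the spool path, the filename scheme, and commit_to_db are assumptions, and a real version would want proper error handling. Writing to a temporary file and then renaming it into place keeps the Consumer from ever seeing a half-written file (the rename is atomic on POSIX when both paths are on the same filesystem), and timestamp-plus-PID filenames keep multiple Producers from colliding while giving a rough ordering:

```python
import json
import os
import tempfile
import time

SPOOL_DIR = "/tmp/myapp-spool"   # hypothetical location; both sides must agree on it


def enqueue(record):
    # Producer side: write the request to a temp file, then atomically
    # rename it into the spool directory under a sortable name.
    os.makedirs(SPOOL_DIR, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=SPOOL_DIR, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(record, f)
    final_name = "%.6f-%d.json" % (time.time(), os.getpid())
    os.rename(tmp_path, os.path.join(SPOOL_DIR, final_name))


def drain():
    # Consumer side: process spool files in (approximate) timestamp order,
    # skipping any .tmp files that are still being written.
    for name in sorted(os.listdir(SPOOL_DIR)):
        if not name.endswith(".json"):
            continue
        path = os.path.join(SPOOL_DIR, name)
        with open(path) as f:
            record = json.load(f)
        # commit_to_db(record)   # hypothetical: replace with your DB insert
        print("committing", record)
        os.remove(path)
```

The daemon would call drain() periodically (or watch the directory with something like inotify), while each Producer just calls enqueue() and exits.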

lvc
  • lvc: Thanks for your suggestion. None of the processes are spawning the other. They are created independently. – ajmartin Jun 02 '11 at 08:43