
What would be an inter-process communication (IPC) framework/technique with the following requirements:

  • Transfer native Python objects between two Python processes
  • Efficient in time and CPU (RAM efficiency irrelevant)
  • Cross-platform Win/Linux
  • Nice to have: works with PyPy

UPDATE 1: the processes are on the same host and use the same versions of Python and other modules

UPDATE 2: the processes are run independently by the user, no one of them spawns the others

Jonathan Livni

5 Answers


Native objects don't get shared between processes (due to reference counting).

Instead, you can pickle them and share them using unix domain sockets, mmap, zeromq, or an intermediary such as sqlite3 that is designed for concurrent access.
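As a sketch of the socket route: the snippet below pickles a native object and passes it through a `socket.socketpair()`, which stands in here for the Unix domain socket or ZeroMQ channel you would use between two independent processes. The 4-byte length prefix is one illustrative way to frame messages on a stream socket, not part of any library API.

```python
import pickle
import socket

# socketpair() gives two connected endpoints inside one process; between two
# independent processes you would use an AF_UNIX socket (or ZeroMQ) instead.
sender, receiver = socket.socketpair()

obj = {"numbers": [1, 2, 3], "label": "example"}
payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

# Frame the message with a 4-byte length prefix so the reader knows
# how many bytes belong to this object.
sender.sendall(len(payload).to_bytes(4, "big"))
sender.sendall(payload)

# Reader side: read the length, then loop until the whole payload arrives.
size = int.from_bytes(receiver.recv(4), "big")
data = b""
while len(data) < size:
    data += receiver.recv(size - len(data))

received = pickle.loads(data)
sender.close()
receiver.close()
```

`pickle.HIGHEST_PROTOCOL` uses the fastest binary format available, which matters given the question's emphasis on CPU efficiency.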

Raymond Hettinger
  • What do you think of XML-RPC? – Santa Oct 20 '11 at 18:05
  • I love XML-RPC but the OP's question focused on cpu efficiency so xml-rpc didn't make the cut. – Raymond Hettinger Oct 20 '11 at 18:40
  • Pickling takes time and CPU but conserves RAM; my requirements are the exact opposite. Is there a way to communicate them without pickling them? – Jonathan Livni Oct 21 '11 at 15:16
  • Was looking for a simple example of using `mmap` to share data between two independently run scripts, and finally found one here: [Sharing Python data between processes using mmap | schmichael's blog](http://blog.schmichael.com/2011/05/15/sharing-python-data-between-processes-using-mmap/) - but it seems that you still have to open a file and store the data to be shared there; mmap (apparently) simply provides a special interface to access this file (I was otherwise hoping mmap could utilize memory directly, bypassing temp files) – sdaau Feb 04 '13 at 19:45
  • @sdaau About mmap being tied to temp files: not really. You can create what is called an anonymous mmap, that doesn't rely on files, but the shared area is only available for threads on the same process (of course), or to children processes forked after the mmap has been created, so it is not useful for the requirements here – Ricardo Cárdenes Oct 18 '13 at 10:58
  • @RicardoCárdenes, posix defines `shm_open` to create an in-memory file, which can be used to bypass the temp file; does python also have that? – doraemon Nov 14 '17 at 02:10
  • @LiuSha There was talk some years ago about adding `shm_open` functionality to the mmap module. The Windows version of `mmap.mmap` supports something similar using tagnames, but it looks like the Unix version never got the patch? At least the documentation does not suggest it. – Ricardo Cárdenes Nov 16 '17 at 10:40
  • @RicardoCárdenes, for Linux, I found that `open`ing a file in `/dev/shm`, say `/dev/shm/afile`, actually works the same as `shm_open`ing the file `/afile`. So I found a way to do IPC in Linux using shared memory, which is summarized in an answer below. – doraemon Nov 17 '17 at 01:48

Use multiprocessing to start with.

If you need multiple CPUs, look at celery.
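For processes started independently of each other, the `multiprocessing.connection` submodule offers `Listener`/`Client`, which pickle Python objects transparently over a socket. A minimal sketch, with a thread standing in for the second independently started process, and the authkey value being an arbitrary illustrative choice:

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"shared-secret"  # both sides must agree on this key

# The "server" process: bind to a free port (port 0 lets the OS pick one)
# and wait for a peer to connect.
listener = Listener(("localhost", 0), authkey=AUTHKEY)

def serve():
    with listener.accept() as conn:
        # send() pickles the object; any picklable Python object works.
        conn.send({"status": "ok", "values": [1, 2, 3]})

t = threading.Thread(target=serve)
t.start()

# The "client" process: connect to the listener's address and receive
# a native Python object (recv() unpickles it).
with Client(listener.address, authkey=AUTHKEY) as conn:
    message = conn.recv()

t.join()
listener.close()
```

In a real deployment the two sides would be separate scripts agreeing on a fixed address, e.g. a well-known port or an `AF_UNIX` path on Linux.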

S.Lott
  • Is `multiprocessing` relevant for processes that were run interdependently? (not spawned by each other) – Jonathan Livni Oct 21 '11 at 07:09
  • @Jonathan: "interdependently"? The multi-processing package provides queues and pipes so that processes can synchronize with each other and pass objects around. Does that qualify as "interdependently"? – S.Lott Oct 21 '11 at 20:39
  • I meant independently of course... – Jonathan Livni Oct 21 '11 at 22:26
  • @Jonathan: Is this a requirement? If so, please **update** the question to include all the facts. The package provides numerous features for building distributed servers using internet protocols to communicate. http://docs.python.org/library/multiprocessing.html#module-multiprocessing.connection – S.Lott Oct 22 '11 at 00:39

After some testing, I found that the following approach works on Linux using mmap.

Linux has /dev/shm. If you create shared memory using POSIX `shm_open`, a new file is created in this folder.

Although Python's `mmap` module does not provide the `shm_open` function, we can use a normal `open` to create a file in /dev/shm; it behaves similarly and resides in memory. (Use `os.unlink` to remove it.)

Then, for IPC, we can use `mmap` to map that file into the virtual memory space of the different processes. All the processes share that memory. Python can use the memory as a buffer and create objects such as `bytes` and numpy arrays on top of it, or we can use it through the `ctypes` interface.

Of course, process synchronization primitives are still needed to avoid race conditions.

See the mmap docs, the ctypes docs, and `numpy.load`, which has an `mmap_mode` option.
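A minimal sketch of this approach, with the writer and reader sides in one script for brevity. In practice they would be separate processes opening the same path; the filename here is arbitrary, and the fallback to the temp directory is only so the snippet runs on non-Linux systems:

```python
import mmap
import os
import tempfile

# On Linux, files under /dev/shm live in memory (tmpfs); elsewhere,
# fall back to an ordinary temp file so the sketch still runs.
base = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()
path = os.path.join(base, "ipc_demo_%d" % os.getpid())
SIZE = 1024

# Writer side: create the backing file, size it, and map it.
fd = os.open(path, os.O_CREAT | os.O_RDWR)
os.ftruncate(fd, SIZE)
writer = mmap.mmap(fd, SIZE)
writer[:5] = b"hello"
writer.flush()

# Reader side (normally a separate process opening the same path).
fd2 = os.open(path, os.O_RDONLY)
reader = mmap.mmap(fd2, SIZE, access=mmap.ACCESS_READ)
data = bytes(reader[:5])

reader.close()
os.close(fd2)
writer.close()
os.close(fd)
os.unlink(path)  # remove the shared segment when done
```

Both mappings reference the same pages, so a write on one side is visible on the other without copying through the filesystem cache to disk.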

doraemon
  • I know this answer is quite old.. but I'll give it a shot! Since it is possible to open a file in /dev/shm, what is the purpose of using mmap? Can't I just pass information back and forth between different applications by reading and writing to files in /dev/shm? From my understanding these do not get written to a hard drive? – RedSmolf Jul 18 '19 at 07:54
  • Although I haven't tested what you describe, I feel it should also be fine. But it might be more convenient to map it, so you can use the memory like a variable instead of a file. Happy to see your updates on the experiment. – doraemon Jul 24 '19 at 01:32

Both execnet and Pyro mention PyPy <-> CPython communication. Other packages from Python Wiki's Parallel Processing page are probably suitable too.

TryPyPy

Parallel Python might be worth a look; it works on Windows, OS X, and Linux (and I seem to recall using it on an UltraSPARC Solaris 10 machine a while back). I don't know if it works with PyPy, but it does seem to work with Psyco.

ChrisC