
We have one program in Java and one in Python, and need to get them talking to each other in a ping-pong manner, each time exchanging an integer array of length 100,000, with each side taking ~0.1-1 second to do its work:

  1. Java does some work and fires an int array of length 100,000 over to ...
  2. Python, which does some work and fires a new array of length 100,000 back to ...
  3. Java, which does some work ... etc

Note that

  • Each program needs to wait for the other to do its part.
  • They will run on the same Linux machine.
  • We will do Monte Carlo simulation, so speed is important.

I am more familiar with Java, and understand that a shared-memory (memory-mapped file) approach is likely to be the fastest. This seems relevant for the Java side, but how would I get each program to wait (block) for the other to complete its work and update the shared memory before it starts reading? I've heard of something called a 'semaphore', but can't figure out how to use one here.

These are my fallback ideas, but perhaps they are better?
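The shared-memory idea in the question could look roughly like this on the Python side. This is a minimal sketch under a layout we made up for illustration: a 4-byte "turn" flag followed by the 100,000 int32 values, all little-endian, in a file such as `/dev/shm/pingpong.bin` (hypothetical path). The Java side would map the same file with `FileChannel.map` and call `order(ByteOrder.LITTLE_ENDIAN)` on the resulting buffer. All function and constant names here are hypothetical. Note that `wait_for_turn` still polls the flag, and plain Python gives no memory-barrier guarantees, which is exactly the gap a semaphore or a blocking socket would fill; treat this as a layout sketch, not a finished hand-over mechanism.

```python
import mmap
import os
import struct
import time

N = 100_000                        # ints exchanged per turn, from the question
FLAG = struct.Struct("<i")         # hypothetical header: whose turn it is
PAYLOAD = struct.Struct(f"<{N}i")  # 100,000 little-endian int32 = 400,000 bytes
SIZE = FLAG.size + PAYLOAD.size

JAVA_TURN, PYTHON_TURN = 0, 1      # hypothetical values both programs agree on


def open_shared(path):
    """Create (or reuse) the backing file, size it, and map it read-write."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, SIZE)
        return mmap.mmap(fd, SIZE)  # shared mapping: writes are visible to other processes
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed


def current_turn(mm):
    return FLAG.unpack_from(mm, 0)[0]


def wait_for_turn(mm, me, pause=0.0005):
    """Polls the flag with a short sleep; this is NOT a true blocking wait."""
    while current_turn(mm) != me:
        time.sleep(pause)


def read_payload(mm):
    return list(PAYLOAD.unpack_from(mm, FLAG.size))


def write_payload(mm, values, next_turn):
    """Write the whole array first and flip the flag last to hand over the turn."""
    PAYLOAD.pack_into(mm, FLAG.size, *values)
    FLAG.pack_into(mm, 0, next_turn)
```

In a run, Python would call `wait_for_turn(mm, PYTHON_TURN)`, then `read_payload(mm)`, do its work, and finish with `write_payload(mm, result, JAVA_TURN)`.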

Edited by Peter Lawrey; asked by Pengin.
  • Maybe a stupid suggestion, but if they run on the same machine, couldn't you execute the Python program via Jython (which should simplify the communication)? – UnholySheep Apr 29 '17 at 16:02
  • You just need to use blocking I/O to ensure that each program receives the correct amount of information before continuing – Jacob G. Apr 29 '17 at 16:07
  • @JacobG. How does blocking I/O relate to shared memory IPC? – Pengin Apr 29 '17 at 16:18
  • @UnholySheep I was under the impression that that would be significantly slower, but could be wrong. – Pengin Apr 29 '17 at 16:19
  • Well you can just iterate from 0 to 100,000 and read an integer from a socket – Jacob G. Apr 29 '17 at 16:21
  • I am no expert in this area, but look into this: Create a temp file, maybe on a RAM disk, and mmap it in both processes (Java: `FileChannel.map`, Python: `mmap.mmap`). Also check https://stackoverflow.com/questions/15414753/shared-memory-between-c-and-java-processes – Philipp Wendler Apr 29 '17 at 16:25
  • I hope you see how much complexity that switch between Java and Python creates. I would vote to either use Jython or to drop one side and do all computations either with Java or Python. – GhostCat Apr 29 '17 at 16:34
  • @JacobG. So I think you are saying you vote for the Network/UnixDomain socket approach instead. I can see that would make blocking much simpler. – Pengin Apr 29 '17 at 16:45
  • Yes, interprocess communication is relatively simple but gets more complex when you're required to use more than one language – Jacob G. Apr 29 '17 at 16:47
  • Ramdisk is definitely a good idea. On Windows, OSFMount can create ramdisks. On macOS/Linux, `tmpfs` filesystems are ramdisks – MultiplyByZer0 Apr 29 '17 at 16:48
  • @MultiplyByZer0 I like the simplicity, and presumably /dev/shm will suffice. But I'm not aware of a Java way to block on a file lock without retries, which is what I'm trying to avoid in the shared memory case. – Pengin Apr 29 '17 at 16:52
  • If you are doing Monte Carlo, your bottleneck should be the amount of CPU you are using. Using loopback should be more than fast enough. You can pass 400 KB over loopback in a few milliseconds. – Peter Lawrey Apr 29 '17 at 18:10
  • On a Windows laptop I can send data between two JVMs over loopback at a rate of 300 MB/s using plain sockets (in blocks of 400 KB). This means each 400 KB took an average of 1.3 ms to round trip between processes. I would look at whether you have efficient buffering of the data. – Peter Lawrey Apr 29 '17 at 20:07
  • If they're both running on the same machine, is there a reason to *not* just translate from Python to Java (or vice-versa) and avoid IPC in the first place? – code_dredd May 07 '17 at 22:07
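The blocking-I/O approach suggested in the comments could look like this on the Python side. It is a sketch under a framing we chose ourselves: a 4-byte big-endian count followed by the int32 values. Big-endian is used because `java.io.DataInputStream.readInt` reads that byte order, so the Java side could read the message with a `DataInputStream` wrapped around a `BufferedInputStream`. Each message is 4 + 100,000 × 4 = 400,004 bytes, the ~400 KB block size in Peter Lawrey's measurements. `recv_ints` simply blocks until the other program has sent a whole array, which gives the "wait for the other side" behavior without polling or semaphores. The function names are hypothetical; `sock` can be a loopback TCP socket, e.g. one returned by `socket.create_connection`.

```python
import struct

COUNT = struct.Struct(">i")  # 4-byte big-endian length prefix (our own framing)


def recv_exact(sock, n):
    """Block until exactly n bytes have arrived; a single recv() may return fewer."""
    buf = bytearray(n)
    view = memoryview(buf)
    got = 0
    while got < n:
        k = sock.recv_into(view[got:], n - got)
        if k == 0:
            raise ConnectionError("peer closed the connection mid-message")
        got += k
    return bytes(buf)


def send_ints(sock, values):
    """Send a count, then big-endian int32s (the byte order of Java's DataOutputStream.writeInt)."""
    body = struct.pack(f">{len(values)}i", *values)
    sock.sendall(COUNT.pack(len(values)) + body)


def recv_ints(sock):
    """Blocks until the peer has sent one complete array."""
    (count,) = COUNT.unpack(recv_exact(sock, COUNT.size))
    return list(struct.unpack(f">{count}i", recv_exact(sock, 4 * count)))
```

One ping-pong step on the Python side is then `values = recv_ints(sock)`, followed by the work, followed by `send_ints(sock, result)`.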

2 Answers


Go for a fast intermediary data server to assist in communication between them. Redis would do the trick. You'll need two data structures there:

  1. a list (your list of 100,000 items). We'll call that my_project:list for reference.
  2. a lock. This can just be a Redis string set to "Python" or "Java."

Then have the following interaction:

  1. Both Python and Java poll the Redis lock. If it's equal to "Python", it's Python's turn. If "Java," it's Java's turn.
  2. Whichever program's turn it is goes into work mode and does whatever it needs to do with my_project:list, then sets the lock to the other program's name.
  3. Repeat indefinitely.
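The turn-taking loop above might look like this on the Python side with the redis-py client, whose `redis.Redis` class provides the `get` and `set` methods used here. Two assumptions are ours, not the answer's: the lock lives under a key we named `my_project:lock`, and instead of a Redis list of 100,000 elements the array is stored as one packed little-endian int32 string under `my_project:list`, so each hand-over is a single GET and SET. It also assumes a default client (no `decode_responses=True`), so GET returns bytes. One side must set the lock before the loop starts, and the Java side would mirror this logic with its own Redis client.

```python
import struct
import time

LOCK_KEY = "my_project:lock"  # hypothetical key holding "Python" or "Java"
DATA_KEY = "my_project:list"  # key name from the answer; here holds packed int32s


def take_turn(client, me, other, work, pause=0.001):
    """Poll the lock until it names `me`, run `work` on the array, then hand over.

    Assumes a redis-py style client whose get() returns bytes.
    """
    while client.get(LOCK_KEY) != me.encode():
        time.sleep(pause)  # this is the polling the answer describes
    raw = client.get(DATA_KEY) or b""
    values = list(struct.unpack(f"<{len(raw) // 4}i", raw))
    result = work(values)
    client.set(DATA_KEY, struct.pack(f"<{len(result)}i", *result))
    client.set(LOCK_KEY, other)  # set last, so the other side never sees half-updated data
```

A hypothetical call would be `take_turn(redis.Redis(), "Python", "Java", simulate)`, where `simulate` is your own function returning the new list.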
Eli
  • Makes sense, but worried that the polling strategy would make things much slower than blocking on sockets. – Pengin Apr 29 '17 at 17:04
  • You can run a timeit for this. Polling won't cause more than a couple of milliseconds of lag at most, as long as everything is on the same box. If your time horizon is 100 ms - 1 s anyway, this won't affect your timings in a big way, but will massively simplify your development. You would not have a good time debugging blocking on sockets. You're trading a ~0.1-5% slowdown for much greater simplicity and development speed. It's a good trade. Use the time you'll save on debugging to speed up your Java and Python programs, which will be taking up the vast majority of your processing time. – Eli Apr 29 '17 at 17:10

You could try combining Java and Python in the same process using Jep. The latest release added support for sharing memory between Python and Java using NumPy ndarrays and Java direct buffers. This would let you share the data without any copying, which should give the best performance possible.

bsteffen