
I am doing a project that requires me to repeatedly call a Java function from Python (essentially, I'm designing a learning algorithm in Python, but the value function was provided in Java).

So what would be the best practice for this scenario? Should I use subprocess.run() to call the Java function every time, or should I use something like Py4J, Jython, or JPype? What's the difference between using subprocess.run() and the others?

Efficiency is the top concern, since I need to run the same Java function repeatedly.

hzh
  • Possible duplicate of [Python: How can I execute a jar file through a python script](https://stackoverflow.com/questions/7372592/python-how-can-i-execute-a-jar-file-through-a-python-script) – lakshman Apr 07 '18 at 05:21
    Jython is not a library. It is an alternative implementation of the Python interpreter (as opposed to CPython) running on the JVM and having native access to any Java code. There is no "best way" in this life. Choose what is more appropriate for your project. – Ivan Aksamentov - Drop Apr 07 '18 at 05:24

1 Answer

  • Using subprocess has two problems. If neither one is relevant, it'll work fine.
    • If you're sending large amounts of data back and forth, you have to serialize it in some format to pass in via files and command-line arguments, or pipes or sockets, which can be slow.
    • If you're calling a whole lot of short functions instead of one occasional huge one, you'll be spending more time setting up and tearing down the JVM (and warming up the JIT) than doing actual work.
  • Jython has two problems. Again, if neither one affects you, it'll work fine.
    • It can't use many popular third-party libraries because they're built in C, for CPython.
    • It's out of date. The latest version implements Python 2.7, which is less than 2 years away from going out of support.
  • JPype has one problem, but it's a doozy. If the current fork does what you need and has no bugs blocking you, maybe it's ok anyway.
    • It's a vaporware project abandoned over a decade ago. It was picked up and knocked into shape by someone else a few years ago, and the current maintainer is keeping it running, and occasionally gets patches for things like working in 64-bit cygwin or updating to OS X 10.9, but it's not exactly a vibrant project with major support behind it.
  • Py4J has two problems, plus one point in its favor.
    • It's incomplete. Not unusable, and not completely moribund, but there hasn't been any visible work on it in over a year, and nobody seems interested in anything but the minimal functionality needed for Apache Spark.
    • It's doing the same kind of serialization you'd do with subprocess behind your back, and more beyond that for every call you make, and the FAQ justifies this by saying performance is not a priority. (Spark just ignores all of that and uses its own channels for everything.)
    • For more minimal use—just starting up a JVM and setting up a socket to it—it may be better than subprocess because you don't have to keep starting and tearing down a JVM, but writing a socket protocol on both sides is a little bit more work than storing files and passing filenames on the command line. (Not a huge hurdle, but a problem if you've never done this kind of thing before.)
  • You may also want to look at transpilers. I don't know much about any of them, but I've talked to people who are using BeeWare to compile Python 3.4 code to Java source code that they then build together with their native Java code. I'm pretty sure this won't work if you're using any C extension, but if that's not a problem for you, it might be worth considering.
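To make the subprocess trade-off concrete, here is a minimal sketch of what a one-JVM-per-call wrapper would look like. The jar name `value-function.jar` and the convention that the Java side takes one JSON argument and prints a single float on stdout are assumptions for illustration, not anything the question specifies:

```python
import json
import subprocess

def build_command(params, jar="value-function.jar"):
    # Serialize the parameter list as a single JSON command-line argument;
    # the (hypothetical) Java side would parse it back with any JSON library.
    return ["java", "-jar", jar, json.dumps(params)]

def parse_output(stdout):
    # Assumes the Java side prints exactly one float on stdout.
    return float(stdout.strip())

def call_value_function(params, jar="value-function.jar"):
    # Every call here pays the full JVM startup (and JIT warm-up) cost,
    # which is exactly why this approach scales badly to millions of
    # short calls.
    result = subprocess.run(build_command(params, jar),
                            capture_output=True, text=True, check=True)
    return parse_output(result.stdout)
```

For a handful of long-running calls this is perfectly fine; the serialization and startup overhead only dominates when the calls are short and frequent.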
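By contrast, the in-process bridges amortize the JVM cost across all calls. A hedged sketch of the JPype shape (the class name `com.example.ValueFunction`, its `evaluate` method, and the jar path are all invented; the `classpath` keyword is from more recent JPype releases, so check the API of whichever fork you install). The block guards on jpype and the jar actually being present so it degrades gracefully:

```python
import importlib.util
import os

JPYPE_AVAILABLE = importlib.util.find_spec("jpype") is not None

if JPYPE_AVAILABLE and os.path.exists("value-function.jar"):
    import jpype
    # Start the JVM exactly once, with the jar on the classpath.
    jpype.startJVM(classpath=["value-function.jar"])
    # Hypothetical class and method names, for illustration only.
    ValueFunction = jpype.JClass("com.example.ValueFunction")
    vf = ValueFunction()
    for step in range(1000):
        v = vf.evaluate([1, 2, 3])  # repeated calls stay inside one JVM
    jpype.shutdownJVM()
else:
    print("jpype or the jar not available; this only shows the API shape")
```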
abarnert
  • Considering Apache Spark uses Py4J for its purposes, I'd argue it's the best option behind Jython – OneCricketeer Apr 07 '18 at 05:40
  • @cricket_007 As I understand it, Spark only uses Py4J to launch a JVM and set up a socket that's used as a control channel for custom Spark messages, while the actual data is sent over the same kinds of pipes used for distributed Python-Python or Scala-Scala computing. – abarnert Apr 07 '18 at 05:48
  • So does that mean all of jython, jpype, py4j have been deprecated? No one is still under active development? – hzh Apr 07 '18 at 05:51
  • @huangzonghao Nobody with any official standing has officially deprecated any of them, and there are people working on them to some extent, but there's a good chance that any problems you run into will never be fixed. – abarnert Apr 07 '18 at 05:59
  • @abarnert Actually I feel like the most efficient way of doing this is to have a JVM running as a standby server, with all the classes I need to use initialized, and when I need to call the function I just pass the parameters to the JVM and wait for the response. In that sense, is JPype a better choice, since I can control the starting and shutting down of the JVM from Python? – hzh Apr 07 '18 at 06:28
  • @huangzonghao Well, how big are the parameters? Are you talking about passing a 200MB array, or a string and two floats? And are you calling 30-minute-long functions a few times, or quick functions 100 million times? – abarnert Apr 07 '18 at 06:30
  • @abarnert The parameters to pass are like an integer array of fewer than 20 integers, and I should be calling a quick function (definitely returning in a few seconds) a huge number of times (calling the function is part of an iteration in a learning process, so let's say 100 million times). Then what's the wise choice for this purpose? – hzh Apr 07 '18 at 06:34
  • @huangzonghao In that case, you definitely don't want to `subprocess` up a new JVM each time. I'd go look at JPype and Py4J and see if they do what you want or not, and if they're stable or fast enough, rather than try to spend any more time guessing. – abarnert Apr 07 '18 at 06:36
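The standby-server idea from the comments (one long-lived JVM, small parameter arrays back and forth) can be sketched with plain sockets. The following is a Python-only mock: the server thread stands in for the Java side, and the line-delimited JSON protocol and the sum-based stand-in value function are invented for illustration; in practice a small Java `main()` would run the equivalent read/compute/write loop:

```python
import json
import socket
import threading

# Stand-in for the Java side: a real deployment would have a Java
# process holding the JVM open, reading one line of JSON per request
# and writing one line back.
def fake_value_function(params):
    return float(sum(params))

def serve_one_client(listener):
    conn, _ = listener.accept()
    with conn:
        rfile = conn.makefile("r")
        wfile = conn.makefile("w")
        for line in rfile:  # one request per line
            params = json.loads(line)
            wfile.write(json.dumps(fake_value_function(params)) + "\n")
            wfile.flush()

listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]
threading.Thread(target=serve_one_client, args=(listener,),
                 daemon=True).start()

# Client side: one long-lived connection, so each call only ships the
# ~20 integers and reads back one number, with no JVM startup per call.
conn = socket.create_connection(("127.0.0.1", port))
rfile = conn.makefile("r")
wfile = conn.makefile("w")
results = []
for params in ([1, 2, 3], [4, 5]):
    wfile.write(json.dumps(params) + "\n")
    wfile.flush()
    results.append(json.loads(rfile.readline()))
conn.close()
print(results)
```

This is essentially what Py4J sets up for you, minus the per-call reflection and serialization machinery; writing it by hand means maintaining the protocol on both the Python and Java sides.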