
I've been trying to find a way to import Java-ML into my Python project. I have the jar file in the same path as my project.

I want to use it for k-means clustering, since it allows me to change the distance metric. With whichever approach you suggest, will I be able to pass a different Java class as a parameter to the function?

I tried using:

import sys

sys.path.append(r"C:\Users\X\Desktop\X\javaml-0.1.7\javaml-0.1.7.jar")

import net.sf.javaml as jml

test = jml.clustering.Kmeans()

I considered using Jython, but I am unsure how it works, and it is unclear whether I could continue using IDLE or whether I would have to rewrite my project.

Lastly, I considered using PyJNIus, but it is simply not working for me.

benj rei
  • Using PyJNIus is a fine way to do it. I suggest debugging your problem with that. There are also other libraries that let you call java code in a similar way. – inclement Feb 28 '16 at 23:21

2 Answers


In short, you can't run Java code natively in a CPython interpreter.

Firstly, Python is just the name of the specification for the language. If you are using the Python supplied by your operating system (or downloaded from the official Python website), then you are using CPython. CPython does not have the ability to interpret Java code.

However, as you mentioned, there is an implementation of Python called Jython that runs on the JVM and can therefore interact with Java modules. Very few people work with Jython, so you will be somewhat on your own in making everything work properly. You would not need to rewrite your vanilla Python code (since Jython can interpret Python 2.x), but not all libraries (such as numpy) will be supported.

Finally, I think you need to better understand the K-Means algorithm, as the algorithm is implicitly defined in terms of the Euclidean distance. Using any other distance metric would no longer be considered K-Means and may affect the convergence of the algorithm. See here for more information.


Again, you can't run Java code natively in a CPython interpreter. Of course, there are various third-party libraries that will handle marshalling of data between Java and Python. However, I stand by my statement that for this particular use case you are likely better off using a native Python library (something like K-Medoid in Scikit-Learn). Attempting to call through to Java, with all the associated overhead, is overkill for this problem, in my opinion.
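To illustrate the point about swapping the distance metric, here is a minimal pure-Python sketch of k-medoids, not k-means and not Java-ML's implementation. The function names (`k_medoids`, `assign`, `manhattan`), the sample points, and the choice to seed with the first k points are all assumptions made for illustration. The distance is passed in as an ordinary callable, which is the Python counterpart of passing a different Java class as a parameter.

```python
def manhattan(a, b):
    """City-block distance between two equal-length numeric sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign(points, medoids, distance):
    """Return, for each point, the position of its nearest medoid in `medoids`."""
    return [
        min(range(len(medoids)), key=lambda c: distance(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, k, distance, max_iter=100):
    """Cluster `points` around k medoids using any distance callable.

    Hypothetical helper: the first k points are the initial medoids,
    which keeps the sketch deterministic but is a simplistic choice.
    """
    medoids = list(range(k))  # indices into `points`
    for _ in range(max_iter):
        labels = assign(points, medoids, distance)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # An empty cluster keeps its previous medoid.
                new_medoids.append(medoids[c])
                continue
            # The new medoid is the member with the smallest total
            # distance to every other member of its cluster.
            best = min(
                members,
                key=lambda i: sum(distance(points[i], points[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(points, medoids, distance), [points[m] for m in medoids]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_medoids(points, 2, manhattan)
```

Because the medoids are always actual data points, only pairwise distances are needed, so any metric can be dropped in without changing the update step. Like k-means, this can settle in a local optimum depending on the initial medoids.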

BeRecursive
  • Thanks for the response, however my code is in Python 3 and Jython would require me to port it all over to Python 2, are you sure that I can't call the JVM to do this function. I know when I was using the Stanford POS tagger in python it always called the jvm from python 3 and it was able to run java classes. – benj rei Feb 28 '16 at 16:07
  • Depending on the content of your code, it may be that moving between Python 2 and Python 3 would actually require no extra code. There is very significant overlap between the two versions. Secondly, when you say "call the JVM", do you mean something like a subprocess call? Because that is obviously possible, though marshalling your data between Python and Java may be difficult. – BeRecursive Feb 28 '16 at 16:20
  • My data, when inputted in kmeans, is all in list form, so marshalling the data theoretically shouldn't be too hard. It seems like they do use a subprocess call, as they have in their imports `from subprocess import PIPE` – benj rei Feb 28 '16 at 18:04
  • You actually can call java code from python, using e.g. PyJNIus as the OP suggests. You can't import it with a python import statement as in the example given, though (although with appropriate import hooks it would probably be possible). – inclement Feb 28 '16 at 23:22
  • @inclement no you can't run Java code in Python as shown by the OP. If you use third party libraries you may be able to interface with Java but again you are really just marshalling data between a Java/Python boundary. In the case of PyJNIus that data is being marshalled via a JNI interface but ultimately it is just a binary method of doing a subprocess call. They do not exist in the same process and you are not really running Java - you are just using wrapped classes that are handling the marshalling. – BeRecursive Feb 29 '16 at 09:50

To "answer" your question directly, Jython will be your best bet if you simply want to import Java classes. Jython strives very hard to be as compatible with Python 2.x as possible and does a good job. So you won't have to spend too much time rewriting code. Just simply run it with Jython and see what happens, then modify what breaks.

Now for the Python answer :D. You may want to use scikit for a native implementation. It will certainly be faster than running anything in Jython.

Update

I think the Py4J module is what you're looking for. It works by running a server in your Java code, and the Python code communicates with that Java server. The only good thing about Py4J is that it provides the boilerplate code for you. You can very easily set up your own client/server with no extra modules. However, I still don't think it's a superior option compared to Python's native modules.

References

How to import Java class w/ Jython

Scikit - K-Means

notorious.no
  • Thanks for the response, however my code is in Python 3 and Jython would require me to port it all over to Python 2, are you sure that I can't call the JVM to do this function. I know when I was using the Stanford POS tagger in python it always called the jvm from python 3 and it was able to run java classes. – benj rei Feb 28 '16 at 16:07
  • Common problem with Jython, overloaded methods, and how to deal with them - http://stackoverflow.com/questions/21329491/calling-right-overload-of-the-java-method-from-jython – David Feb 28 '16 at 16:17
  • @benjrei It may have had some sort of socket/ip bridge involved, Stanford POS tagger's website mentioned nltk, perhaps you could use that instead? – David Feb 28 '16 at 16:35
  • @benjrei What do you mean "``call the JVM``"? – notorious.no Feb 28 '16 at 16:40
  • @notorious They use the subprocess module with pipe. In the source file it has the import `from subprocess import PIPE`. – benj rei Feb 28 '16 at 18:08
  • @David nltk is an nlp library and has nothing to do with connecting python to java. – benj rei Feb 28 '16 at 18:09
  • Did a search for ``using java python`` and found this post http://stackoverflow.com/questions/476968/using-a-java-library-from-python it might help you out. Apparently there is something called Py4J that loads Java into Python – notorious.no Feb 29 '16 at 01:12
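
The subprocess approach raised in the comments (the `from subprocess import PIPE` import) can be sketched roughly as follows for list-shaped data. This is an illustration under assumptions, not how the Stanford tagger wrapper actually works: `ClusterMain` is a hypothetical Java wrapper class you would write yourself around Java-ML, the line-based text format is invented for the example, and a small Python child process stands in for the JVM so the sketch runs without Java installed.

```python
import os
import subprocess
import sys


def rows_to_text(rows):
    """Serialize numeric rows as one comma-separated line per row."""
    return "\n".join(",".join(str(v) for v in row) for row in rows) + "\n"


def text_to_labels(text):
    """Parse one integer cluster label per line of child output."""
    return [int(token) for token in text.split()]


def run_child(cmd, rows):
    """Send rows to the child on stdin and read labels from its stdout."""
    result = subprocess.run(
        cmd,
        input=rows_to_text(rows),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child exits non-zero
    )
    return text_to_labels(result.stdout)


# Hypothetical JVM invocation: ClusterMain is an assumed wrapper class
# that reads rows from stdin and prints one label per row.
JAVA_CMD = [
    "java",
    "-cp",
    os.pathsep.join(["javaml-0.1.7.jar", "."]),
    "ClusterMain",
]

# Stand-in child so the sketch runs without Java: it labels each row
# 0 or 1 depending on whether the first value is below 5.
STAND_IN_CMD = [
    sys.executable,
    "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(0 if float(line.split(',')[0]) < 5 else 1)\n",
]

labels = run_child(STAND_IN_CMD, [[0, 0], [10, 10]])
```

Swapping `STAND_IN_CMD` for `JAVA_CMD` would launch a separate JVM per call, so the startup cost and the text round trip are the overhead the answers warn about.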