I am trying to optimize the performance of some natural language processing in a Python project I am currently working on. Basically, I would like to offload the computationally intensive parts to Apache OpenNLP, which is written in Java.
My question is: what would be the recommended way to call Java functions/classes from my Python code? The three main approaches I have considered are:

1. Using C/C++ bindings in Python and then embedding a JVM in my C program. This is what I am leaning towards because I am somewhat familiar with writing C extensions for Python, but using a triangle of languages where C only functions as an intermediary doesn't seem right somehow.
2. Using Jython. My main concern with this is that CPython is overwhelmingly the most popular Python implementation as far as I know, and I don't want to break compatibility with other collaborators or packages. (A rough sketch of what I mean is below the list.)
3. Streaming input and output to the binaries that come with OpenNLP. Apache provides tokenizers and the like as stand-alone command-line tools that you can pipe data to and from. This would probably be the easiest option to implement, but it also seems like the crudest. (Also sketched below.)
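For reference, here is roughly what I have in mind for option 2. This is just a sketch, assuming the OpenNLP jar is on Jython's classpath and that I have downloaded one of the pre-trained tokenizer models (the model filename is a placeholder):

```python
# Run with Jython, with the OpenNLP jar on the classpath, e.g.:
#   jython -J-cp opennlp-tools.jar tokenize.py
from java.io import FileInputStream
from opennlp.tools.tokenize import TokenizerModel, TokenizerME

# Load a pre-trained tokenizer model (filename is a placeholder)
model_stream = FileInputStream("en-token.bin")
try:
    model = TokenizerModel(model_stream)
finally:
    model_stream.close()

tokenizer = TokenizerME(model)
tokens = tokenizer.tokenize("Here is some text to tokenize.")
print(list(tokens))
```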
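And this is the kind of thing I mean by option 3, piping text through one of the command-line tools from CPython. Again only a sketch; it assumes the `opennlp` launcher script from the binary distribution is on my PATH, and the model filename is a placeholder:

```python
import subprocess

def tokenize(text):
    """Pipe text through the OpenNLP command-line tokenizer."""
    # 'opennlp TokenizerME <model>' reads text from stdin and writes
    # whitespace-separated tokens to stdout.
    proc = subprocess.Popen(
        ["opennlp", "TokenizerME", "en-token.bin"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    out, _ = proc.communicate(text)
    return out.split()

print(tokenize("Here is some text to tokenize."))
```

Obviously spawning a process per call would be wasteful, so in practice I would keep the pipe open and stream data through it, which is part of why this feels crude.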
I'm wondering if anyone who has experience interfacing Python and Java knows how much the performance is likely to differ between these options, and which one is "recommended" or considered best practice in such a situation, or of course whether there is an entirely different way to do it that I haven't thought of.
I did search SO for existing answers and found this, but it's an answer from 3.5 years ago and mentions projects that are either dead, hard to integrate/configure/install, or still under development.
Some comments mentioned that the overhead for all three methods is likely to be insignificant compared to the time required to run the actual NLP code. This is probably true, but I'm still interested in what the answer is from a more general perspective.
Thanks!