
I have an ML model which is trained and saved as a pickle file, Randomforestclassifier.pkl. I want to load this once using Java and then execute my "prediction" code, which is written in Python. So my workflow is:

  1. Read Randomforestclassifier.pkl file (one time)
  2. Send this model as input to a function defined in "python_file.py", which is executed from Java for each request
  3. python_file.py has the prediction code, and the returned predictions should be captured by the Java code

Please provide suggestions for this workflow requirement. I have used ProcessBuilder in Java to execute python_file.py, and everything works fine except for loading the model as a one-time activity.
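
Roughly, the per-request call I have now looks like the sketch below (the class name and the way the model path and test data are passed as string arguments are simplified for illustration); note that python_file.py currently reloads the model on every call, which is the part I want to avoid:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class PredictionRunner {

    // Called once per request; python_file.py currently reloads the model every time
    public static String predict(String testData) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(
                "python", "python_file.py", "Randomforestclassifier.pkl", testData);
        pb.redirectErrorStream(true);
        Process process = pb.start();

        // Capture whatever python_file.py prints as the prediction result
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line);
            }
        }
        process.waitFor();
        return output.toString();
    }
}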

asked by My3; edited by James Z
  • Use a Python service, something like Flask, which can load the model a single time, independently from Java, and accept inputs as requests. Or you can also look at sklearn-pmml to convert the pickle file into PMML files and load them directly as Java objects. – Vivek Kumar Oct 01 '18 at 13:29
  • Can't you do the model loading using python? I understand that you explicitly asked for solutions using this workflow but I think this is the best way to solve the problem. If Randomforestclassifier.pkl is a remote file or something, download it using java, save it locally and provide the path of the file as an argument for python_file.py. – Fabio Picchi Oct 01 '18 at 17:07
  • My python_file.py should run once for each request, but model loading takes time and I don't want that to happen for each request, so I want to load model.pkl from Java and send the loaded model as an argument to python_file.py. I have tried using Flask, but my requirement is based on queues, so a RESTful API is not suitable for my existing architecture. Is there any way, like Python client/server-type programs, where models can be loaded only once and predictions can be done using those models for each request? Thanks for your time... – My3 Oct 02 '18 at 14:41
  • I forgot to mention one more requirement for my problem. I need to send "test data" from Java to Python, but ProcessBuilder doesn't accept anything but strings. Is there any way of sending the "test data" in some format that can be read by the Python script and converted to a data frame? Thanks in advance. – My3 Oct 08 '18 at 08:23
  • Can we use python client server type program for this requirement? Load model one time in server program and predict with client program? I don't want to use any APIs as we want to use some existing architecture and use java to call some python script to get this done. – My3 Oct 12 '18 at 04:27
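
A rough sketch of the "Python client/server" idea from the last two comments: a long-running Python process unpickles the model once at startup and answers prediction requests over a plain socket, so Java can connect per request instead of spawning a new interpreter each time. The port, the wire format (CSV in, a comma-separated line of predictions out), and the single recv() call are illustrative simplifications, not a tested implementation.

# model_server.py -- illustrative sketch, not production code
import pickle
import socket
from io import StringIO

import pandas as pd

# Load the model once, when the server process starts
with open('Randomforestclassifier.pkl', 'rb') as f:
    clf = pickle.load(f)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('localhost', 9999))
server.listen(1)

while True:
    conn, _ = server.accept()
    with conn:
        # Each request sends CSV-formatted test data (assumed to fit in one recv)
        data = conn.recv(1 << 20).decode('utf-8')
        features = pd.read_csv(StringIO(data))
        predictions = clf.predict(features)
        # Send the predictions back as a single comma-separated line
        conn.sendall(','.join(str(p) for p in predictions).encode('utf-8'))

On the Java side, each request would open a Socket to localhost:9999, write the CSV text, and read the reply line.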

2 Answers


You could use Jep.

I actually never tested the pickle module in Jep, but for your case it would be something like this:

try(Jep jep = new Jep()) {
    // Load model
    jep.eval("import pickle");
    jep.eval("with open('Randomforestclassifier.pkl', 'rb'): as f: clf = pickle.load(f)");
    Object randomForest = jep.getValue("clf");

    ...

    // Then in another context you can pass your model to your function
    jep.eval("import predictionModule");
    jep.set("arg", randomForest);
    jep.eval("result = predictionModule.use(arg)");
    Object result = jep.getValue("result");
}

Assuming you have a module named predictionModule.py which should be something like this:

import pickle

def use(model_as_bytes):
    # Rebuild the model from the pickled bytes (or string) passed in from Java
    model = pickle.loads(model_as_bytes)
    print(model)
    # do other stuff, e.g. build the features and run prediction = model.predict(features)
    ...
    return prediction

Hope this helps.

answered by btt
  • Thanks @btt, I installed jep with great difficulty and am able to load pickle file ... first 3 lines of code working fine... can you please let me know how exactly to execute the prediction.py by sending randomForest as argument... Do we have to give any path for predictionModule.py for "import predictionModule" to work... – My3 Oct 11 '18 at 05:13
  • jep.eval("import predictionModule");------ This line is throwing error as predictionModule not found. How to resolve this error. Kindly help. Thanks. – My3 Oct 11 '18 at 06:48
  • This doesn't work ....randomForest is returned as string to python but not as ML model... randomForest.predict() is not working.... – My3 Oct 11 '18 at 06:57
  • This is normal. Actually randomForest should be passed as `bytes`, but string should be fine too. What you should do, is once you have `arg` in the python side, you do this: `arg = pickle.loads(arg)` ref: https://docs.python.org/3/library/pickle.html#pickle.loads The reason for that is that `jep.set` supports only standard types: http://ninia.github.io/jep/javadoc/3.7/ So everything else is passed as bytes, or as a string like in your case. – btt Oct 12 '18 at 09:08
  • I added an example of what `predictionModule.py` should look like. Note that all this is pseudocode that I didn't test. So please don't hesitate to edit my answer. – btt Oct 12 '18 at 09:17
  • This works fine but doesn't solve my requirement of one-time loading of models. Jep doesn't support multithreading, so even if I could return arg, I am not able to use it for parallel requests. I searched online and it says that Jep doesn't support multithreaded programming. – My3 Oct 13 '18 at 12:37
  • Your current architecture is: one Jep instance linked to one Python execution context. Of course, if you multithread inside Jep then it won't work, because of Python's GIL. What you can do instead is have one Jep that loads the model and multiple Jeps that run in parallel but are single-threaded on the inside. This way each single-threaded Jep has its own Python execution context that works independently from the other Pythons. In other words, you should keep your multithreading in Java while making sure that each thread has a unique Python context (aka Jep); see the sketch after these comments. – btt Oct 15 '18 at 15:44
  • I did not get what you said... how can I achieve what you said... Also "model = pickle.loads(model_as_bytes)" is not working "model_as_bytes" is returned as string only when I tried executing your code... – My3 Oct 16 '18 at 04:54
  • I tried out using multiple Jeps but this throws some memory error, I tried assigning some 10GB but still fails to work with even 2 Jeps. My pickled model is nearly 1gb file. Is there any way to resolve this memory issue? – My3 Oct 22 '18 at 08:54
  • Thanks for introducing me to Jep. My issue is resolved. Thanks again. https://stackoverflow.com/questions/52830122/java-and-python-integration-using-jep Hope this helps someone with a similar requirement. – My3 Oct 30 '18 at 06:03
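
A rough Java sketch of the layout btt describes in the comment above: one single-threaded Jep per worker thread, each with its own Python sub-interpreter. The thread count, the CSV input path, and unpickling the model inside every interpreter are illustrative assumptions; note that each thread then holds its own copy of the model, which matches the memory growth reported above with a roughly 1 GB pickle.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import jep.Jep;
import jep.JepException;

public class JepWorkers {

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        Runnable worker = () -> {
            // Each thread owns one Jep, i.e. one independent Python context
            try (Jep jep = new Jep()) {
                // Unpickle the model once per thread, then reuse it for every
                // request handled on this thread
                jep.eval("import pickle");
                jep.eval("import pandas as pd");
                jep.eval("with open('Randomforestclassifier.pkl', 'rb') as f: clf = pickle.load(f)");

                // ... then, per request arriving on this thread (input path is illustrative):
                jep.set("csv_path", "test_data.csv");
                jep.eval("result = clf.predict(pd.read_csv(csv_path)).tolist()");
                Object result = jep.getValue("result");
                System.out.println(result);
            } catch (JepException e) {
                e.printStackTrace();
            }
        };

        for (int i = 0; i < 2; i++) {
            pool.submit(worker);
        }
        pool.shutdown();
    }
}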

Simple Solution without Additional Libraries

I successfully implemented the solution described by Chandan in another posting. It is basically just calling the Python files via the command line from your Java application and getting the results back through a BufferedReader.

https://stackoverflow.com/a/65211138/12576070

Adjustment I made for my ML application

My application involves passing a large amount of data into a trained machine learning model in Python. The feature data was too big to send over the command line as an argument (like a CSV-formatted string), so I instead saved the data as a CSV file and sent the file path as the argument to my Python prediction function.

I have not compared the speed to the Jep solution (or the jpy solution). Potentially they would be faster, but this solution does not need any additional libraries to be installed, and it is fairly simple and straightforward. I have recently wrestled enough with trying, unsuccessfully, to get Java ML libraries to work with my existing application that I like this simple approach. You will, of course, need to make your own trade-off between simplicity and performance. If my application grows enough with its ML implementation to justify looking at a more complex solution, I may revisit my simplicity/performance trade-off.
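
For what it's worth, the Python side of this can be as small as the sketch below (the script name, model path, and output format are just examples): the CSV path comes in as a command-line argument, and the predictions are printed to stdout so the Java BufferedReader can capture them.

# predict_cli.py -- illustrative example of the command-line approach
import sys
import pickle

import pandas as pd

def main():
    # The path to the features CSV is passed as the only command-line argument
    csv_path = sys.argv[1]

    # Note: with this simple approach the model is still reloaded on every call
    with open('Randomforestclassifier.pkl', 'rb') as f:
        clf = pickle.load(f)

    features = pd.read_csv(csv_path)
    predictions = clf.predict(features)

    # One comma-separated line on stdout for the Java BufferedReader to read
    print(','.join(str(p) for p in predictions))

if __name__ == '__main__':
    main()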

answered by MustardMan