
I am trying to package an SVM classifier model that I wrote in Python as a PMML file, so that I can use it in a Flink project.

Reference: https://github.com/aedenj/flink-machine-learning-fish-market-example/blob/main/model/model.ipynb

The model works fine and returns the expected results (I am not sure why the output is repeated, but that is not the issue here).


When I try to package it as a PMML file, I get a 'Requested array size exceeds VM limit' error.


Can anyone tell me what is happening here?

P.S. I wonder if it has something to do with the Active Fields not being set. The training data is a one-hot encoded vector representation.

Vishnu Prasad

1 Answer


The sklearn2pmml.sklearn2pmml utility function invokes the Java executable via Python's subprocess.Popen. If the JVM's default memory configuration is too small for the conversion, you can increase the heap size by specifying the -Xms (initial heap size) and/or -Xmx (maximum heap size) Java executable options.

Two ways to do this:

  • Export the desired configuration via the JAVA_OPTS environment variable.
  • Starting from SkLearn2PMML 0.86.2, the sklearn2pmml.sklearn2pmml utility function supports java_home and java_opts parameters.

Sample usage:

from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
pipeline = PMMLPipeline(..)
sklearn2pmml(pipeline, "pipeline.pmml", java_opts = ["-Xms2G", "-Xmx8G"])
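To make the mechanism more concrete, here is a minimal sketch of how heap options end up on a Java command line that is launched from Python. The helper name build_java_command, the jar name, and the main class are all hypothetical and are not taken from SkLearn2PMML's source; only the -Xms/-Xmx flags and the rule that JVM options go before the main class are standard.

```python
import os


def build_java_command(java_opts, classpath, main_class, args, java_home=None):
    """Assemble an argv list suitable for subprocess.Popen (hypothetical helper)."""
    # Use the plain "java" from PATH unless an explicit JDK location is given.
    java = "java" if java_home is None else os.path.join(java_home, "bin", "java")
    # JVM options such as -Xms/-Xmx must precede the classpath and main class.
    return [java, *java_opts, "-cp", classpath, main_class, *args]


# Hypothetical jar, class, and file names, for illustration only.
cmd = build_java_command(
    java_opts=["-Xms2G", "-Xmx8G"],
    classpath="converter.jar",
    main_class="com.example.Main",
    args=["--in", "pipeline.pkl"],
)
print(cmd)
```

A list like this could then be passed to subprocess.Popen(cmd); the sketch stops short of launching anything so that it runs without a Java installation.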

That said, in the current case, please take this OutOfMemoryError as a warning sign that something is wrong with your fitted model object. It does not make sense to perform one-hot encoding (OHE) outside of the model and then try to feed 0.5M input feature values to a (PMML) model.
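To illustrate why external OHE inflates the model's input, here is a small sketch that counts how many columns one-hot encoding would produce for a toy dataset. The function name one_hot_width and the sample rows are made up for illustration; the point is only that the encoded width is the sum of the distinct values per column, which grows quickly for high-cardinality features.

```python
def one_hot_width(rows):
    """Count the columns OHE would produce: one per distinct (column, value) pair."""
    distinct = {}
    for row in rows:
        for column, value in row.items():
            distinct.setdefault(column, set()).add(value)
    return sum(len(values) for values in distinct.values())


# Made-up sample data: two raw categorical columns.
rows = [
    {"species": "Bream", "region": "north"},
    {"species": "Pike", "region": "south"},
    {"species": "Perch", "region": "north"},
]

raw_columns = len(rows[0])
encoded_columns = one_hot_width(rows)
print(raw_columns, encoded_columns)  # prints "2 5"
```

If the encoding step is part of the fitted pipeline instead, the model would presumably only need to declare the raw columns as its inputs, rather than one input per encoded value.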

user1808924
  • Thanks for the assist. However, using PMML for my task turned out to be very confusing and I ended up calling the model via a REST API. Worked like a charm. – Vishnu Prasad Jan 25 '23 at 16:41