
I created an XML file using the `pmml()` function from the `pmml` package in R.

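library(ada)
library(pmml)

# Train the AdaBoost model (train_iOS, ntrees and defctrl are defined earlier, not shown)
adamodel_iOS = ada(label ~ ., data = train_iOS, iter = ntrees, verbose = TRUE, loss = "ada", bag.frac = 0.7, nu = 0.1, control = defctrl, type = "real")
# Predicted class probabilities for the training data
Ptrain_iOS = predict(adamodel_iOS, newdata = train_iOS, type = "prob")

# Export the model to PMML and write it to disk
adapmml_iOS = pmml(adamodel_iOS)
saveXML(adapmml_iOS, "model_iOS.xml")

save.image()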

After training the model in the first line, I computed the corresponding probabilities for the training data.

Now I want to use this XML file to generate predictions on a set of data (basically the training set again). How do I do that in R? I see that in Java and Spark we can load the XML file generated by the `pmml()` function, and there are functions that can then make predictions.

Basically, I am looking for a function in R that takes this XML file as input and returns an object which, in turn, takes data points as input and returns their probabilities of having label 0 and 1.

I found a related question, "Can PMML models be read in R?", but it does not help.

pasternak
  • If you train models in R, and you consume models in R, then why do you need PMML at all? What's wrong with `predict(adamodel_iOS)`? – user1808924 Mar 19 '16 at 11:14
  • I was expecting this question. :) My larger goal is to move this model to a Spark cluster to make predictions on millions of data points, which is why I am exporting the model to PMML. But the scores generated by the adamodel do not match those generated by the Spark code. To debug this, I want to make sure that nothing changes when the model is exported to PMML. That's why I want to load the PMML file into R and check whether it predicts the same values for the training data as the adamodel does. – pasternak Mar 19 '16 at 11:45
  • The `pmml.ada()` function of R's `pmml` package is exporting broken models. Don't waste your time debugging it. – user1808924 Mar 19 '16 at 12:28
  • Oh, that is bad news! Can you please point me to a source with more information about the problems with `pmml(adamodel_object)`? My adamodel predicts labels 0 or 1, but the XML file generated by the `pmml()` call contains -1 and 1 as labels. – pasternak Mar 19 '16 at 13:12
  • These models are broken in the sense that some tree split conditions are encoded incorrectly. Try a toy problem (e.g. using the `iris` dataset, predict whether an iris instance is a versicolor or not) and see for yourself. – user1808924 Mar 19 '16 at 16:20
  • Now that the PMML option is gone, I will do the encoding myself. Is there any way to get a string representation of the object returned by `ada()`? If I can get the object's representation as a string, then I can code it in Java and move the logic to the cluster to make predictions on a large data set. – pasternak Mar 21 '16 at 07:03
  • Hi user1808924, I see that you suggested `r2pmml` here (http://stackoverflow.com/questions/26310836/how-can-i-export-a-gbm-model-in-r). Would that encode the AdaBoost model correctly? – pasternak Mar 21 '16 at 08:45
  • The `r2pmml` package doesn't support the `ada::ada` model type. But you could use the underlying JPMML-R library (https://github.com/jpmml/jpmml-r) to try to implement it yourself. – user1808924 Mar 21 '16 at 10:18
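
The check described in the comments (do two scorers give the same probabilities for the same rows?) can be sketched in base R. This is only an illustration: `compare_scores` and its `tol` argument are hypothetical names, not part of `ada` or `pmml`, and both inputs are assumed to be numeric vectors of probabilities for the same class, in the same row order.

```r
# Hypothetical helper: compare two probability vectors element by element.
# p_ref: reference probabilities (e.g. from predict() on the ada model)
# p_new: probabilities from the other scorer (e.g. a PMML engine)
# tol:   largest absolute difference still treated as a match (assumed value)
compare_scores <- function(p_ref, p_new, tol = 1e-6) {
  stopifnot(is.numeric(p_ref), is.numeric(p_new),
            length(p_ref) == length(p_new))
  diffs <- abs(p_ref - p_new)
  mismatched <- which(diffs > tol)
  list(
    max_abs_diff     = max(diffs),          # worst-case disagreement
    n_mismatch       = length(mismatched),  # rows that disagree beyond tol
    first_mismatches = head(mismatched, 5)  # row indices to inspect first
  )
}
```

With the objects from the question, a call might look like `compare_scores(Ptrain_iOS[, 2], spark_prob_1)`, where `spark_prob_1` is a placeholder for the scores exported from the other system and the second column of `Ptrain_iOS` is assumed to hold the probability of label 1.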

2 Answers


Check this link for the list of PMML producers and consumers. As you can see, R is listed as a producer, not a consumer. The list also shows the algorithms for which R can produce PMML files.

The most comprehensive tool for validating and converting PMML, and for scoring data with PMML models, is ADAPA, which is not free.

KNIME is an open-source drag-and-drop analytics tool that supports both import and export of PMML files (not for all models, and the features are limited). It also supports R, Python, and Java.

Eissa N.

Although this was asked a long time ago, I still want to share that you can use the `reticulate` package to call the Python `pypmml` package from R. To make it friendlier and make prediction look more like R's `predict` function, I have wrapped it in a package, which is linked here.
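
Without the wrapper package, the same idea can be sketched directly with `reticulate`. This is a rough sketch rather than a tested recipe. It assumes `pypmml` and `pandas` are installed in the Python environment that `reticulate` uses, and that Java is available, since `pypmml` relies on a JVM-based scorer. `score_pmml` is a hypothetical helper name; `Model$fromFile` and `predict` are the `pypmml` calls as I understand them.

```r
# Hypothetical helper: score an R data frame with a PMML file via Python's pypmml.
# Assumes reticulate is installed and can find a Python with pypmml and pandas.
score_pmml <- function(pmml_file, newdata) {
  # Import the Python module; errors here mean pypmml is not installed
  pypmml <- reticulate::import("pypmml")
  # Load the PMML document from disk
  model <- pypmml$Model$fromFile(pmml_file)
  # reticulate converts the data frame to pandas on the way in
  # and converts the result back to an R object on the way out
  model$predict(newdata)
}

# Usage with the objects from the question (not run here):
# features  <- train_iOS[, setdiff(names(train_iOS), "label"), drop = FALSE]
# pmml_pred <- score_pmml("model_iOS.xml", features)
# head(pmml_pred)
```

The names of the returned columns depend on the output fields declared in the PMML file, so inspect the result before comparing it against `Ptrain_iOS`.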