3

I would like to export a Caret random forest model using the pmml library so I can use it for predictions in Java. Here is a reproduction of the error I am getting.

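data(iris)
require(caret)
require(pmml)

# Illustrative values; these constants were not defined in the original post
NUMBER_OF_CV <- 5
REPEATS <- 3
NUMBER_OF_TREES <- 100

rfGrid2 <- expand.grid(.mtry = c(1,2))
fitControl2 <- trainControl(
  method = "repeatedcv",
  number = NUMBER_OF_CV,
  repeats = REPEATS)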

model.Test <- train(Species ~ .,
  data = iris,
  method ="rf",
  trControl = fitControl2,
  ntree = NUMBER_OF_TREES,
  importance = TRUE,  
  tuneGrid = rfGrid2)

print(model.Test)
pmml(model.Test)

Error in UseMethod("pmml") : 
  no applicable method for 'pmml' applied to an object of class "c('train', 'train.formula')"

I googled for a while and found little information about exporting to PMML in general. The pmml library does list a randomForest method:

methods(pmml)
 [1] pmml.ada          pmml.coxph        pmml.cv.glmnet    pmml.glm          pmml.hclust       pmml.itemsets     pmml.kmeans      
 [8] pmml.ksvm         pmml.lm           pmml.multinom     pmml.naiveBayes   pmml.nnet         pmml.randomForest pmml.rfsrc       
[15] pmml.rpart        pmml.rules        pmml.svm 

It works with a model fitted directly by randomForest, but not with the caret-trained one.

library(randomForest)
iris.rf <- randomForest(Species ~ ., data=iris, ntree=20)
# Convert to pmml
pmml(iris.rf)
# this works!!!
str(iris.rf)

List of 19
 $ call           : language randomForest(formula = Species ~ ., data = iris, ntree = 20)
 $ type           : chr "classification"
 $ predicted      : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
...

str(model.Test)
List of 22
 $ method      : chr "rf"
 $ modelInfo   :List of 14
  ..$ label     : chr "Random Forest"
  ..$ library   : chr "randomForest"
  ..$ loop      : NULL
  ..$ type      : chr [1:2] "Classification" "Regression"
...
Richie Cotton
Dr VComas

2 Answers

5

You cannot invoke the pmml method on objects of class train or train.formula (i.e., the class of your model.Test object), because the pmml package defines no method for those classes.

The Caret documentation for the train method says that the best model is available in the finalModel field. You can then invoke the pmml method on that object instead.

rf = model.Test$finalModel
pmml(rf)

Unfortunately, it turns out that Caret fits the RF model using the "matrix interface" (i.e., by setting the x and y fields), not the more common "formula interface" (i.e., by setting the formula field). AFAIK, the "pmml" package does not support the export of such RF models.
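
To make the difference between the two interfaces concrete, here is a minimal sketch that fits the same forest both ways with the randomForest package directly. The variable names (rf_formula, rf_matrix) are illustrative. It relies on the assumption that only the formula interface stores a terms component on the fitted object, which is presumably what an exporter needs in order to recover the model formula.

```r
library(randomForest)

data(iris)
set.seed(1)

# Formula interface: the fitted object keeps a `terms` component
rf_formula <- randomForest(Species ~ ., data = iris, ntree = 20)

# Matrix interface: predictors and response are passed separately,
# which is how Caret calls randomForest internally
rf_matrix <- randomForest(x = iris[, 1:4], y = iris$Species, ntree = 20)

# Check whether each fit carries the formula metadata
has_terms_formula <- !is.null(rf_formula$terms)
has_terms_matrix <- !is.null(rf_matrix$terms)
print(c(formula = has_terms_formula, matrix = has_terms_matrix))
```

Both objects have class "randomForest", so S3 dispatch reaches pmml.randomForest in either case; the difference only shows up in the contents of the object.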

So it looks like your best option is a two-step approach. First, use the Caret package to find the most appropriate RF parametrization for your dataset. Second, train the final RF model manually using the "formula interface" with this parametrization.
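
A minimal sketch of that two-step approach might look like the following. The cross-validation settings, the ntree value, and the output file name are illustrative assumptions, not values from the question. Writing the result with XML::saveXML assumes that pmml() returns an XML node object, which is how the package has worked in the versions I am aware of.

```r
library(caret)
library(randomForest)
library(pmml)

data(iris)
set.seed(42)

# Step 1: let Caret choose mtry by cross-validation
tuned <- train(Species ~ .,
  data = iris,
  method = "rf",
  trControl = trainControl(method = "cv", number = 5),
  tuneGrid = expand.grid(.mtry = c(1, 2)),
  ntree = 50)
best_mtry <- tuned$bestTune$mtry

# Step 2: refit with the formula interface using the selected mtry
final_rf <- randomForest(Species ~ ., data = iris,
  mtry = best_mtry, ntree = 50)

# Export the formula-interface model to PMML
final_pmml <- pmml(final_rf)
out_file <- file.path(tempdir(), "iris-rf.pmml")  # hypothetical output path
XML::saveXML(final_pmml, file = out_file)
```

Note that the refitted forest is not identical to tuned$finalModel, because the trees are regrown from a new random state; only the tuning parameter is carried over.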

user1808924
  • Thanks for the response. That code also returns an error: `rf = model.Test$finalModel; pmml(rf)`. What do you mean by the formula interface? Just a randomForest model from the randomForest package? – Dr VComas Dec 11 '14 at 21:19
  • Formula interface: `rf = randomForest(Species ~ ., data = iris)`. Matrix interface: `rf = randomForest(y = iris[, c("Species")], x = iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")], data = iris)` – user1808924 Dec 11 '14 at 21:48
  • That is, the `pmml` method accepts only RF models that have been trained using the Formula interface. It raises an error for RF models that have been trained using the Matrix interface. Unfortunately, the Caret package uses the Matrix interface. – user1808924 Dec 11 '14 at 21:52
2

You can use the r2pmml package to do the job:

library("caret")
library("r2pmml")

data(iris)

train.rf = train(Species ~ ., data = iris, method = "rf")
print(train.rf)
r2pmml(train.rf, "/tmp/train-rf.pmml")
user1808924
  • The help page at GitHub says you need this before loading r2pmml: `options("java.parameters" = c("-Xms4G", "-Xmx8G"))`. That did not cure the java error I got on a Mac, however: `java.lang.UnsupportedClassVersionError: org/jpmml/rexp/Main : Unsupported major.minor version 51.0` – IRTFM Dec 29 '15 at 16:53
  • These Java options allocate more memory to the JVM process, which will speed up the conversion of large RF models. However, your problem - `java.lang.UnsupportedClassVersionError` - indicates that you are using an outdated Java version. Please see related SO thread https://stackoverflow.com/questions/33882019/problems-with-r2pmml – user1808924 Dec 29 '15 at 18:09
  • Except a shell execution of `java -version` reports: "java version "1.7.0_79" Java(TM) SE Runtime Environment (build 1.7.0_79-b15)" – IRTFM Dec 29 '15 at 22:03
  • System and R's "rJava" package could be using different Java versions. I don't know about Mac, but on GNU/Linux it is possible to install `R-core` and `R-java` packages separately. If the `R-java` package is installed, then it takes precedence over the System's Java. Did you run the R code snippet that is given in http://stackoverflow.com/a/33882664/1808924? I believe that its printout points to a different Java installation than System's Java (ie. "java version 1.7.0_79"). – user1808924 Dec 29 '15 at 23:18
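
For readers hitting the same UnsupportedClassVersionError, one way to see which Java runtime R itself is using is to ask the JVM through rJava. This is a sketch that assumes the rJava package is installed and that the conversion runs inside the JVM started by rJava; it is not the snippet from the linked answer.

```r
library(rJava)

# Start (or attach to) the JVM that R uses
.jinit()

# Ask the JVM for its own version and install location
java_version <- .jcall("java/lang/System", "S", "getProperty", "java.version")
java_home <- .jcall("java/lang/System", "S", "getProperty", "java.home")

print(java_version)
print(java_home)
```

If the version printed here differs from what `java -version` reports in a shell, R is bound to a different Java installation than the system default.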