I have a successful randomForest model trained in R, and I want to integrate it into another piece of software. I know I can use libraries such as fastRF in Java or ALGLIB's DecisionForest in other languages, but how can I use the model trained in R? Do I have to retrain it in the new language?

Another option would be to extract the model somehow, but I don't know how to do it.

Any help will be appreciated

Thanks in advance

nanounanue
  • Late reply, but [this email thread](https://stat.ethz.ch/pipermail/r-help/2003-August/037859.html) might be helpful to you. – nalzok Oct 12 '19 at 14:26

2 Answers

Take a look at the pmml package, which generates PMML for various models, randomForest included. A basic example:

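# See ?randomForest for details on the arguments
library(randomForest)
library(pmml)
set.seed(131)
# Fit a regression forest on the built-in airquality data, dropping rows with NAs
ozone.rf <- randomForest(Ozone ~ ., data = airquality, mtry = 3,
                         importance = TRUE, na.action = na.omit)
print(ozone.rf)
# Convert the fitted forest to a PMML document
ozone.rf.pmml <- pmml(ozone.rf)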
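
To use the model elsewhere, the PMML document has to be written to disk. A minimal sketch, assuming the object returned by pmml() is an XML node that saveXML() from the XML package can write (this is the usage shown in the pmml documentation, but it may differ between package versions); the file name "ozone_rf.pmml" is hypothetical:

```r
library(randomForest)
library(pmml)
library(XML)   # provides saveXML()

set.seed(131)
ozone.rf <- randomForest(Ozone ~ ., data = airquality, mtry = 3,
                         importance = TRUE, na.action = na.omit)
ozone.rf.pmml <- pmml(ozone.rf)

# Write the PMML document to a file ("ozone_rf.pmml" is a hypothetical name)
saveXML(ozone.rf.pmml, file = "ozone_rf.pmml")
```

The resulting file is what a PMML consumer in the target language would load.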
Paolo
  • Thanks for your answer @Paolo, but once I have the PMML file, how can I execute it? – nanounanue Mar 20 '12 at 07:29
  • If the answer was useful for you, an upvote would be appreciated! ;-) Regarding your question, you need to check whether you can import the PMML model in the language you'll use for deployment. – Paolo Mar 20 '12 at 08:14
  • You're right! The post was useful, let me vote. Could you recommend a language or software that supports PMML? Thanks again for your help. – nanounanue Mar 20 '12 at 16:31
  • Hi again @Paolo, I tried the PMML solution and everything works fine with your example (the Ozone dataset) and with the iris dataset. However, the documentation of the pmml package on CRAN (http://cran.r-project.org/web/packages/pmml/index.html) doesn't say anything about randomForest support, only about decision trees (which are the building blocks of a random forest). Is it safe to assume that the pmml package handles the forest correctly? Thank you (again) for your time. – nanounanue Mar 23 '12 at 05:12
  • My suggestion is to contact the authors (you can find their contact details with packageDescription("pmml")) and ask for further details. Another possibility is to ask the folks at http://stats.stackexchange.com/ . Good luck! – Paolo Mar 23 '12 at 08:15
  • I just read the article "PMML: An Open Standard for Sharing Models" (The R Journal Vol. 1/1, May 2009) and they support randomForest, just as you said... Thanks for everything! – nanounanue Mar 23 '12 at 10:11
  • You are welcome! If nobody comes up with a more effective solution, you could accept this answer ;-) – Paolo Mar 23 '12 at 10:15
  • I will mark this answer as the correct one. Just let me try the [jpmml](http://code.google.com/p/jpmml/) library to execute the PMML; if that works, this will be the answer :) – nanounanue Mar 24 '12 at 17:23
The randomForest object stores all the information about each tree in its forest component. Each tree is not particularly complicated, though the layout can be confusing.

> iris.rf <- randomForest(Species ~ ., data=iris, importance=TRUE,
+                         proximity=TRUE)
> names(iris.rf$forest)
 [1] "ndbigtree"  "nodestatus" "bestvar"    "treemap"    "nodepred"  
 [6] "xbestsplit" "pid"        "cutoff"     "ncat"       "maxcat"    
[11] "nrnodes"    "ntree"      "nclass"     "xlevels"   
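
A more readable view of a single tree is available through getTree() from the randomForest package, which returns one row per node. The sketch below also stacks all trees into one table and writes it to CSV as one possible starting point for an export; the file name "iris_rf_trees.csv" and the extra tree and node columns are my own additions, not something randomForest produces.

```r
library(randomForest)

set.seed(42)
iris.rf <- randomForest(Species ~ ., data = iris, ntree = 25)

# One row per node; labelVar = TRUE shows variable and class names
# instead of integer indices
tree1 <- getTree(iris.rf, k = 1, labelVar = TRUE)
head(tree1)

# Stack every tree into one data frame, tagged with tree and node numbers
all.trees <- do.call(rbind, lapply(seq_len(iris.rf$ntree), function(k) {
  tr <- getTree(iris.rf, k = k, labelVar = TRUE)
  data.frame(tree = k, node = seq_len(nrow(tr)), tr, check.names = FALSE)
}))

# "iris_rf_trees.csv" is a hypothetical file name
write.csv(all.trees, "iris_rf_trees.csv", row.names = FALSE)
```

Each row holds the two child node numbers, the split variable, the split point, a status flag (-1 for a terminal node) and the prediction at terminal nodes.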

To work out how to use the forest outside of R, you'll have to look at the source code. Download the source package of randomForest, extract the tar.gz and look in the src directory. In rf.c you will see the function classForest (for regression, look at regForest in regrf.c). Look at the R function predict.randomForest to see how it's called. You might have to use getAnywhere("predict.randomForest") to see it within R.
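
As a rough illustration of the prediction logic you would have to port, here is a sketch in plain R that walks each tree by hand and takes a majority vote. It assumes numeric predictors only (categorical splits are encoded differently) and a classification forest with the default cutoff; predict_one_tree() is a hypothetical helper, not part of randomForest.

```r
library(randomForest)

set.seed(42)
iris.rf <- randomForest(Species ~ ., data = iris, ntree = 25)

# Hypothetical helper: walk one tree (a matrix from getTree()) for one
# observation x. Assumes numeric predictors: go left when x <= split point.
predict_one_tree <- function(tree, x) {
  node <- 1
  while (tree[node, "status"] != -1) {        # -1 marks a terminal node
    v <- tree[node, "split var"]              # index of the predictor column
    if (x[v] <= tree[node, "split point"]) {
      node <- tree[node, "left daughter"]
    } else {
      node <- tree[node, "right daughter"]
    }
  }
  tree[node, "prediction"]                    # class index at the leaf
}

X <- as.matrix(iris[, 1:4])
trees <- lapply(seq_len(iris.rf$ntree), function(k) getTree(iris.rf, k = k))

# Rows are observations, columns are trees, entries are class indices
votes <- sapply(trees, function(tr) {
  apply(X, 1, function(x) predict_one_tree(tr, x))
})

# Majority vote per observation (the first class wins a tie here)
winner <- apply(votes, 1, function(v) {
  which.max(tabulate(v, nbins = nlevels(iris$Species)))
})
manual <- factor(levels(iris$Species)[winner], levels = levels(iris$Species))

# Share of rows where the manual walk agrees with predict()
mean(manual == predict(iris.rf, iris))
```

The agreement may fall slightly short of 1 because predict() breaks tied votes at random. The same loop, fed with the exported node table, is what a Java or C++ port would need to reproduce.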

It will require a fair bit of mucking around to extract the R information and predict in another package, so you'd have to think carefully before you actually did this. Refitting in the software you intend to use may be more straightforward.

rjad
  • Thank you for your answer @rjad. So, if I understand correctly, your recommendation is to retrain the random forest in the new software, right? – nanounanue Mar 20 '12 at 07:30
  • I think that, if it is straightforward to do so, this will be the easiest approach. It might help if you gave some details on which language you want to work in. But see Paolo's suggestion as well. – rjad Mar 20 '12 at 09:59
  • I am trying to implement it in Java (preferably), but C++ could work too. Thanks again :) – nanounanue Mar 20 '12 at 16:32
  • I made an awful spelling mistake in my last post (I am sorry, English is not my main language). What I wanted to say was: I did it in ALGLIB, so if someone wants the code (a very simple one, really) I could share it. Thanks again. – nanounanue Mar 24 '12 at 17:23