
I am working on the "randomForest" R package and want to change the sampling method used for feature subset selection at the nodes of the trees in the forest. Currently, random forest uses simple random sampling for this. I tried to look at the R code using the commands

library(randomForest)

getAnywhere(randomForest.default)

but could not find the relevant code chunk where the "mtry" features are selected. How can I make this change in the source code?

Khan

1 Answer


I also tried using the S3 and S4 methods described in this SO question, but did not see all the functions in the randomForest package and, more importantly, did not see the randomForest() method listed.

However, if you navigate to the CRAN page for randomForest, you will see a link to the source code for the package:

https://cran.r-project.org/web/packages/randomForest/index.html

You can download a TAR file containing all the source code for the package from the above link. The compiled source code seems to be in the src folder, e.g. rf.c, which looks like it might be the file you want to refactor.
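
As a rough illustration of the kind of change involved, here is a minimal standalone C sketch. It is not taken from the randomForest source. It assumes the candidate features at a node are drawn by index, and it contrasts simple random sampling without replacement with one possible alternative, weighted sampling without replacement. The function names (`unif01`, `sample_uniform`, `sample_weighted`) and the weights array `w` are hypothetical, and `rand()` is only a stand-in for whatever random number generator the package code uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Uniform draw in [0, 1). Hypothetical stand-in for the package's RNG. */
static double unif01(void) {
    return rand() / ((double) RAND_MAX + 1.0);
}

/* Simple random sampling: choose mtry of p feature indices without
 * replacement. Each pick is swapped out of the pool so it cannot recur. */
static void sample_uniform(int p, int mtry, int *out) {
    assert(mtry >= 0 && mtry <= p);
    int *pool = malloc((size_t) p * sizeof *pool);
    assert(pool != NULL);
    int last = p - 1;
    for (int i = 0; i < p; ++i) pool[i] = i;
    for (int i = 0; i < mtry; ++i) {
        int j = (int) (unif01() * (last + 1));  /* index in [0, last] */
        out[i] = pool[j];
        pool[j] = pool[last];                   /* remove pick from pool */
        --last;
    }
    free(pool);
}

/* Weighted sampling without replacement: each draw picks one of the
 * remaining features with probability proportional to its weight.
 * Weights are assumed to be non-negative. */
static void sample_weighted(int p, int mtry, const double *w, int *out) {
    assert(mtry >= 0 && mtry <= p);
    int *pool = malloc((size_t) p * sizeof *pool);
    double *pw = malloc((size_t) p * sizeof *pw);
    assert(pool != NULL && pw != NULL);
    int n = p;
    double total = 0.0;
    for (int i = 0; i < p; ++i) {
        pool[i] = i;
        pw[i] = w[i];
        total += w[i];
    }
    for (int i = 0; i < mtry; ++i) {
        double u = unif01() * total;
        double acc = 0.0;
        int j;
        /* Walk the cumulative weights; fall back to the last entry. */
        for (j = 0; j < n - 1; ++j) {
            acc += pw[j];
            if (u < acc) break;
        }
        out[i] = pool[j];
        total -= pw[j];
        pool[j] = pool[n - 1];                  /* remove pick from pool */
        pw[j] = pw[n - 1];
        --n;
    }
    free(pool);
    free(pw);
}
```

With equal weights, `sample_weighted` behaves like `sample_uniform`, so a change of this kind can be checked against the existing behaviour before trying other weighting schemes.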

Tim Biegeleisen
  • Did you even try using the link I gave above? I downloaded the entire source code for the `randomForest` package in less than 5 minutes. I actually think my answer is the way to go if you want to do a serious refactor of the code, because it allows you to get a TAR containing everything. – Tim Biegeleisen Jul 27 '17 at 13:45
  • Yes, I have the source code, including the .c files, but could not locate the relevant part. – Khan Jul 27 '17 at 14:19
  • Finding the file or files you need to change is really up to you, as we can't know exactly what you have in mind. – Tim Biegeleisen Jul 27 '17 at 14:19
  • I want to refactor the chunk where a subsample of variables is selected at the nodes. – Khan Jul 27 '17 at 14:24
  • To be honest with you, to change the algorithm reliably, you're going to need to understand how it works at a fairly deep level. So you might have to spend a little time reading through the source code. This is not a waste of your time, because you'll probably learn a lot while doing it. – Tim Biegeleisen Jul 27 '17 at 14:25