Random Subspace Method in R

Question

Any idea how to implement the "Random Subspace Method" (an ensemble method) as described by Ho (1998) in R? I can't find a package for it.

Ho, Tin Kam (1998). "The Random Subspace Method for Constructing Decision Forests". IEEE Transactions on Pattern Analysis and Machine Intelligence. 20 (8): 832–844.

desertnaut · Answer 1 · 2019-09-25T18:18:02.607

Practically speaking, this has been "integrated" (kind of) into the Random Forest (RF) algorithm - it is in fact the random selection of features controlled by the mtry argument in the standard R package randomForest; see the Wikipedia entry on RF, as well as the answer (disclaimer: mine) in the SO thread Why is Random Forest with a single tree much better than a Decision Tree classifier? for more details.

While replicating the exact behavior of the said algorithm in the scikit-learn implementation of RF is easy and straightforward (just set bootstrap=False - see linked thread above), I'll confess that I cannot think of a way to get the same behavior from the randomForest R package - i.e. "force" it to not use bootstrap sampling, which would make it equivalent to the Random Subspace method; I have tried the combination of replace=FALSE and sampsize=nrow(x) in the randomForest function, but it doesn't seem to work...

All in all, the message here (and arguably the reason why there is not a specific implementation of the method in R or other frameworks) is that, most probably, you will be better off sticking to Random Forests; if you definitely want to experiment with it, AFAIK the only option seems to be Python and scikit-learn.

The original RSM only selects a feature subset once, at the beginning of the tree construction. In fact, the author says: _"More different trees can be constructed if the subspace changes within the trees, that is, if different feature dimensions are selected at each split"_ [algorithm section](https://en.wikipedia.org/wiki/Random_subspace_method) — NavyEurofighter, Sep 28 '19 at 20:29
I suppose RSM could be implemented in R with a generic _ensemble_ function that I could just feed some models. — NavyEurofighter, Sep 28 '19 at 20:35
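
The "generic ensemble" idea from the comments can be sketched by hand with rpart, which ships with standard R installations. This is only a minimal sketch under a few assumptions: `rsm_fit` and `rsm_predict` are hypothetical helper names (not part of any package), the trees use rpart's default pruning settings rather than fully grown trees, and a plain majority vote is used as a simple combiner, which is not necessarily the combination rule from the paper. The key point is that every tree is trained on all rows, and its feature subset is drawn once per tree rather than at each split.

```r
library(rpart)  # recommended package bundled with standard R installations

# Fit an RSM-style ensemble: each tree sees ALL rows but only a random
# subset of columns, drawn once per tree (no bootstrap sampling of rows).
# NOTE: rsm_fit / rsm_predict are hypothetical names used for illustration.
rsm_fit <- function(x, y, n_trees = 50, n_vars = ceiling(ncol(x) / 2)) {
  stopifnot(is.factor(y), n_vars >= 1, n_vars <= ncol(x))
  lapply(seq_len(n_trees), function(i) {
    vars <- sample(colnames(x), n_vars)            # subspace for this tree
    dat  <- data.frame(x[, vars, drop = FALSE], .y = y)
    list(vars = vars,
         tree = rpart(.y ~ ., data = dat, method = "class"))
  })
}

# Predict by majority vote over the trees (ties go to the first label
# in alphabetical order, because of how table() sorts its entries).
rsm_predict <- function(model, newdata) {
  # One column of predicted labels per tree, one row per observation
  votes <- do.call(cbind, lapply(model, function(m) {
    as.character(predict(m$tree, newdata[, m$vars, drop = FALSE],
                         type = "class"))
  }))
  apply(votes, 1, function(v) names(which.max(table(v))))
}

set.seed(42)
model <- rsm_fit(iris[, -5], iris[, 5], n_trees = 25, n_vars = 2)
pred  <- rsm_predict(model, iris[, -5])
mean(pred == iris$Species)  # training accuracy, for a quick sanity check only
```

Swapping `rpart` for another learner only requires changing the fitting line and the `predict` call, so the same skeleton should work for any base model that accepts a data frame.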

NavyEurofighter · Answer 2 · 2019-10-06T23:53:43.507

I found this function in the caret package:

library(caret)  # ctreeBag additionally requires the party package

model <- bag(x = iris[, -5], y = iris[, 5], vars = 2,
             bagControl = bagControl(fit = ctreeBag$fit,
                                     predict = ctreeBag$pred,
                                     aggregate = ctreeBag$aggregate),
             trControl = trainControl(method = 'none'))

It supports the vars argument, so each learner is trained on a random subset of the variables; at the same time, bootstrap sampling can be avoided by passing method = 'none' as a parameter.
