
Any idea how to implement the "Random Subspace Method" (an ensemble method), as described by Ho (1998), in R? I can't find a package for it.

Ho, Tin Kam (1998). "The Random Subspace Method for Constructing Decision Forests". IEEE Transactions on Pattern Analysis and Machine Intelligence. 20 (8): 832–844.

desertnaut

2 Answers


Practically speaking, this has been (more or less) integrated into the Random Forest (RF) algorithm: it is essentially the random selection of features controlled by the mtry argument in the standard R package randomForest. See the Wikipedia entry on RF, as well as my own answer in the SO thread "Why is Random Forest with a single tree much better than a Decision Tree classifier?", for more details.
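For reference, a minimal sketch of where mtry appears, assuming the randomForest package is installed (iris, ntree = 100 and mtry = 2 are arbitrary illustrative choices). Note that mtry is applied at every split, not once per tree:

```r
library(randomForest)

set.seed(1)
# mtry: number of predictors randomly sampled as split candidates at each node
# (for classification the default is floor(sqrt(ncol(x))), i.e. 2 for iris)
rf <- randomForest(x = iris[, -5], y = iris$Species, ntree = 100, mtry = 2)
print(rf)
```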

Replicating the exact behavior of the algorithm with the scikit-learn implementation of RF is easy (just set bootstrap=False; see the thread linked above). However, I cannot think of a way to get the same behavior from the randomForest R package, i.e. to "force" it not to use bootstrap sampling, which would make it equivalent to the Random Subspace method. I have tried combining replace=FALSE and sampsize=nrow(x) in the randomForest function, but it doesn't seem to work.
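To see what that combination is meant to achieve, here is a base-R sketch that only mimics the two row-sampling schemes with sample(); it does not call randomForest itself. By default randomForest draws nrow(x) rows with replacement for each tree, whereas replace = FALSE with sampsize = nrow(x) should give every tree each row exactly once:

```r
set.seed(42)
n <- nrow(iris)  # 150 rows

# Bootstrap: n draws with replacement (randomForest's default per-tree sample)
boot_idx <- sample(n, n, replace = TRUE)
# replace = FALSE with sampsize = n: a permutation of all rows
full_idx <- sample(n, n, replace = FALSE)

# Fraction of distinct rows each tree would see
length(unique(boot_idx)) / n  # typically close to 1 - 1/e (about 0.632)
length(unique(full_idx)) / n  # always 1
```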

All in all, the message here (and arguably the reason why there is no dedicated implementation of the method in R or other frameworks) is that you will most probably be better off sticking to Random Forests; if you definitely want to experiment with it, AFAIK the only option seems to be Python and scikit-learn.

desertnaut
  • The original RSM selects the feature subset only once, at the beginning of tree construction. In fact, the author says: _"More different trees can be constructed if the subspace changes within the trees, that is, if different feature dimensions are selected at each split"_ [algorithm section](https://en.wikipedia.org/wiki/Random_subspace_method) – NavyEurofighter Sep 28 '19 at 20:29
  • I suppose RSM could be implemented in R with a generic _ensemble_ function that I could just feed with some models. – NavyEurofighter Sep 28 '19 at 20:35
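Along the lines of the generic-ensemble idea in the comments, here is a minimal hand-rolled sketch of Ho's per-tree scheme. It assumes rpart trees as base learners and simple majority voting; rsm_fit() and rsm_predict() are hypothetical helper names, not functions from any package. Each tree is trained on all rows (no bootstrap) but sees only k randomly chosen predictors, drawn once per tree:

```r
library(rpart)  # recommended package shipped with R

# Hypothetical helper: fit a Random Subspace ensemble of classification trees.
# Every tree sees ALL training rows (no bootstrap), but only k randomly chosen
# predictor columns, drawn once per tree rather than at each split.
rsm_fit <- function(x, y, n_trees = 25, k = 2) {
  trees <- lapply(seq_len(n_trees), function(i) {
    vars <- sample(colnames(x), k)           # this tree's random subspace
    d <- x[, vars, drop = FALSE]
    d$.outcome <- y
    list(vars = vars,
         fit = rpart(.outcome ~ ., data = d, method = "class"))
  })
  list(trees = trees, levels = levels(y))
}

# Hypothetical helper: majority vote over the trees' class predictions.
rsm_predict <- function(model, newx) {
  # one column of predicted labels per tree (rows = observations)
  votes <- do.call(cbind, lapply(model$trees, function(m) {
    as.character(predict(m$fit, newdata = newx, type = "class"))
  }))
  # table() sorts class names, so ties go to the first name in sorted order
  winner <- apply(votes, 1, function(v) names(which.max(table(v))))
  factor(winner, levels = model$levels)
}

set.seed(1)
x <- iris[, -5]
y <- iris$Species
model <- rsm_fit(x, y, n_trees = 25, k = 2)
pred <- rsm_predict(model, x)
print(mean(pred == y))  # training accuracy
```

Changing k trades diversity between trees against the accuracy of each individual tree; drawing a new subset at each split instead of once per tree would move this back towards Random Forest's mtry.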

I found this function in the caret package:

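library(caret)   # provides bag(), bagControl(), trainControl() and ctreeBag
library(party)   # ctreeBag relies on the party package (conditional inference trees)

model <- bag(x = iris[, -5], y = iris[, 5],
             vars = 2,   # each bagged tree uses a random subset of 2 predictors
             bagControl = bagControl(fit = ctreeBag$fit,
                                     predict = ctreeBag$pred,
                                     aggregate = ctreeBag$aggregate),
             trControl = trainControl(method = "none"))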

It supports the vars argument, so each learner can consider a random subset of the variables; at the same time, bootstrap sampling can be avoided by passing method = 'none' as a parameter.