I want to build a random forest model in R. I have 4000+ variables. Is there a simple way to include them all without typing each one into the model syntax? Or is there another way to reduce the number of candidate variables without typing in each one? I come from the SAS world, where I could write a macro to hold the variable names.
- Please make this question *reproducible*. This includes sample code (including listing non-base R packages), sample data (e.g., `dput(head(x))`), and expected output. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans Sep 20 '18 at 23:32
- The short answer is that you can use `.` in a formula to use all variables, e.g. `randomForest(outcome ~ ., data = my_data)`. Make the question reproducible if you need further help. – Marius Sep 20 '18 at 23:36
- Thank you! I'm so new to R that I haven't written code yet. I'm considering using Python. If you can tell me, is it the same in Python? – Olivia Sep 21 '18 at 00:18
- Python syntax is not the same, but there should be a way to use all variables straightforwardly. – Marius Sep 21 '18 at 00:41
1 Answer
As Marius indicated, you can use `.` in the formula to include all the explanatory variables in the model. Please see the code below:
library(randomForest)
data(mtcars)
randomForest(mpg ~ ., mtcars, keep.forest = FALSE, ntree = 100)
Output:
Call:
randomForest(formula = mpg ~ ., data = mtcars, keep.forest = FALSE, ntree = 100)
Type of random forest: regression
Number of trees: 100
No. of variables tried at each split: 3
Mean of squared residuals: 6.39198
% Var explained: 81.84
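
If you want something closer to a SAS macro variable holding the names, one option is to keep the predictor names in a character vector and build the model from that. The following is only a sketch on `mtcars`, not something from the original question: the choice to drop `vs` and `am` is purely illustrative, and `predictors`, `f`, `fit_formula` and `fit_xy` are names I made up. It shows two routes: building the formula with `reformulate()`, and skipping the formula entirely by passing `x` and `y`. As far as I recall, the `randomForest` documentation advises against the formula interface when there are many variables because of the overhead, so with 4000+ columns the `x`/`y` route may be the better fit.

```r
library(randomForest)
data(mtcars)

# Hold the predictor names in a character vector (roughly the role a SAS
# macro variable would play). Nothing is typed by hand: take every column
# except the outcome.
predictors <- setdiff(names(mtcars), "mpg")

# Narrow the candidate list programmatically; dropping these two columns
# is an arbitrary choice for illustration only.
predictors <- setdiff(predictors, c("vs", "am"))

# Route 1: build the formula mpg ~ cyl + disp + ... from the vector.
f <- reformulate(predictors, response = "mpg")
set.seed(42)
fit_formula <- randomForest(f, data = mtcars, ntree = 100)

# Route 2: no formula at all; pass predictor columns and outcome directly.
set.seed(42)
fit_xy <- randomForest(x = mtcars[, predictors], y = mtcars$mpg, ntree = 100)
```

With real data you could fill `predictors` any way you like, for example `grep("^lab_", names(my_data), value = TRUE)` to pick columns by a name pattern (`my_data` and the `lab_` prefix are hypothetical here).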

Artem