I want to build a random forest model in R, and I have 4,000+ variables. Is there a simple way to include the variables without typing each one into the model call? Or is there a way to reduce the number of candidate variables without typing each one in? I come from the SAS world, where I could write a macro to hold the variable names.

Artem
Olivia
  • Please make this question *reproducible*. This includes sample code (including listing non-base R packages), sample data (e.g., `dput(head(x))`), and expected output. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans Sep 20 '18 at 23:32
  • The short answer is that you can use `.` in a formula to use all variables, e.g. `randomForest(outcome ~ ., data = my_data)`. Make the question reproducible if you need further help. – Marius Sep 20 '18 at 23:36
  • Thank you! I'm so new to R that I haven't written code yet. I'm considering using Python. If you can tell me, is it the same in Python? – Olivia Sep 21 '18 at 00:18
  • Python syntax is not the same, but there should be a way to use all variables straightforwardly. – Marius Sep 21 '18 at 00:41

1 Answer

As Marius indicated, you can use `.` on the right-hand side of the formula to include every column of the data frame except the response as an explanatory variable. Please see the code below:

library(randomForest)
data(mtcars)
# `mpg ~ .` means: model mpg using all other columns of mtcars as predictors
randomForest(mpg ~ ., data = mtcars, keep.forest = FALSE, ntree = 100)

Output:

Call:
 randomForest(formula = mpg ~ ., data = mtcars, keep.forest = FALSE,      ntree = 100) 
               Type of random forest: regression
                     Number of trees: 100
No. of variables tried at each split: 3

          Mean of squared residuals: 6.39198
                    % Var explained: 81.84 
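
For the SAS-macro part of the question, a rough equivalent in R is to keep the predictor names in a character vector and pass that vector around. The sketch below is only an illustration on `mtcars`, not something from the original answer: the names `predictors`, `top_vars`, and the cutoff of 5 variables are arbitrary assumptions, and `set.seed(42)` is there only to make the run repeatable. It shows building the formula from the vector with `reformulate()`, calling `randomForest()` through its `x`/`y` interface (the package documentation suggests avoiding the formula interface when there are very many variables), and one possible way to shrink the candidate set by ranking permutation importance.

```r
library(randomForest)
data(mtcars)

# Hold the predictor names in a character vector (roughly like a SAS macro variable).
# Here they are derived from the column names instead of being typed one by one.
predictors <- setdiff(names(mtcars), "mpg")

# Option 1: build the formula from the name vector.
f <- reformulate(predictors, response = "mpg")
set.seed(42)  # arbitrary seed, only for repeatability
fit_formula <- randomForest(f, data = mtcars, ntree = 100)

# Option 2: skip the formula and pass predictors and response directly.
set.seed(42)
fit_xy <- randomForest(x = mtcars[, predictors], y = mtcars$mpg,
                       ntree = 100, importance = TRUE)

# One possible way to reduce the candidate variables:
# rank by permutation importance (type = 1, %IncMSE for regression)
# and keep the top few. The cutoff of 5 is an arbitrary assumption.
imp <- importance(fit_xy, type = 1)
ranked <- rownames(imp)[order(imp[, 1], decreasing = TRUE)]
top_vars <- head(ranked, 5)

# Refit on the reduced set of predictors.
set.seed(42)
fit_small <- randomForest(x = mtcars[, top_vars], y = mtcars$mpg, ntree = 100)
```

With a real data set, `predictors` could equally be built with pattern matching, e.g. `grep("^x_", names(my_data), value = TRUE)`, where `my_data` and the `x_` prefix are hypothetical.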
Artem