
I am trying to use randomForest to fit a regression model to my data set in R. The data set has 17 categorical independent variables and 8 numeric independent variables. The dependent variable is numeric. Here is my R script:

#view data structure
str(my.data2)


#Partition data to train and test
ind<-sample(2,nrow(my.data2),replace = TRUE, prob=c(0.8,0.2))
train <- my.data2[ind==1,]
test <- my.data2[ind==2,]

#Fitting the Random Forest Regression Model to the dataset
install.packages("randomForest")
library(randomForest)
set.seed(123)
regressor = randomForest(x = my.data2[1],
                     y = my.data2$`GPS Utilization Rate`,
                     ntree = 100)

Here are the results I get. Could anyone help me understand why I get a negative % Var explained? Should I convert all the categorical variables to factors? I would also appreciate any suggestions on how to improve the model. Thanks!

regressor

Call:
 randomForest(x = my.data2[1], y = my.data2$`GPS Utilization Rate`,         ntree = 100) 
           Type of random forest: regression
                 Number of trees: 100
No. of variables tried at each split: 1

      Mean of squared residuals: 0.03384334
                % Var explained: -0.09
Lucy
  • You are regressing only one variable against one other variable. You should include all predictors in the model – dww Jul 16 '18 at 17:25
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Provide data for `my.data2` so we can run the code to see what's going on. – MrFlick Jul 16 '18 at 18:03
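
Following up on the comments above, here is a minimal sketch of what fitting on all predictors might look like. Since `my.data2` was not posted, the data frame below is a made-up stand-in (the columns `region`, `device`, and `hours` are hypothetical), so treat this as an illustration under those assumptions rather than a fix verified against the real data. It converts character columns to factors, which `randomForest` needs in order to treat them as categorical, fits on the training partition only, and checks the fit on the held-out partition. For context, the reported % Var explained is based on out-of-bag predictions (1 minus OOB MSE divided by the variance of `y`), so a negative value means the forest predicts worse than simply using the mean of `y`, which is plausible when only the first column is passed as `x`.

```r
library(randomForest)

set.seed(123)

# Hypothetical stand-in for my.data2 (the real data was not posted)
n <- 300
my.data2 <- data.frame(
  region = sample(c("north", "south", "east"), n, replace = TRUE),
  device = sample(c("a", "b"), n, replace = TRUE),
  hours  = runif(n, 0, 10),
  stringsAsFactors = FALSE
)
my.data2$`GPS Utilization Rate` <- 0.05 * my.data2$hours +
  0.2 * (my.data2$region == "north") + rnorm(n, sd = 0.1)

# Convert every character column to a factor before splitting,
# so train and test share the same factor levels
is_chr <- vapply(my.data2, is.character, logical(1))
my.data2[is_chr] <- lapply(my.data2[is_chr], factor)

# Partition into train and test (same approach as in the question)
ind   <- sample(2, nrow(my.data2), replace = TRUE, prob = c(0.8, 0.2))
train <- my.data2[ind == 1, ]
test  <- my.data2[ind == 2, ]

# Use every column except the target as a predictor
target     <- "GPS Utilization Rate"
predictors <- setdiff(names(train), target)

regressor <- randomForest(x = train[predictors],
                          y = train[[target]],
                          ntree = 100)
print(regressor)

# Evaluate on the held-out rows
pred      <- predict(regressor, newdata = test[predictors])
test_rmse <- sqrt(mean((pred - test[[target]])^2))
```

Selecting columns by name with `setdiff()` avoids putting the backtick-quoted `GPS Utilization Rate` into a formula. One caveat from the package itself: `randomForest` refuses categorical predictors with more than 53 levels, so any high-cardinality factor among the 17 categorical variables would need to be grouped or dropped first.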
