I am trying to use randomForest to fit a regression model to my data set in R. The data set has 17 categorical and 8 numeric independent variables, and the dependent variable is numeric. Here is my R script:
#view data structure
str(my.data2)
#Partition data into train and test sets
ind <- sample(2, nrow(my.data2), replace = TRUE, prob = c(0.8, 0.2))
train <- my.data2[ind==1,]
test <- my.data2[ind==2,]
#Fitting the Random Forest Regression Model to the dataset
install.packages("randomForest")
library(randomForest)
set.seed(123)
regressor = randomForest(x = my.data2[1],
                         y = my.data2$`GPS Utilization Rate`,
                         ntree = 100)
Here are the results I get. Could anyone help me understand why I get a negative % Var explained? Should I convert all categorical variables to factors? I would appreciate any suggestions on how to improve the model.
regressor
Call:
 randomForest(x = my.data2[1], y = my.data2$`GPS Utilization Rate`, ntree = 100)
               Type of random forest: regression
                     Number of trees: 100
No. of variables tried at each split: 1
          Mean of squared residuals: 0.03384334
                    % Var explained: -0.09
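
A few things in the script and output may be worth checking. `my.data2[1]` selects only the first column, which matches "No. of variables tried at each split: 1", and the model is fit on `my.data2` rather than on `train`. The % Var explained figure is a pseudo R-squared based on out-of-bag predictions, roughly 1 - MSE / Var(y), so it goes negative when those predictions do no better than the mean of the response. Below is a minimal sketch of the comparison on simulated data. The data frame `demo`, its columns, and the response `rate` are hypothetical stand-ins for `my.data2`, and none of the numbers it produces say anything about the real data.

```r
library(randomForest)

set.seed(123)  # set the seed before any random step, including the split

# Hypothetical stand-in for my.data2: character and numeric predictors
n <- 500
demo <- data.frame(
  region = sample(c("north", "south", "east"), n, replace = TRUE),
  device = sample(c("a", "b"), n, replace = TRUE),
  x1     = rnorm(n),
  x2     = runif(n),
  stringsAsFactors = FALSE
)
# Response depends on region and x1 only; device and x2 are pure noise
demo$rate <- 0.5 + 0.2 * (demo$region == "north") + 0.1 * demo$x1 +
  rnorm(n, sd = 0.1)

# Convert character columns to factors so they are treated as categorical
is_chr <- vapply(demo, is.character, logical(1))
demo[is_chr] <- lapply(demo[is_chr], factor)

# 80/20 split, done after the factor conversion so levels match in both parts
ind   <- sample(2, nrow(demo), replace = TRUE, prob = c(0.8, 0.2))
train <- demo[ind == 1, ]
test  <- demo[ind == 2, ]

predictors <- setdiff(names(train), "rate")

# One uninformative predictor, analogous to passing x = my.data2[1]
fit_one <- randomForest(x = train["device"], y = train$rate, ntree = 100)

# All predictors, fit on the training rows only
fit_all <- randomForest(x = train[predictors], y = train$rate, ntree = 100)

# The printed "% Var explained" is 100 * the last element of rsq
pct_one <- 100 * tail(fit_one$rsq, 1)
pct_all <- 100 * tail(fit_all$rsq, 1)
print(c(one_predictor = pct_one, all_predictors = pct_all))

# Error on the held-out rows
pred      <- predict(fit_all, newdata = test[predictors])
test_rmse <- sqrt(mean((test$rate - pred)^2))
print(test_rmse)
```

Applied to the real data, the analogous call would pass every predictor column of `train` (all columns except `GPS Utilization Rate`) as `x` and `train$`GPS Utilization Rate`` as `y`. Whether that raises % Var explained depends on how informative the predictors are.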