I am running this code and getting the following error after the for loop:
Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) :
undefined columns selected
The subsequent ggplot charts show flat lines for both fit indices, because the train function fails inside the for loop and RSquared and RMSE keep their initial values of 0.
library(ISLR)
attach(Wage)
library(caret)
#6
#code informed by https://ambarishg.wordpress.com/2015/09/08/caret-and-polynomial-linear-regression/
set.seed(1)
inTraining = createDataPartition(Wage$age, p = .75, list = FALSE)
training = Wage[ inTraining,]
testing = Wage[-inTraining,]
fitControl <- trainControl(## 10-fold CV
                           method = "repeatedcv",
                           number = 10,
                           repeats = 10)
set.seed(2)
degree = 1:10
RSquared = rep(0,10)
RMSE = rep(0,10)
for (d in degree)
{
  LinearRegressor <- train(wage ~ poly(age, d), data = training, method = "lm", trControl = fitControl)
  RSquared[d] <- LinearRegressor$results$Rsquared
  RMSE[d] <- LinearRegressor$results$RMSE
}
library(ggplot2)
Degree.RegParams = data.frame(degree,RSquared,RMSE)
ggplot(aes(x = degree, y = RSquared), data = Degree.RegParams) +
  geom_line()
ggplot(aes(x = degree, y = RMSE), data = Degree.RegParams) +
  geom_line()
I think the problem is related to how the variable d is defined in the for loop. degree is correctly specified as a vector of length 10, but once the loop iterates over degree, typing d at the console returns a vector of length 1.
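One way to probe this further, offered as a sketch rather than a confirmed diagnosis: the error message shows the data being subset with all.vars(Terms), so it may help to check which names the formula actually refers to. The base R snippet below needs no packages. The names f_literal, f_built, vars_literal and vars_built are made up for illustration, and the degree value 3 is arbitrary.

```r
# Sketch: list the variable names a formula refers to (base R only).

# The formula as written inside the loop: d appears as a symbol.
f_literal <- wage ~ poly(age, d)
vars_literal <- all.vars(f_literal)
print(vars_literal)   # "wage" "age" "d"

# Alternative (an assumption, not taken from the original code):
# paste the numeric degree into the formula text before parsing it,
# so that no symbol named d remains in the formula.
d <- 3
f_built <- as.formula(sprintf("wage ~ poly(age, %d)", d))
vars_built <- all.vars(f_built)
print(vars_built)     # "wage" "age"
```

If the "undefined columns selected" error comes from d being looked up as a column of training, then passing a formula built like f_built to train() inside the loop might avoid it. That is untested here and depends on how train() handles the formula internally.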
Code from https://ambarishg.wordpress.com/2015/09/08/caret-and-polynomial-linear-regression/