I am running this code and getting the following error after the for loop:

Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) : 
  undefined columns selected

The subsequent ggplots show straight lines for the fit indices because the train function fails inside the for loop, so RSquared and RMSE keep their initial values of 0.

library(ISLR)
attach(Wage)
library(caret)

#6
#code informed by https://ambarishg.wordpress.com/2015/09/08/caret-and-polynomial-linear-regression/

set.seed(1)

inTraining = createDataPartition(Wage$age, p = .75, list = FALSE)
training = Wage[ inTraining,]
testing = Wage[-inTraining,]

fitControl <- trainControl(## 10-fold CV, repeated 10 times
  method = "repeatedcv",
  number = 10,
  repeats = 10)

set.seed(2)
degree = 1:10
RSquared = rep(0,10)
RMSE = rep(0,10)

for ( d in degree)
{
  LinearRegressor <- train(wage ~ poly(age,d),data=training, method = "lm", trControl = fitControl)
  
  RSquared[d] <- LinearRegressor$results$Rsquared
  
  RMSE[d]<- LinearRegressor$results$RMSE
  
}

library(ggplot2)
Degree.RegParams = data.frame(degree,RSquared,RMSE)
ggplot(aes(x = degree,y = RSquared),data = Degree.RegParams) +
  geom_line()

ggplot(aes(x = degree,y = RMSE),data = Degree.RegParams) +
  geom_line()

I think the problem is related to how the variable d is defined in the for loop. degree is correctly specified as a vector of length 10, but once d takes its values from degree, typing d into the console returns a vector of length 1.
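One way to test this hypothesis without running the full cross-validation is to look at what the error message says happens: the data is subset with `all.vars(Terms)`. The sketch below uses a hypothetical degree `d <- 3` and a hypothetical toy data frame `toy` (neither is from the original code). It shows that `all.vars()` returns `d` along with `wage` and `age`, so the subset asks for a column named `d` that the data does not have. Pasting the number into the formula text removes `d` from that list.

```r
# Hypothetical degree, standing in for the loop variable
d <- 3

# The formula as written inside the loop: d is a symbol, not a number
f <- wage ~ poly(age, d)
vars <- all.vars(f)
print(vars)  # contains "wage", "age" and "d"

# Hypothetical toy data with the same column names as Wage
toy <- data.frame(wage = c(80, 95, 110), age = c(25, 40, 55))

# Mimic the subsetting step named in the error message
err <- tryCatch(toy[, vars, drop = FALSE],
                error = function(e) conditionMessage(e))
print(err)  # "undefined columns selected"

# The same formula with the value of d pasted in as text
f_fixed <- as.formula(paste0("wage ~ poly(age, ", d, ")"))
print(all.vars(f_fixed))  # contains only "wage" and "age"
```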

Code from https://ambarishg.wordpress.com/2015/09/08/caret-and-polynomial-linear-regression/

phiver
  • I encountered your error as well. I don't know the reason, but that blog post is old and likely out of date with `caret`. You should probably use `tidymodels` instead of `caret`; `tidymodels` is the successor to `caret`. Homepage here: https://www.tidymodels.org/ – Jerry424 Oct 09 '20 at 13:23


The problem is not really a problem. It is caused by attaching the dataset Wage, which interferes with how the variables in the train call are looked up. Read this SO post for more information on issues with attach.

Solution: start your code as follows and it will run fine.

library(ISLR)
library(caret)
data("Wage")

# rest of your code here
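In line with this answer, a minimal base-R check that removes a copy of Wage left on the search path by an earlier `attach(Wage)` call in the same session. It assumes the data frame was attached under its default name, `Wage`.

```r
# attach(Wage) adds an entry named "Wage" to the search path;
# detach it if an earlier run left it there
if ("Wage" %in% search()) {
  detach("Wage")
}

# Confirm nothing named "Wage" is attached any more
stopifnot(!("Wage" %in% search()))
```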
phiver
  • Thank you for the suggestion. I tried what you recommended but still got the error message. – nicholas heimpel Oct 09 '20 at 22:39
  • Try in a clean R session and make sure you do not use `attach(Wage)` anywhere. – phiver Oct 10 '20 at 08:58
  • I did so and R returned the same error message. I still think there is a problem with the for loop. When I remove the train function from the for loop and specify a polynomial degree, the function works. – nicholas heimpel Oct 13 '20 at 13:41
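Building on the observation that a hard-coded degree works, here is a sketch of the loop that pastes the current value of `d` into the formula text before calling `train()`, so the formula refers only to `wage` and `age`. It assumes the cause is the extra `d` symbol picked up by `all.vars(Terms)` in the error message; that diagnosis is not confirmed in this thread. It keeps the question's settings and loads the data with `data("Wage")` instead of `attach(Wage)`. With 10 × 10 resamples per degree it may take a while to run.

```r
library(ISLR)
library(caret)
data("Wage")

set.seed(1)
inTraining <- createDataPartition(Wage$age, p = .75, list = FALSE)
training <- Wage[inTraining, ]

fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)

degree <- 1:10
RSquared <- numeric(length(degree))
RMSE <- numeric(length(degree))

set.seed(2)
for (d in degree) {
  # Paste the numeric value of d into the formula, so the formula only
  # mentions columns that exist in training (wage and age)
  f <- as.formula(paste0("wage ~ poly(age, ", d, ")"))
  LinearRegressor <- train(f, data = training, method = "lm",
                           trControl = fitControl)
  RSquared[d] <- LinearRegressor$results$Rsquared
  RMSE[d] <- LinearRegressor$results$RMSE
}

Degree.RegParams <- data.frame(degree, RSquared, RMSE)
```

The ggplot calls from the question can then be run unchanged on `Degree.RegParams`. `eval(bquote(wage ~ poly(age, .(d))))` is an alternative way to splice the value of `d` into the formula without going through a string.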