I have a simple function that runs best-subset selection on a data frame and returns the validation errors. The name of the response variable is passed in as "y":
validation <- function(data, y) {
  library(ISLR)
  library(leaps)
  data <- na.omit(data)
  coll <- ncol(data) - 1
  attach(data)
  train <- sample(c(TRUE, FALSE), nrow(data), rep = TRUE)
  test <- (!train)
  regfit.best <- regsubsets(y ~ ., data = data[train, ], nvmax = coll)
  test.mat <- model.matrix(y ~ ., data = data[test, ])
  val.errors <- rep(NA, coll)
  for (i in 1:coll) {
    coefi <- coef(regfit.best, id = i)
    pred <- test.mat[, names(coefi)] %*% coefi
    val.errors[i] <- mean((data[[y]][test] - pred)^2)
  }
  return(val.errors)
}
How do I pass y correctly to the following lines?
regfit.best <- regsubsets(y ~.,data= data[train,],nvmax = coll)
test.mat <- model.matrix(y ~.,data=data[test,])
val.errors[i]= mean((data[[y]][test]-pred)^2)
For example, calling validation(Hitters, "Salary") should be equivalent to running:
regfit.best <- regsubsets(Salary ~.,data= data[train,],nvmax = coll)
test.mat <- model.matrix(Salary ~.,data=data[test,])
val.errors[i]= mean((data$Salary[test]-pred)^2)
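To make the goal concrete, here is a minimal sketch of the behaviour I am after, assuming y arrives as a character string. It builds the formula with base R's reformulate() and indexes the response column with data[[y]]. The helper name make_formula is hypothetical. To keep the sketch self-contained it uses mtcars and lm() as stand-ins for Hitters and regsubsets(); I am assuming the same formula object could be passed to regsubsets() in the same way.

```r
# Hypothetical helper: build "<response> ~ ." from a column name given as a string.
make_formula <- function(y) {
  reformulate(".", response = y)
}

set.seed(1)
data <- na.omit(mtcars[, c("mpg", "wt", "hp")])  # stand-in data (assumption, not Hitters)
y <- "mpg"                                       # response passed as a string

train <- sample(c(TRUE, FALSE), nrow(data), replace = TRUE)
test <- !train

f <- make_formula(y)                             # the formula mpg ~ .
fit <- lm(f, data = data[train, ])               # lm() stands in for regsubsets()
test.mat <- model.matrix(f, data = data[test, ])

# Predict on the held-out rows and compute the validation MSE.
coefi <- coef(fit)
pred <- test.mat[, names(coefi)] %*% coefi
val.error <- mean((data[[y]][test] - pred)^2)
```

The idea is that the formula is built once from the string and reused for both the fit and the model matrix, so the response name never has to appear literally in the code.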