I have a simple function that computes validation-set errors for subset regression on a data frame. The name of the response variable is passed in as the string `y`.

validation <- function(data,y){

 library(ISLR)
 library(leaps)

 data <- na.omit(data)
 coll <- ncol(data)-1

  attach(data)

 train <- sample(c(TRUE,FALSE),nrow(data),rep=TRUE)
 test <- (!train)

 regfit.best <- regsubsets(y ~.,data= data[train,],nvmax = coll)
 test.mat <- model.matrix(y ~.,data=data[test,])

 val.errors <- rep(NA,coll)

 for(i in 1:coll){

  coefi <- coef(regfit.best,id=i)
  pred <- test.mat[,names(coefi)]%*%coefi
  val.errors[i]= mean((data[[y]][test]-pred)^2)
}

return(val.errors)

}

How do I pass `y` correctly to the following parts?

 regfit.best <- regsubsets(y ~.,data= data[train,],nvmax = coll)
 test.mat <- model.matrix(y ~.,data=data[test,])

 val.errors[i]= mean((data[[y]][test]-pred)^2)

For example, calling `validation(Hitters, "Salary")` should yield:

 regfit.best <- regsubsets(Salary ~.,data= data[train,],nvmax = coll)
 test.mat <- model.matrix(Salary ~.,data=data[test,])

 val.errors[i]= mean((data$Salary[test]-pred)^2)

  • Use `paste` to create a character representation of the formula. I'm sure we can find a good duplicate question here among the many available...hold on. – joran Jul 08 '16 at 14:44
  • Here's [one](http://stackoverflow.com/q/17024685/324364), and [another](http://stackoverflow.com/q/9238038/324364), and [another](http://stackoverflow.com/q/4951442/324364). – joran Jul 08 '16 at 14:47
  • Thanks for the help, got it ! – hansensan Jul 10 '16 at 15:26
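
The `paste` suggestion in the comments could be sketched roughly as follows. This is only one possible reading of that advice, not the asker's final solution: the helper name `make_formula` is hypothetical, `attach(data)` is dropped because the columns are reached through the formula and `data[[y]]`, and `leaps` is assumed to be installed.

```r
# Hypothetical helper: build the formula "<y> ~ ." from a column name
# given as a string. reformulate(".", response = y) is a base-R alternative.
make_formula <- function(y) as.formula(paste(y, "~ ."))

validation <- function(data, y) {
  data <- na.omit(data)
  coll <- ncol(data) - 1
  f <- make_formula(y)

  # Random train/test split, as in the question
  train <- sample(c(TRUE, FALSE), nrow(data), replace = TRUE)
  test <- !train

  # The same formula object is passed to both calls
  regfit.best <- leaps::regsubsets(f, data = data[train, ], nvmax = coll)
  test.mat <- model.matrix(f, data = data[test, ])

  val.errors <- rep(NA_real_, coll)
  for (i in seq_len(coll)) {
    coefi <- coef(regfit.best, id = i)
    pred <- test.mat[, names(coefi)] %*% coefi
    # data[[y]] looks the response column up by its string name
    val.errors[i] <- mean((data[[y]][test] - pred)^2)
  }
  val.errors
}
```

With `ISLR` installed, a call such as `validation(ISLR::Hitters, "Salary")` should then fit `Salary ~ .` without `y` being hard-coded anywhere in the function body.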