I'm relatively new to R; this is my first post on Stack Overflow. There should be a simple solution to this, but I haven't been able to figure it out.

The R code below creates a vector of 9 models of interest that I'd like to run through 10-fold cross-validation (if it helps, the models are at the bottom of this post). I'd like to run this code ...

modelList <- read.csv("/Users/XX/Desktop/Academic/XX/XX/R/Core_Files/models.csv",header=F)$V1

set.seed(17)
cv.error.9=rep(NA,9)

for(i in modelList){
  cv.error.9[i]=cv.glm(collapsed_Y,eval(parse(text=paste("Mod",i,sep=""))),K=10)$delta[1]
}
cv.error.9

... (or something like it) and have the 'NA' in row cv.error.9[i] be replaced with the CV errors corresponding to model 'i'. I'd like the following table as my output.

       x
1   2.734539
2   2.710424
3   2.760761
4   2.564147
5   2.583432
6   2.681044
7   2.583303
8   2.570110
9   2.635983

Unfortunately, the code above does not replace the elements cv.error.9[i]; instead it appends the test error rates to the end of vector cv.error.9, as shown here:

       x
1   0.000000
2   0.000000
3   0.000000
4   0.000000
5   0.000000
6   0.000000
7   0.000000
8   0.000000
9   0.000000
10  2.734539
11  2.710424
12  2.760761
13  2.564147
14  2.583432
15  2.681044
16  2.583303
17  2.570110
18  2.635983
Showing 1 to 18 of 18 entries, 1 total columns

Any help with this would be appreciated. Many thanks in advance.

Models:

print(modelList)
[1] <- glm(y ~ X1 + X2 + X3 + X4 + X5 + X6, family=gaussian, data=collapsed_Y)                                     
[2] <- glm(y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 , family=gaussian, data=collapsed_Y)                           
[3] <- glm(y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X4*X5, family=gaussian, data= collapsed_Y)
[4] <- glm(y ~ X1 + X2 + X3 + X4 + X5 + X6, family=poisson, data= collapsed_Y)                                     
[5] <- glm(y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7, family=poisson, data= collapsed_Y)                            
[6] <- glm(y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X4*X5, family=poisson, data= collapsed_Y) 
[7] <- glm.nb(y ~  X1 + X2 + X3 + X4 + X5 + X6, data= collapsed_Y)                                                  
[8] <- glm.nb(y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7, data= collapsed_Y)                                         
[9] <- glm.nb(y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X4*X5, data= collapsed_Y) 
9 Levels: <- glm.nb(y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X4*X5, data= collapsed_Y) ...             

  • something like `for (i in 1:length(modelList)) {cv.error[[i]] <- your_model_code}` should work. – Chase Apr 24 '20 at 02:41
  • no luck -- applying those changes gave me this error message "Error in eval(parse(text = paste("Mod", i, sep = ""))) : object 'Mod1' not found" – sstem89 Apr 24 '20 at 02:57
  • That's more to do with this part of your code `eval(parse(text=paste("Mod",i,sep="")))`...what are you trying to do there? [Here](https://stackoverflow.com/questions/13649979/what-specifically-are-the-dangers-of-evalparse) is some good context on the potential pitfalls on `eval(parse())`. You're almost certainly making this harder than it needs to be. If you can make your question [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), people will be better equipped to help you. – Chase Apr 24 '20 at 03:03
  • Thanks for refs. 1) To implement 10-fold CV across 9 models in modelList, code was written to do the following (modelList[1] as an example): 2) text=paste("Mod",i,sep=""): text=Mod1 <- glm(y ~ X1 + X2 + X3 + X4 + X5 + X6, family=gaussian, data=collapsed_Y) 3) parse(text=paste("Mod",i,sep="")): changes the above to an expression. 4) eval(parse(text=paste("Mod",i,sep=""))): used to evaluate this expression within the context of CV. 5) The code works, it just inconveniently appends results to vector cv.error.9 instead of replacing existing values. – sstem89 Apr 24 '20 at 03:32

1 Answer

Here's a reproducible example that hopefully gives you enough to update your code.

#placeholder
rsq <- rep(NA,3)
#three separate model specs for mtcars dataset
f <- list(formula(mpg ~ cyl), 
          formula(mpg ~ cyl + hp), 
          formula(mpg ~ cyl + hp + wt))

for (i in seq_along(f)){
  #fit the model
  m <- lm(f[[i]], data = mtcars)
  #extract value of interest
  rsq[[i]] <- summary(m)$r.squared
}
rsq
#> [1] 0.7261800 0.7407084 0.8431500

Created on 2020-04-23 by the reprex package (v0.3.0)
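
The same pattern could be carried over to the cross-validation in the question. Here is a minimal sketch, assuming the `cv.glm` in the question is `boot::cv.glm` and substituting `mtcars` for `collapsed_Y`. The formulas, the name `cv_error`, and `K = 8` are illustrative and not from the original post.

```r
library(boot)  # provides cv.glm(); assumed to be the function used in the question

set.seed(17)

# illustrative model specs on mtcars, standing in for the nine models
f <- list(mpg ~ cyl,
          mpg ~ cyl + hp,
          mpg ~ cyl + hp + wt)

# placeholder with one numeric slot per model
cv_error <- rep(NA_real_, length(f))

for (i in seq_along(f)) {
  # fit the model; gaussian is glm()'s default family
  m <- glm(f[[i]], data = mtcars)
  # K = 8 splits the 32 rows of mtcars evenly; the question used K = 10
  cv_error[i] <- cv.glm(mtcars, m, K = 8)$delta[1]
}
cv_error
```

Because the loop index is an integer position from `seq_along(f)`, each error lands in its own slot and the vector keeps its original length.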

Chase
  • Thank you @Chase! Using the `list` function worked. Alternatively, I realized that merely changing `cv.error.9=rep(NA,9)` to `cv.error.9=rep(as.numeric(),9)` also works -- those more experienced with R might be better able to comment on this, but I think the lesson for me (& hopefully others like me) here is that the vector type needs to be identical to that of the data that is being replaced. In my case, I was trying to replace a `character` vector type with `numeric` data. – sstem89 Apr 24 '20 at 20:44
  • @sstem89 - I'm not sure that's the right takeaway....see this example for a counterpoint - `v <- vector("character", length = 9); for (j in 1:length(v)){ v[[j]] <- j}; length(v)`. The problem you encountered was indexing / referencing the right slot in the vector you were trying to assign to....in the example above - I create a character vector of length 9 and then simply assign the values 1...9 to each element of `v`. They all populate the right "spot", but still retain their character status. – Chase Apr 29 '20 at 18:56
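
To make the indexing point concrete: the `Levels:` line in the printed `modelList` suggests it is a factor, and R's `for` coerces a factor to a character vector. In the original loop, `i` would therefore be the model text rather than a position, and assigning into an unnamed vector by a name it does not contain adds a new element at the end. Below is a small sketch, with a made-up factor `labels` standing in for `modelList`.

```r
# hypothetical stand-in for modelList: a factor with three levels
labels <- factor(c("a", "b", "c"))

by_name <- rep(NA_real_, 3)
for (i in labels) {
  # i is a character string here, so this assigns by name;
  # each unseen name appends an element instead of filling a slot
  by_name[i] <- 1
}
length(by_name)
#> [1] 6

by_position <- rep(NA_real_, 3)
for (i in seq_along(labels)) {
  # i is 1, 2, 3, so each value replaces the NA in that slot
  by_position[i] <- 1
}
length(by_position)
#> [1] 3

# repeating an empty vector still gives an empty vector, so appending
# by name to it leaves no leading placeholders behind
empty <- rep(numeric(0), 9)
length(empty)
#> [1] 0
```

The last lines may explain why the `rep(as.numeric(),9)` workaround seemed to fix things: the placeholder starts with length 0, so the nine appended values are the only elements.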