My problem is for the mtcars data set in R, I need to create all possible additive linear regression models where I'm regressing on the mpg variable. The null model is easy, as there's
10 choose 0 ways to get the null model, and 10 choose 1 ways to create a SLR on mpg; 10 choose 2 ways to create a two variable regression on mpg; 10 choose 3 ways to create a SLR on mpg; etc.,
So in total, as this is equivalent to summing across the 10th row in Pascal's Triangle, the total models I need to consider comes out to be 1,024.
Now, the other tricky part is I need to somehow store each model in some separate object so that all the 2 variable models are grouped together, all the three variable models are grouped together, etc, on top of also storing all them together (though perhaps there's a more efficient way to do this). The reason for this is my task is to look at all of these models, take their AIC scores and their Mallow's Cp scores, store those in a data frame, and then sort those scores from lowest to highest and keep the top 10. On top of this, I need to also be able to store, see, and have access to/use the two best 1-variable models through the two best 10-variable models because I need to provide the r-squared values and adjusted r-squared values for these various models along with the error mean square value. I'm still pretty/relatively new to R/coding in general, but I provide my attempt below:
library(rje) # provides the powerSet function
library(olsrr) # provides the ols_mallows_cp function to calculate the Mallow's Cp values
mtcars <- datasets::mtcars
x <- powerSet(colnames(mtcars[,-1]))
datalist <- list()
for(i in c(2:1024)){
datalist[[i]] <- mtcars[,colnames(mtcars) %in% c("mpg",x[[i]]) ]
}
full_model <- lm(mpg ~ ., data = mtcars)
Cp_vec <- c()
for (i in c(2:1024)){
model <- lm(mpg ~ ., data = datalist[[i]])
Cp_vec[i] <- ols_mallows_cp(model, full_model)
}
names(Cp_vec) <- as.character(c(1:1024))
TenSmallestCp <- Cp_vec[cpvec %in% head(sort(Cp_vec),10)]
Small_List <- list()
for (i in 1:10){
Small_List[[i]] <- x[[as.numeric(names(TenSmallestCp))[i]]]
}
Small_List[[1]]
Small_List[[2]]
Small_List[[3]]
Small_List[[4]]
Small_List[[5]]
Small_List[[6]]
Small_List[[7]]
Small_List[[8]]
Small_List[[9]]
Small_List[[10]]
The way I currently have it produces this as its output:
[1] "cyl" "wt"
[1] "hp" "wt"
[1] "cyl" "hp" "wt"
[1] "cyl" "wt" "qsec"
[1] "hp" "wt" "am"
[1] "wt" "qsec" "am"
[1] "disp" "wt" "qsec" "am"
[1] "hp" "wt" "qsec" "am"
[1] "cyl" "wt" "carb"
[1] "wt" "qsec" "am" "carb"
So this tells me what the 10 best models are with regards to the Mallow's Cp scores, but perhaps it's just because I've been staring at this problem for way too long, but I can't figure out how to actually save the linear model and have access to it, say, if I wanted to plot it or something. I know I could just easily recreate it with my output, but I'm also trying to become efficient with my coding and not always resort to hard coding things, you know? I also cannot figure out how to store the models based on the number of variables that are included in the model so I can access the top two models from each.
Before posting this, I checked out these links:
I admit that because I'm new, the answer to my problem(s) might fully exist in some linear combination of these three answers, and I'm just having trouble seeing it and putting it together, but while I think the first link I shared does have a lot that's relevant to my problem, and the last one also is pretty related, I'm not sure how the second one is much help. That's why I'm posting this as a new question.
Thanks for taking the time to read this lengthy post and consider helping me with my problem here!