In my case, I want to fit several glm and lda models on a given subset of the data. The response (Y) is the same in every model, but the predictors come from a forward best subset selection over the variables that a random forest analysis ranked as most important.
However, I can't find a way to build the formulas in a loop. This is what I tried:
# ordered_df_train is the training data frame with its columns ordered by the method mentioned above;
# the first column is crim (the response)
list_formula <- vector(mode = "list", length = 13)
list_formula[[1]] <- ordered_df_train$crim ~ ordered_df_train$age
for(j in 3:14){
list_formula[[j-1]] <- ordered_df_train$colnames(ordered_df_train)[j]
}
However, evaluating
ordered_df_train$colnames(ordered_df_train)[j]
does not return the expected variable; it reports NULL instead.
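A minimal sketch of why the lookup fails and one way around it. It assumes the `ordered_df_train` defined in the edit below (rebuilt here by column subsetting so the block runs on its own). `$` takes the token after it literally as a column name and does not evaluate an expression there, so `ordered_df_train$colnames(...)` never looks up the name stored in `colnames(ordered_df_train)[j]`. `[[` does evaluate its argument. The loop keeps the question's structure but builds each formula from column names with `paste()` and `as.formula()`; `col` and `predictors` are illustrative names, not part of the original code.

```r
library(MASS)

# Same column order as in the question: the response (crim) comes first
ordered_df_train <- Boston[, c("crim", "age", "nox", "tax", "indus", "dis", "rad",
                               "black", "rm", "lstat", "zn", "ptratio", "medv", "chas")]

# `$` uses the text after it literally: this looks for a column named "col"
col <- colnames(ordered_df_train)[3]   # "nox"
is.null(ordered_df_train$col)          # TRUE, there is no column called "col"

# `[[` evaluates its argument, so the computed name works
head(ordered_df_train[[col]])

# Same loop idea as in the question, without `$` inside the formula:
# formula j - 1 uses columns 2..j, i.e. one more predictor per iteration
list_formula <- vector(mode = "list", length = 13)
for (j in 2:14) {
  predictors <- colnames(ordered_df_train)[2:j]
  list_formula[[j - 1]] <- as.formula(
    paste("crim ~", paste(predictors, collapse = " + "))
  )
}

list_formula[[3]]   # crim ~ age + nox + tax
```

Starting the loop at `j = 2` also produces the first formula (`crim ~ age`), so it no longer has to be set by hand.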
Edit: As suggested, here is the data I used, for reproducibility:
library(MASS)
df_train <- Boston
ordered_df_train <- data.frame(
crim = df_train$crim,
age = df_train$age,
nox = df_train$nox,
tax = df_train$tax,
indus = df_train$indus,
dis = df_train$dis,
rad = df_train$rad,
black = df_train$black,
rm = df_train$rm,
lstat = df_train$lstat,
zn = df_train$zn,
ptratio = df_train$ptratio,
medv = df_train$medv,
chas = df_train$chas
)
I hope this makes the question reproducible. The objective is to have a list of formulas built with the forward method for best subset selection, adding the next most significant variable at each iteration.
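Once the formulas use bare column names, the models can be fitted by passing the data frame through `data =`. This also keeps `predict(fit, newdata = ...)` usable, whereas a formula written as `ordered_df_train$crim ~ ordered_df_train$age` always refers back to the training columns. The following is a sketch, assuming the column order in the edit above is already the importance order. `reformulate()` (base R, stats package) builds the nested formulas, and `glm()` uses its default gaussian family. Because `lda()` needs a categorical response, the LDA part assumes a hypothetical binary label `crim01` (crim above its median), which the question does not define.

```r
library(MASS)

ordered_df_train <- Boston[, c("crim", "age", "nox", "tax", "indus", "dis", "rad",
                               "black", "rm", "lstat", "zn", "ptratio", "medv", "chas")]

predictors <- colnames(ordered_df_train)[-1]   # assumed to be in importance order

# Formula k holds the first k predictors: crim ~ age, crim ~ age + nox, ...
list_formula <- lapply(seq_along(predictors), function(k)
  reformulate(predictors[seq_len(k)], response = "crim"))

# One glm per formula; data = supplies the columns named in each formula
glm_fits <- lapply(list_formula, function(f) glm(f, data = ordered_df_train))
sapply(glm_fits, AIC)   # compare the nested models

# ASSUMPTION: lda() needs classes, so crim is turned into a hypothetical
# binary factor crim01 (above / below the median) just for this sketch
lda_df <- ordered_df_train
lda_df$crim01 <- factor(lda_df$crim > median(lda_df$crim))
lda_formula <- lapply(seq_along(predictors), function(k)
  reformulate(predictors[seq_len(k)], response = "crim01"))
lda_fits <- lapply(lda_formula, function(f) lda(f, data = lda_df))
```

Wrapping each call in `function(f) ...` keeps the `data` argument explicit for both `glm()` and `lda()`, which evaluate it via `model.frame()`.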