I have a data frame (df) with multiple columns, and I want to check whether any of a group of candidate variables predicts an outcome.
My data frame looks something like this:
df <- data.frame(
  dependent_variable = c("A", "A", "A", "B", "B"),
  variable1 = c("3", "5", "2", "1", "6"),
  variable2 = c("4", "2", "3", "4", "0"),
  variable3 = c("b", "c", "b", "a", "c"),
  variable4 = c("13", "6", "20", "8", "10"),
  variable5 = c("5", "5", "2", "3", "1"),
  variable6 = c("group1", "group1", "group2", "group2", "group2"),
  variable7 = c("1", "2", "1", "3", "2"))
From previous analyses, I know that one of the variables is a genuine predictor, so I want to start with models that have two covariates.
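To make the goal concrete, a two-covariate formula could be built as in the minimal sketch below. It assumes, purely for illustration, that the known predictor is variable6 (the actual one is not stated here) and that variable1 is the candidate being tested. reformulate() is part of base R (stats package).

```r
# Hypothetical choice: which column is the known predictor is an assumption
known_predictor <- "variable6"
candidate <- "variable1"

# reformulate() assembles a formula object from character vectors
two_cov_formula <- reformulate(c(known_predictor, candidate),
                               response = "dependent_variable")
print(two_cov_formula)
```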
First, I create an empty data frame to store the information I need to compare the models:
results <- data.frame(variable = character(), coefficient = numeric(), p_value = numeric(), AIC = numeric(), stringsAsFactors = FALSE)
Then I wrote the following loop to fit the model for each potential predictor:
# dependent_variable: my outcome
predictor_variables <- c("variable1", "variable2", "variable3", "variable4", "variable5")
# Loop to iterate across my independent variables
for (variable in predictor_variables) {
  # formula
  formula <- paste("dependent_variable", paste("~", paste(variable, collapse = "+")))
  # logistic model
  model <- glm(formula, data = df, family = "binomial")
  # Coefficients and p-values
  coefficients <- coef(model)
  p_values <- summary(model)$coefficients[, "Pr(>|z|)"]
  # Akaike information criterion
  aic <- AIC(model)
  # Store the results in the dataframe
  results <- rbind(results, data.frame(variable = variable, coefficient = coefficients[2], p_value = p_values[2], AIC = aic))
}
There seems to be a problem with the formula, because I get this error message:
Error in terms.formula(formula, data = data) :
invalid model formula in ExtractVars
Does anyone know how to fix it? Thanks!
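One direction that may be worth trying is sketched below. It rests on assumptions that are not confirmed above: that the outcome should be a factor, that the number-like columns are meant to be numeric rather than character, and that glm() should receive a formula object built with reformulate() rather than a pasted string. The toy data and the helper name fit_one are hypothetical, and this is not a confirmed diagnosis of the ExtractVars error.

```r
# Toy data modelled on the example; number-like columns stored as numeric (assumption)
df <- data.frame(
  dependent_variable = factor(c("A", "A", "A", "B", "B")),  # factor outcome
  variable1 = c(3, 5, 2, 1, 6),
  variable2 = c(4, 2, 3, 4, 0),
  variable4 = c(13, 6, 20, 8, 10),
  variable5 = c(5, 5, 2, 3, 1)
)

predictor_variables <- c("variable1", "variable2", "variable4", "variable5")

# Hypothetical helper: fit one logistic model and return a one-row summary
fit_one <- function(variable) {
  # Build a formula object from the column names
  f <- reformulate(variable, response = "dependent_variable")
  model <- glm(f, data = df, family = binomial)
  coefs <- summary(model)$coefficients
  data.frame(
    variable    = variable,
    coefficient = coefs[2, "Estimate"],   # row 1 is the intercept
    p_value     = coefs[2, "Pr(>|z|)"],
    AIC         = AIC(model)
  )
}

# Collect the per-predictor rows into a single data frame
results <- do.call(rbind, lapply(predictor_variables, fit_one))
print(results)
```

The sketch leaves out variable3 on purpose: a categorical predictor produces one coefficient per non-reference level, so taking only row 2 of the coefficient table would not summarize it. If the real column names contain spaces or other non-syntactic characters, they would likely need to be wrapped in backticks before being used in a formula.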