I have a data frame (df) with multiple columns, and I want to check whether any of a group of candidate variables predicts an outcome.

My data frame looks something like this:

df <- data.frame(
  dependent_variable = c("A", "A", "A", "B", "B"),
  variable1 = c("3", "5", "2", "1", "6"),
  variable2 = c("4", "2", "3", "4", "0"),
  variable3 = c("b", "c", "b", "a", "c"),
  variable4 = c("13", "6", "20", "8", "10"),
  variable5 = c("5", "5", "2", "3", "1"),
  variable6 = c("group1", "group1", "group2", "group2", "group2"),
  variable7 = c("1", "2", "1", "3", "2")
)

From previous analyses, I know that one of the variables is an actual predictor, so I am interested in a model with two covariates to start.

First, I create an empty data frame to store the information I need to compare the models:

results <- data.frame(variable = character(), coefficient = numeric(), p_value = numeric(), AIC = numeric(), stringsAsFactors = FALSE)

Then I wrote the following loop to fit the model for each potential predictor:

# dependent_variable: my outcome
predictor_variables <- c("variable1", "variable2", "variable3", "variable4", "variable5")

# Loop to iterate across my independent variables
for (variable in predictor_variables) {
  # formula
  formula <- paste("dependent_variable", paste("~", paste(variable, collapse = "+")))
  
  # logistic model
  model <- glm(formula, data = df, family = "binomial")
  
  # Coefficients and p-values
  coefficients <- coef(model)
  p_values <- summary(model)$coefficients[, "Pr(>|z|)"]
  
  # Akaike information criterion
  aic <- AIC(model)
  
  # Store the results in the dataframe
  results <- rbind(results, data.frame(variable = variable, coefficient = coefficients[2], p_value = p_values[2], AIC = aic))
}

There seems to be a problem with the formula, because I get this error message:

Error in terms.formula(formula, data = data) : 
  invalid model formula in ExtractVars

Thanks in advance if anyone knows how to fix it!

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input that can be used to test and verify possible solutions. – MrFlick Jun 15 '23 at 21:16
  • Thanks! I guess now it is more clear! – Rafael Bravo Jun 15 '23 at 21:29
  • I think you want `formula <- paste("dependent_variable", paste("~", paste(variable, collapse = "+")))` ? Or better `formula <- reformulate(variable, response = "dependent_variable")` ? – Ben Bolker Jun 15 '23 at 21:41
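
For anyone landing here, the `reformulate()` suggestion in the comment above could be sketched roughly as follows. This is only a sketch under some assumptions that are not in the question: the outcome is converted to a two-level factor, the numeric-looking columns are stored as numbers rather than strings, and the results are collected with `lapply()` instead of growing a data frame inside a `for` loop.

```r
# Sample data from the question. Assumption: the numeric-looking columns are
# meant to be numeric and the outcome is a two-level factor.
df <- data.frame(
  dependent_variable = factor(c("A", "A", "A", "B", "B")),
  variable1 = c(3, 5, 2, 1, 6),
  variable2 = c(4, 2, 3, 4, 0),
  variable3 = factor(c("b", "c", "b", "a", "c")),
  variable4 = c(13, 6, 20, 8, 10),
  variable5 = c(5, 5, 2, 3, 1)
)

predictor_variables <- c("variable1", "variable2", "variable3",
                         "variable4", "variable5")

# Fit one logistic model per candidate predictor and collect one row each.
fits <- lapply(predictor_variables, function(v) {
  # reformulate() builds dependent_variable ~ <v> as a formula object
  f <- reformulate(v, response = "dependent_variable")
  model <- glm(f, data = df, family = binomial)
  coefs <- summary(model)$coefficients
  # Row 2 is the first non-intercept term; for a factor with more than two
  # levels (variable3) that is only the first non-reference level.
  data.frame(
    variable    = v,
    coefficient = coefs[2, "Estimate"],
    p_value     = coefs[2, "Pr(>|z|)"],
    AIC         = AIC(model),
    row.names   = NULL
  )
})

results <- do.call(rbind, fits)
print(results)
```

With only five rows, `glm()` is likely to warn about fitted probabilities of 0 or 1, so the numbers are for illustration only. To add the known predictor as a second covariate, both names can be passed at once, e.g. `reformulate(c(known_predictor, v), response = "dependent_variable")`, where `known_predictor` is a hypothetical name for that column.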
