Pass a column name to be used in a model formula

Question

I want to write a function that fits a linear model, so that I can build several models and check the normality of their residuals. The predictors are the same in every model, but the response column varies, so the function should take the name of the response variable as an argument. How can I do this? The code below does not work. I want to call the same function with response1 and response2.

check_normality = function(df, response) {
   lm1 = lm(response ~ var1 + var2 + var3 + var4, data = df)
   normality_test = shapiro.test(lm1$residuals)
   p_value = normality_test$p.value
   p_value >= 0.05
}

df = data.frame(response1 = rnorm(100), response2 = rnorm(100),
                var1 = runif(100), var2 = runif(100) + 2,
                var3 = rnorm(100), var4 = rnorm(100) + runif(100))

check_normality(df, 'response1')

score 2 · Accepted Answer · answered Jul 20 '19 at 05:36

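We could build the formula as a string and convert it with as.formula. Written directly inside a formula, response is read as a literal variable name rather than being replaced by the column name it holds, so the name has to be pasted into the string first.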

check_normality =  function(df, response) {
   lm1 = lm(as.formula(paste0(response, " ~ var1 + var2 + var3 + var4")), data = df)
   normality_test = shapiro.test(lm1$residuals)
   p_value = normality_test$p.value
   p_value >= 0.05
}

check_normality(df, "response1")
#[1] TRUE
check_normality(df, "response2")
#[1] TRUE
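
As a possible alternative, base R's reformulate() can assemble the formula from character vectors, which avoids pasting strings by hand. The version below is only a sketch: the predictors and alpha arguments, the column check, and the set.seed() call are additions for illustration and are not part of the original function.

```r
# Sketch: build the formula with reformulate() instead of paste0().
# `predictors` and `alpha` are illustrative arguments, not in the original.
check_normality <- function(df, response,
                            predictors = c("var1", "var2", "var3", "var4"),
                            alpha = 0.05) {
  # Fail early if the requested response column is missing
  stopifnot(is.character(response), length(response) == 1,
            response %in% names(df))

  # Produces e.g. response1 ~ var1 + var2 + var3 + var4
  model_formula <- reformulate(predictors, response = response)

  fit <- lm(model_formula, data = df)

  # TRUE when Shapiro-Wilk does not reject normality at level alpha
  shapiro.test(residuals(fit))$p.value >= alpha
}

set.seed(1)  # only to make the simulated data reproducible
df <- data.frame(response1 = rnorm(100), response2 = rnorm(100),
                 var1 = runif(100), var2 = runif(100) + 2,
                 var3 = rnorm(100), var4 = rnorm(100) + runif(100))

# Run the same check for each response column
results <- sapply(c("response1", "response2"), check_normality, df = df)
print(results)
```

Because the response is passed as a string, the same call works for any column in df, and sapply returns one logical value per response name.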
