I want to write a function that fits a linear model, so that I can build several models and check the normality of the residuals for each one. The predictors are the same in every model, but the response column varies. How can I write the function so that the response variable is passed in as an argument? The code below does not work. I want to call the same function with response1 and response2 and check the normality of the residuals in each case.

check_normality = function(df, response){
  lm1 = lm(response ~ var1 + var2 + var3 + var4, data = df)
  normality_test = shapiro.test(lm1$residuals)
  p_value = normality_test$p.value
  p_value >= 0.05
}

df = data.frame(response1 = rnorm(100), response2 = rnorm(100),
                var1 = runif(100), var2 = runif(100)+2,
                var3 = rnorm(100), var4 = rnorm(100)+runif(100))

check_normality(df, 'response1')
Fisseha Berhane

1 Answer

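Inside the function, the formula treats response as a literal variable name, not as the column name stored in it. We can build the formula as a string with paste0() and convert it to a formula with as.formula():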

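check_normality = function(df, response) {
   # Build the formula from the response column name, e.g. "response1 ~ var1 + ..."
   lm1 = lm(as.formula(paste0(response, " ~ var1 + var2 + var3 + var4")), data = df)
   # Shapiro-Wilk test on the model residuals
   normality_test = shapiro.test(lm1$residuals)
   p_value = normality_test$p.value
   # TRUE if normality is not rejected at the 5% level
   p_value >= 0.05
}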

check_normality(df, "response1")
#[1] TRUE
check_normality(df, "response2")
#[1] TRUE
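
As an alternative, here is a minimal sketch that builds the formula with base R's reformulate() instead of pasting strings. The predictors and alpha arguments, and the set.seed() call, are my own additions for illustration and are not part of the original question; the predictor names are assumed to match the columns of df.

```r
# Sketch only: `predictors` and `alpha` are hypothetical arguments added for illustration.
check_normality <- function(df, response,
                            predictors = c("var1", "var2", "var3", "var4"),
                            alpha = 0.05) {
  # Fail early if the requested response column does not exist
  stopifnot(response %in% names(df))
  # reformulate() builds e.g. response1 ~ var1 + var2 + var3 + var4
  model_formula <- reformulate(predictors, response = response)
  fit <- lm(model_formula, data = df)
  # TRUE if the Shapiro-Wilk test does not reject normality at level alpha
  shapiro.test(residuals(fit))$p.value >= alpha
}

set.seed(42)  # assumed seed, only to make the example reproducible
df <- data.frame(response1 = rnorm(100), response2 = rnorm(100),
                 var1 = runif(100), var2 = runif(100) + 2,
                 var3 = rnorm(100), var4 = rnorm(100) + runif(100))

# Run the same check for every response column in one call
results <- sapply(c("response1", "response2"), check_normality, df = df)
print(results)
```

sapply() returns a named logical vector with one entry per response column, so adding more responses only means extending the character vector.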
Ronak Shah