I am trying to write a reusable function. Inside it, I would like to build a formula from the function's arguments, fit it with lm, and output the summary of the regression results.

I have tried using as.formula to build the formula inside my function, but the following code gives error messages and I am not sure why. Could anyone help?

# create the data
x <- c(1,2,3,5,6,7,8,1,1,2,1)
y <- c(2,3,4,5,1,3,4,5,6,7,2)
z <- c(2,3,4,1,2,3,33,5,2,4,5)
i <- c(2,4,4,5,1,3,2,5,6,7,2)
j <- c(2,9,4,1,2,3,4,5,2,4,5)
k <- c(2,12,4,5,1,3,4,5,6,7,2)
q <- c(2,55,4,1,2,5,4,5,2,4,5)
m <- data.frame(x,y,z)

# the function
polyRegress <- function(pre1, pre2, dv, df){

  # This is the formula I want to test:
  # model <- lm(z ~ x + y + I(x^2) + I(x*y) + I(y^2), data=m)

  f <- as.formula(paste0(dv, " ~ ", pre1, " + ", pre2, " + ", "I(", pre1, "^2)", " + ", "I(", pre1, "*", pre2, ")", " + ", "I(", pre2, "^2)"))

  results <- lm(f, data=df)
  summary(results)
}

# main
polyRegress(x, y, z, m)
polyRegress(i, j, k, m)

Also, in the output of the two polyRegress calls above, I want the coefficients to be named x, y, I(x^2), I(x * y), I(y^2) and i, j, I(i^2), I(i * j), I(j^2), rather than pre1, pre2, I(pre1^2), I(pre1 * pre2), I(pre2^2).

wh41e
  • As all the data is in the data frame `m`, you don't need to pass x, y, z as well. You can write the formula as a single string, e.g. `as.formula("z ~ x + y...")` – Jonny Phelps Oct 10 '19 at 06:06
  • I think I didn't state my question clearly, so I have edited it. The original code above is just an illustration. The dataset `m` I showed contains only 3 variables, but my actual dataset contains many. Say I have a dataset with variables `x, y, z, i, k, j, d, q`...I want to write a function that gives results not only from `polyRegress(x, y, z, m)`, but also from `polyRegress(i, k, j, m)` – wh41e Oct 10 '19 at 08:59
  • I have edited my question again. Hope this would be clearer. Thanks @JonnyPhelps – wh41e Oct 10 '19 at 09:06

1 Answer


In your example, I think you don't need the df argument, because x, y, z, i, ... are vectors.
When you call polyRegress(x, y, z, m), you pass the x, y and z vectors themselves, not the column names of m.
So, in the first case, you can use substitute to get the argument names and build the formula from them, which also gives the coefficients the names you want.

# create the data
x <- c(1,2,3,5,6,7,8,1,1,2,1)
y <- c(2,3,4,5,1,3,4,5,6,7,2)
z <- c(2,3,4,1,2,3,33,5,2,4,5)
i <- c(2,4,4,5,1,3,2,5,6,7,2)
j <- c(2,9,4,1,2,3,4,5,2,4,5)
k <- c(2,12,4,5,1,3,4,5,6,7,2)
q <- c(2,55,4,1,2,5,4,5,2,4,5)
m <- data.frame(x,y,z)

# the function
polyRegress <- function(pre1, pre2, dv){
  # capture the argument names as strings: pre1 becomes "x" or "i", and so on
  pre1 <- deparse(substitute(pre1))
  pre2 <- deparse(substitute(pre2))
  dv <- deparse(substitute(dv))

  f <- paste0(dv, " ~ ", pre1, " + ", pre2, " + ", "I(", pre1, "^2)", " + ", "I(", pre1, "*", pre2, ")", " + ", "I(", pre2, "^2)")

  # lm() coerces the character string f to a formula
  results <- lm(f)
  # at this step results$call is lm(formula = f); replace it so the summary shows the real formula
  results$call <- call('lm', formula = formula(f))
  summary(results)
}

# main
polyRegress(x, y, z)
polyRegress(i, j, k)
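
The name capture used in the function above can be checked on its own. This is only a minimal sketch: `argName` is a hypothetical helper introduced for illustration, not part of the answer's function.

```r
# hypothetical helper: return the expression supplied for `arg` as a string
argName <- function(arg) {
  # substitute() gives the unevaluated expression the caller typed,
  # deparse() turns that expression into a character string
  deparse(substitute(arg))
}

x <- c(1, 2, 3)
argName(x)      # "x"
argName(x + 1)  # "x + 1"
```

Because the argument is never evaluated here, the helper returns the name that was typed, not the values stored in the vector.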

But if you really want to refer to variables in your data frame, you have to pass the arguments as character strings, because what you want to use are the data frame's column names.

# create the data
m <- data.frame(x,y,z,i,j,k)
rm(x,y,z,i,j,k)

# the function
polyRegress <- function(pre1, pre2, dv, df){
  f <- paste0(dv, " ~ ", pre1, " + ", pre2, " + ", "I(", pre1, "^2)", " + ", "I(", pre1, "*", pre2, ")", " + ", "I(", pre2, "^2)")

  results <- lm(f, data = df)
  # at this step results$call is lm(formula = f, data = df); replace it so the summary shows the real formula and data name
  results$call <- call('lm', formula = formula(f), data = substitute(df))
  summary(results)
}

# main
polyRegress("x", "y", "z", m)
polyRegress("i", "j", "k", m)

I hope I have understood your question.