-1

I am trying to use the aov() function inside a function but R keeps giving me the same error.

Code:

dat$X1 = rep(c("a", "b"), 2)
dat$X2 = c(1,2,3,4)

f = function (x){
  aov(x ~ X1 , data = dat)
}
f('X2')

This gives me the following error:

Error in model.frame.default(formula = x ~ X1, data = dat, drop.unused.levels = TRUE) : 
variable lengths differ (found for 'X1')

The aov() works when I try to replace 'x' with the actual name of the variable (X2) so it doesn't make sense that the variable lengths would differ.

I have looked for this error everywhere but so far I haven't had luck finding the same error anywhere else.

I'm pretty sure that I am overlooking something very obvious but I've been stuck with this for a while.

Looking forward to reading your advice. Thanks.

mvel
  • f('X2') refers to a character value of "X2" not the variable dat$X2. But your code gives more errors than that. You need to use `dat <- data.frame(X1=rep(c("a", "b"), 2), X2=1:4)`. You cannot assign columns to a data frame that does not exist. You will have to pass the variable `X2` with its full name, `f(dat$X2)` since the function `f()` does not know where it is otherwise in order to pass it to `aov()`. Or use the cumbersome `with(dat, f(X2))`. – dcarlson Jul 21 '22 at 03:17
  • Sorry, I didn't include the line where I initialized the data frame. Using `f(dat$X2)` does work though. Thanks! – mvel Jul 21 '22 at 18:38
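
To make the cause discussed in the comments above concrete, here is a minimal sketch. It assumes `dat` is initialized as a data frame first, as the comments suggest. It shows that `f('X2')` hands `aov()` a length-1 string as the response, while `f(dat$X2)` hands it the four-element column.

```r
dat <- data.frame(X1 = rep(c("a", "b"), 2), X2 = c(1, 2, 3, 4))

f <- function(x) {
  aov(x ~ X1, data = dat)
}

# With f('X2'), x is a single string, while dat$X1 has four values
length("X2")      # 1
length(dat$X1)    # 4

# Capture the error message instead of stopping the script
err <- tryCatch(f("X2"), error = function(e) conditionMessage(e))

# Passing the column itself gives aov() a response of the right length
fit <- f(dat$X2)
```

The formula `x ~ X1` never looks inside the string, so the name stored in `x` is not used to pick a column of `dat`.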

3 Answers

0

If you want to call aov() inside a function and pass the variable name as a string, you can build the formula from that string:

dat <- data.frame(X1 = rep(c("a", "b"), 2), X2 = c(1,2,3,4))
f = function (x){
  ff <-as.formula(paste0(x, "~ X1")) 
  aov(ff , data = dat)
}
f('X2')

Call:
   aov(formula = ff, data = dat)

Terms:
                X1 Residuals
Sum of Squares   1         4
Deg. of Freedom  1         2

Residual standard error: 1.414214
Estimated effects may be unbalanced
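
As a quick check of what the pasted string turns into, the following sketch (same sample data, outside any function) inspects the formula and compares the fit with one written out by hand. The names `fit_string`, `fit_direct`, and `same` are only illustrative.

```r
dat <- data.frame(X1 = rep(c("a", "b"), 2), X2 = c(1, 2, 3, 4))

# paste0() builds the text "X2~ X1"; as.formula() parses it into a formula
ff <- as.formula(paste0("X2", "~ X1"))
class(ff)        # "formula"
all.vars(ff)     # "X2" "X1"

# The fit from the constructed formula should match the hand-written one
fit_string <- aov(ff, data = dat)
fit_direct <- aov(X2 ~ X1, data = dat)
same <- isTRUE(all.equal(coef(fit_string), coef(fit_direct)))
```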
Park
0

I advise against defining the function the way you do. It has two key flaws: (1) It depends on a global variable (never good). (2) It doesn't check whether the variables in your formula (one hard-coded, the other a user input, which is awkward in itself) exist in your (global) data.frame.

Here is a better approach:

better_f <- function(data, dep_var, indep_var) {
    stopifnot(all(c(dep_var, indep_var) %in% names(data)))
    aov(reformulate(indep_var, response = dep_var), data = data)
}

# Sample data
dat <- data.frame(X1 = rep(c("a", "b"), 2), X2 = 1:4)

# Test function
better_f(dat, "X2", "X1")
#Call:
#    aov(formula = reformulate(indep_var, response = dep_var), data = data)
#
#Terms:
#    X1 Residuals
#Sum of Squares   1         4
#Deg. of Freedom  1         2
#
#Residual standard error: 1.414214
#Estimated effects may be unbalanced
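
A short sketch of how the two pieces behave on their own, assuming the same sample data. `"X3"` is a hypothetical column name that does not exist in `dat`; it is only there to trigger the `stopifnot()` check.

```r
better_f <- function(data, dep_var, indep_var) {
  stopifnot(all(c(dep_var, indep_var) %in% names(data)))
  aov(reformulate(indep_var, response = dep_var), data = data)
}

dat <- data.frame(X1 = rep(c("a", "b"), 2), X2 = 1:4)

# reformulate() builds the formula from character strings
fml <- reformulate("X1", response = "X2")   # X2 ~ X1

# A column name that is not in the data is caught before aov() runs;
# tryCatch() returns the error message as a string
res <- tryCatch(better_f(dat, "X3", "X1"),
                error = function(e) conditionMessage(e))

# A valid call still returns the fitted model
fit <- better_f(dat, "X2", "X1")
```

Failing early on a bad column name avoids a less direct error from inside `aov()`.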
Maurits Evers
-2

Your function has one parameter, so to call it you only need to pass it an argument. There is no need for the quotation marks. Drop them and it will work.

X1 = rep(c("a", "b"), 2)
X2 = c(1,2,3,4)
dat <- as.data.frame(cbind(X1, X2))

f = function (x){
    aov(x ~ X1 , data = dat)
}
f(X2)
#f(x=X2) this works, too.
XixiHaha
  • Note that this only works because you created X1 and X2 outside the data frame first. If you remove X2 outside the data frame with `rm(X2)`, your code fails, but `f(dat$X2)` works. See my comment above. – dcarlson Jul 21 '22 at 03:27
  • Yes, you are right. It is much more stable to follow your suggestion. Still, I could use `attach(dat)` before the function. That would work, too. – XixiHaha Jul 21 '22 at 03:31
  • @XixiHaha [Why you should not use `attach` in the way you propose](https://stackoverflow.com/questions/10067680/why-is-it-not-advisable-to-use-attach-in-r-and-what-should-i-use-instead). – Maurits Evers Jul 21 '22 at 03:34
  • Safer is `with(dat, f(X2))` as per my original comment. – dcarlson Jul 21 '22 at 03:44
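
The point made in these comments can be sketched as follows. This assumes a fresh session where `X2` exists only as a column of `dat`, not as a separate global object.

```r
dat <- data.frame(X1 = rep(c("a", "b"), 2), X2 = c(1, 2, 3, 4))

f <- function(x) {
  aov(x ~ X1, data = dat)
}

# Without a global X2, the bare name cannot be resolved;
# tryCatch() returns the error message as a string
msg <- tryCatch(f(X2), error = function(e) conditionMessage(e))

# with() evaluates X2 inside dat, so the column is found
# without attach() and without any global copies
fit <- with(dat, f(X2))
```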