I am well aware there are much better solutions for the particular problem described below (e.g., cor and rcorr in Hmisc, as discussed here). This is just an illustration for a more general R issue I just can't figure out: passing multiple variable names from a character vector to a formula statement within a function.

Assume there is a dataset consisting of numeric variables.

vect.a <- rnorm(n = 20, mean = 0, sd = 1)
vect.b <- rnorm(n = 20, mean = 0, sd = 1)
vect.c <- rnorm(n = 20, mean = 0, sd = 1)
vect.d <- rnorm(n = 20, mean = 0, sd = 1)
dataset <- data.frame(vect.a, vect.b, vect.c, vect.d)
names(dataset) <- c("var1", "var2", "var3", "var4")

A correlation test has to be performed for each possible pair of variables within this data set, using a formula statement of the type ~ VarA + VarB within the function cor.test:

for (i in 1:(length(names(dataset))-1)){
    for (j in (i+1):length(names(dataset))) {
        cor.test(~ names(dataset)[i] + names(dataset)[j], data = "dataset")
    }
}

which returns an error: invalid 'envir' argument of type 'character'

I assume a character string is incompatible with the formula statement, but which class would be compatible with it? If the entire approach is wrong, please explain why and provide or point to an alternative solution. If the approach is somehow "ugly" or "non-R", please explain why.

skip

1 Answer

You get that formula by using as.formula with a string argument.

> x <- c('x1','x2','x3')
> f <- as.formula(paste('~ ', x[1], ' + ', x[2]))
> f
~x1 + x2
> class(f)
[1] "formula"
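
As an alternative sketch, base R's reformulate() builds the same one-sided formula directly from a character vector of term labels, so the string does not have to be pasted together by hand. The vector x is the same illustrative one used above.

```r
x <- c("x1", "x2", "x3")

# reformulate() joins the term labels with "+" and returns a formula object;
# with no response given, the result is one-sided
f <- reformulate(x[1:2])
f
class(f)
```

Either route gives an object of class "formula" that can be handed to a modelling or test function in place of a literal ~ x1 + x2.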

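There is another issue here: data="dataset" should be data=dataset. The data argument must be the data frame itself, not a character string holding its name, and that string is what triggers the 'envir' error.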

> dataset <- data.frame(a=1:5, b=sample(1:5))
> cor.test(~ a + b, data="dataset")
Error in eval(predvars, data, env) : 
  invalid 'envir' argument of type 'character'
> cor.test(~ a + b, data=dataset)

Pearson's product-moment correlation
...
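
Putting both fixes together, the pairwise loop from the question might look like the sketch below. It is only one possible arrangement: the fixed seed, the use of combn() and lapply() in place of the nested for loops, and the names vars, pairs and results are assumptions made for illustration, not part of the original code.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
dataset <- data.frame(var1 = rnorm(20), var2 = rnorm(20),
                      var3 = rnorm(20), var4 = rnorm(20))

vars  <- names(dataset)
pairs <- combn(vars, 2, simplify = FALSE)  # every unordered pair of column names

results <- lapply(pairs, function(p) {
  # build the formula from the two names, e.g. ~ var1 + var2
  f <- as.formula(paste("~", p[1], "+", p[2]))
  # pass the data frame itself, not the string "dataset"
  cor.test(f, data = dataset)
})
names(results) <- vapply(pairs, paste, character(1), collapse = " ~ ")

# one column per pair: correlation estimate and p-value
sapply(results, function(r) c(estimate = unname(r$estimate), p.value = r$p.value))
```

Storing the tests in a list also avoids a side issue with the original loop: inside for, the value of cor.test() is not auto-printed, so nothing would have been shown even without the error.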
Dthal
  • As shown, this creates an object f of the class "formula" but when I enter this object as argument in the cor.test call it returns the same error: invalid 'envir' argument of type 'character'. – skip Jan 17 '16 at 10:10
  • My bad, this solves the problem. Thank you. I was surprised to learn that R has a formula class... – skip Jan 17 '16 at 10:30