I am conducting a method-comparison study, comparing measurements from two different systems. My dataset has a large number of columns, each containing measurements from one of the two systems.

aX and bX are both measures of X, one from system a and one from system b. I have about 80 pairs of variables like this.

A simplified version of my data looks like this:

set.seed(1)
df <- data.frame(
  ID = as.factor(rep(1:2, each=10)),
  aX = rep(1:10+rnorm(10,mean=1,sd=0.5),2),
  bX = rep(1:10+rnorm(10,mean=1,sd=0.5),2),
  aY = rep(1:10+rnorm(10,mean=1,sd=0.5), 2),
  bY = rep(1:10-rnorm(10,mean=1,sd=0.5),2))

head(df)

  ID       aX       bX       aY         bY
1  1 1.686773 2.755891 2.459489 -0.6793398
2  1 3.091822 3.194922 3.391068  1.0513939
3  1 3.582186 3.689380 4.037282  1.8061642
4  1 5.797640 3.892650 4.005324  3.0269025
5  1 6.164754 6.562465 6.309913  4.6885298
6  1 6.589766 6.977533 6.971936  5.2074973

I am trying to loop through the elements of a character vector, and use the elements to point to columns in the dataframe. But I keep getting error messages when I try to call functions with variable names generated in the loop.

For simplicity, I have changed the loop to include a linear model as this produces the same type of error as I have in my original script.

#This line is only included to show that
#the formula used in the loop works when
#called directly with the "real" column names

(broom::glance(lm(aX~bX, data = df)))$r.squared

[1] 0.9405218

#Now I try the loop

varlist <- c("X", "Y")

for(i in 1:length(varlist)){
  aVAR <- paste0("a", varlist[i])
  bVAR <- paste0("b", varlist[i]) 

  #aVAR and bVAR appear to be identical to column names in the df data frame
  print(c(aVAR, bVAR))

  #Try the formula with the loop variable names
  print((broom::glance(lm(aVAR~bVAR, data = df)))$r.squared)
  }

The error messages I get when calling functions from inside the loop vary with the function I am calling. The common denominator is that they occur when I try to use the character vector (varlist) to pick out specific columns.

Example of error messages:

rmcorr(ID, aVAR, bVAR, df)

Error in rmcorr(ID, aVAR, bVAR, df) : 
  'Measure 1' and 'Measure 2' must be numeric

or

broom::glance(lm(aVAR~bVAR, data = df))

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels
In addition: Warning message:
In storage.mode(v) <- "double" : NAs introduced by coercion

Can you help me understand what goes wrong in the loop, or show another way to accomplish what I am trying to do?

Steen Harsted

2 Answers

Variables aren't evaluated in formulas (the things with ~).

You can type

bert ~ ernie

and not get an error even if variables named bert and ernie do not exist. A formula stores relationships between symbols/names and does not attempt to evaluate them. Also note that we are not using quotes here. Variable names (or symbols) are not interchangeable with character values (i.e. aX is very different from "aX").
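To make that concrete, here is a small base R sketch. The names f and g are just illustrative, and aVAR/bVAR are set by hand to what the question's loop would produce on its first pass. It suggests why the loop fails: the formula only ever contains the names aVAR and bVAR, so lm() ends up looking at the character variables themselves rather than at the columns they name.

```r
# A formula stores unevaluated names, so this works even though
# neither bert nor ernie exists.
f <- bert ~ ernie
class(f)      # "formula"
all.vars(f)   # "bert" "ernie"

# A name (symbol) and a character string are different kinds of object.
is.name(quote(aX))   # TRUE
is.name("aX")        # FALSE

# Mimic the first pass of the loop in the question.
aVAR <- "aX"
bVAR <- "bX"
g <- aVAR ~ bVAR

# The formula holds the names aVAR and bVAR, not aX and bX.
all.vars(g)   # "aVAR" "bVAR"
```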

So when putting together a formula from string values, I suggest you use the reformulate() function. It takes a vector of names for the right-hand side and an optional value for the left-hand side. So you would create the same formula with

reformulate("ernie", "bert")
# bert ~ ernie

And you can use that with your lm call:

lm(reformulate(bVAR, aVAR), data = df)
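Putting that into the loop from the question could look something like the sketch below. It reuses the question's simulated data. I use sapply() instead of for, and summary()$r.squared instead of broom::glance(), only to keep the example free of package dependencies. The name r_squared is made up for illustration.

```r
# Simulated data, copied from the question
set.seed(1)
df <- data.frame(
  ID = as.factor(rep(1:2, each = 10)),
  aX = rep(1:10 + rnorm(10, mean = 1, sd = 0.5), 2),
  bX = rep(1:10 + rnorm(10, mean = 1, sd = 0.5), 2),
  aY = rep(1:10 + rnorm(10, mean = 1, sd = 0.5), 2),
  bY = rep(1:10 - rnorm(10, mean = 1, sd = 0.5), 2))

varlist <- c("X", "Y")

# One R-squared per pair, named after the entries of varlist
r_squared <- sapply(varlist, function(v) {
  # Build e.g. aX ~ bX from the strings "aX" and "bX"
  f <- reformulate(paste0("b", v), response = paste0("a", v))
  summary(lm(f, data = df))$r.squared
})

r_squared
```

The X entry should agree with the value from the direct lm(aX ~ bX, data = df) call in the question. With around 80 pairs, a named vector like this is probably easier to work with than printed output.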
MrFlick

I'm too lazy to search for a duplicate on how to construct formulas programmatically, so here is a solution:

varlist <- c("X", "Y")

for(i in seq_along(varlist)){
  #make these symbols:
  aVAR <- as.symbol(paste0("a", varlist[i]))
  bVAR <- as.symbol(paste0("b", varlist[i])) 

  #aVAR and bVAR are now symbols matching column names in the df data frame
  print(c(aVAR, bVAR))

  #Try the formula with the loop variable names
  #construct the call to `lm` with `bquote` and `eval` the expression
  print((broom::glance(eval(bquote(lm(.(aVAR) ~ .(bVAR), data = df)))))$r.squared)
}
Roland
  • I prefer `print((broom::glance(lm(reformulate(bVAR, aVAR), data = df)))$r.squared)` myself to avoid `eval()` – MrFlick Jan 16 '18 at 15:55
  • @MrFlick I see no reason to avoid `eval`. My solution using `eval` is superior to all other solutions I have tried. E.g., it prints a nice formula in `summary` output, in contrast to your suggestion. – Roland Jan 16 '18 at 15:59
  • Fair enough. I guess my bigger concern was more readability (which is subjective). But obviously both work. – MrFlick Jan 16 '18 at 16:01
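For anyone wondering what the difference discussed in the comments looks like, here is a small sketch comparing the call stored in each fitted model. It uses a trimmed version of the question's simulated data (only aX and bX, 10 rows). The names fit_reform and fit_bquote are made up for illustration.

```r
# Trimmed version of the question's data: one pair, 10 rows
set.seed(1)
df <- data.frame(
  aX = 1:10 + rnorm(10, mean = 1, sd = 0.5),
  bX = 1:10 + rnorm(10, mean = 1, sd = 0.5))

aVAR <- "aX"
bVAR <- "bX"

# Approach 1: build the formula from strings
fit_reform <- lm(reformulate(bVAR, aVAR), data = df)

# Approach 2: substitute symbols into the call, then evaluate it
fit_bquote <- eval(bquote(lm(.(as.symbol(aVAR)) ~ .(as.symbol(bVAR)), data = df)))

# The stored call (what print() and summary() show) differs
deparse(fit_reform$call$formula)   # "reformulate(bVAR, aVAR)"
deparse(fit_bquote$call$formula)   # "aX ~ bX"

# The fitted coefficients are the same either way
all.equal(coef(fit_reform), coef(fit_bquote))   # TRUE
```

So the choice seems to come down to whether you care about the call that gets printed or prefer to avoid eval(). The model itself is the same.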