
I am relying on the compareGroups package to do some comparisons at the end of a pipe chain. When subsetting the final results, the call to `[` triggers a call to `update` (both in their bespoke compareGroups versions), which leads to a scoping problem.

Try this:

```r
library(tidyverse)
# install.packages("compareGroups")
library(compareGroups)

get_data <- function() return(mtcars)

# Add a random binary group indicator
assign_group <- function(df) {
  n <- nrow(df)
  df$group <- rbinom(n, 1, 0.5)
  return(df)
}

get_results <- function(){
  # `data = .` passes the piped data frame; `group ~ .` means "all other columns"
  get_data() %>% assign_group %>% compareGroups(group ~ ., data = .)
}

res <- get_results()
# all the above works, but the following triggers the error:
res["mpg"]
```

This leads to the following error:

```
Error in compareGroups(formula = group ~ mpg, data = .) : object '.' not found
```

The relevant (abbreviated) traceback is this:

```
compareGroups(formula = group ~ mpg, data = .)
eval(call, parent.frame())
update.compareGroups(x, formula = group ~ mpg)
update(x, formula = group ~ mpg) at <text>#1
eval(parse(text = cmd))
`[.compareGroups`(res, "mpg")
res["mpg"]
```

So, my understanding is that the dot notation in the dplyr pipe chain prevents the `update` call from finding the data frame, which is stored as `.` in the call. The error therefore makes sense: `.` is neither the name of the data frame nor available outside the scope of `get_results` (though the main issue is the `.`). One obvious way of avoiding this error is to fix the `update.compareGroups` function. I don't think the package needs to redo all calculations when I simply want to retrieve individual results that have already been calculated.
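The same failure can be reproduced in base R without compareGroups or a pipe. In this sketch, `fit_via_dot()` is a hypothetical stand-in: it binds a local variable named `.` (mimicking how the pipe exposes its placeholder) and fits `lm()` with `data = .`. The stored call keeps only the symbol `.`, and `update()` re-evaluates that call in its caller's frame, where no `.` exists.

```r
# Hypothetical stand-in for the pipe: bind the data to a local `.`
fit_via_dot <- function(df) {
  . <- df
  lm(mpg ~ disp, data = .)
}

fit <- fit_via_dot(mtcars)

# The stored call refers to the data only by the symbol `.`
stored_data <- getCall(fit)$data
print(stored_data)

# update() rebuilds the call and evaluates it in the caller's frame,
# where `.` is not defined, so refitting fails
refit_error <- tryCatch(
  { update(fit, . ~ . + cyl); NULL },
  error = function(e) conditionMessage(e)
)
print(refit_error)
```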

However, this is a more general issue with the `.` placeholder used in dplyr (magrittr) pipes and the fact that it is stored in the call. This problem seems general enough that someone has probably encountered it before. Has anyone found a more general solution?
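Two general workarounds, sketched with base R `lm()` standing in for compareGroups (whether they carry over to compareGroups' own `[` method is untested): (1) supply the data again when calling `update()`, because `update.default()` replaces any named argument in the stored call; (2) build the call with `do.call()`, which embeds the data frame itself in the stored call instead of a symbol, so re-evaluation no longer depends on scope. `fit_via_dot()` and `fit_with_inlined_data()` are hypothetical helpers.

```r
# Workaround 1: pass the data again to update()
fit_via_dot <- function(df) {
  . <- df  # hypothetical stand-in for the pipe placeholder
  lm(mpg ~ disp, data = .)
}
fit <- fit_via_dot(mtcars)
refit1 <- update(fit, . ~ . + cyl, data = mtcars)

# Workaround 2: inline the data frame into the stored call via do.call()
fit_with_inlined_data <- function(df) {
  do.call("lm", list(formula = mpg ~ disp, data = df))
}
fit_inline <- fit_with_inlined_data(mtcars)

# The stored call now holds the data frame itself, not a name
data_is_inlined <- is.data.frame(getCall(fit_inline)$data)

# So update() works from any scope
refit2 <- update(fit_inline, . ~ . + cyl)
print(coef(refit2))
```

Workaround 1 only helps when you call `update()` yourself; compareGroups' `[` method calls it internally. The cost of workaround 2 is that the stored call, and anything that prints it, contains the whole data set.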

coffeinjunky
  • This would be a general solution to sloppy code :). Take a look at `compareGroups:::'[.compareGroups'`: it's full of `eval(parse(text = ...))`, formulas are built by pasting strings, there is an `i <- i` assignment, and the code refers to the data by name (the dot) even though the data appear to be stored in the object anyway. The GitHub version might work though; the code is different: https://github.com/isubirana/compareGroups/blob/master/R/z.%5BcompareGroups.R . I can't install it for some reason. If you have issues, I think you should file one there; there's not much we can do from the outside. – moodymudskipper May 17 '19 at 17:45
  • Has this issue been solved somehow? – Torakoro Jan 23 '21 at 09:52
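On the point about formulas built by pasting strings and `eval(parse(text = ...))`: base R can build a formula from variable names directly. This is a small illustrative sketch, not compareGroups' actual code; `response` and `selected` are hypothetical variable names. `reformulate()` produces the same `group ~ mpg` formula that `[.compareGroups` builds, without any parsing.

```r
# Build a formula from character names without paste() + parse()
response <- "group"
selected <- "mpg"
f <- reformulate(selected, response = response)
print(f)

# The string-pasting approach, for comparison
f_parsed <- eval(parse(text = paste(response, "~", selected)))
print(f_parsed)
```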

1 Answer


Firstly, I don't think piping your data into compareGroups makes sense: remember that piping means the first argument to `compareGroups()` is now the data frame, even though the function signature is:

```r
compareGroups(formula, data, ...)
```

Secondly, this dplyr vignette shows that you can use `.data` instead of just `.` to refer to the piped data. In this case, however, the following fails with the message `data argument will be ignored since formula is already a data set`, because the piped data is inserted as the first argument:

```r
get_results <- function(){
  get_data() %>% assign_group %>% compareGroups(group ~ ., data = .data)  # does NOT work
}
```

Making a separate call to `compareGroups` without piping then gets me into an unholy mess of environments, where `res` does not have access to the data when I request `res['mpg']` outside `get_results()`, as you already alluded to with the scoping problem. I think this is a compareGroups problem, because if I use the same architecture with `glm` there is no such problem. So the best I can do is to create the data frame outside the function environment, which I don't think properly answers your question:

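```r
library(dplyr)          # for %>%
library(compareGroups)

get_data <- function() return(mtcars)

assign_group <- function(df) {
  n <- nrow(df)
  df$group <- rbinom(n, 1, 0.5)
  return(df)
}

# Create the data at top level so that `df` is still visible when
# `[.compareGroups` re-evaluates the stored call
df <- get_data() %>% assign_group()
res <- compareGroups(group ~ ., data = df)
print(res['mpg'])
```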

But I hope the first two points I made get you closer to an answer.

    Well, the `dataframe` becomes the first argument by default, but that is the point of the `.` or `.data` notation. In any case, this is not only an issue with `compareGroups`, which I have used as a MWE here. The following (common workflow) will lead to the same: `get_results <- function(){ get_cars() %>% assign_group %>% do(mod = lm(mpg ~ TestControl, data = .)) }; out <- get_results(); update(out[["mod"]][[1]], formula = mpg ~ TestControl + cyl)`. So, this should be a reasonably common problem. – coffeinjunky May 15 '19 at 14:00
  • Points taken. But I still think the problem might not have anything to do with the `.` in dplyr. This example does not work either, due to scoping: `get_cars <- function(){return(mtcars)}; get_results <- function(){foo = get_cars() %>% assign_group(); return(do(foo, mod = lm(mpg ~ disp, data = foo)))}; out <- get_results();update(out[["mod"]][[1]], formula = mpg ~ disp + cyl)`, giving `Error in is.data.frame(data) : object 'foo' not found`. – Peter Smittenaar May 17 '19 at 15:20
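For the `lm()` example in the last comment, where the data object lives inside the function that fitted the model, one workaround is to let `update()` only build the new call (`evaluate = FALSE`) and then evaluate that call in the environment of the model's formula. When the formula is written inside the function, that environment is the function's frame, where the data object still exists. This is a sketch under that assumption; it is not guaranteed for piped code, where the placeholder may already be gone once the pipe returns. `fit_in_function()` is a hypothetical helper mirroring the comment's `get_results()`.

```r
fit_in_function <- function() {
  foo <- mtcars
  lm(mpg ~ disp, data = foo)  # the formula's environment is this function's frame
}
fit <- fit_in_function()

# Plain update() evaluates in the caller's frame, where `foo` is unknown
plain_update_failed <- tryCatch(
  { update(fit, . ~ . + cyl); FALSE },
  error = function(e) TRUE
)

# Build the call only, then evaluate it where `foo` still exists
new_call <- update(fit, . ~ . + cyl, evaluate = FALSE)
fit2 <- eval(new_call, environment(formula(fit)))
print(coef(fit2))
```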