I want to pass weights to glm() from inside a wrapper function using rlang, without resorting to eval(substitute()) or do.call().

The example below is a simplified version of a more complicated underlying function.

# Toy data
mydata = dplyr::tibble(outcome = c(0,0,0,0,0,0,0,0,1,1,1,1,1,1),
                       group = c(0,1,0,1,0,1,0,1,0,1,0,1,0,1),
                       wgts = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1)
)

# This works
glm(outcome ~ group, data = mydata)                             

# This works
glm(outcome ~ group, data = mydata, weights = wgts)                             

library(rlang)
# Function not passing weights
myglm <- function(.data, y, x){
    glm(expr(!! enexpr(y) ~ !! enexpr(x)), data = .data)
}

# This works
myglm(mydata, outcome, group)

# Function passing weights
myglm2 <- function(.data, y, x, weights){
    glm(expr(!! enexpr(y) ~ !! enexpr(x)), `weights = !! enexpr(weights)`, data = .data)
}

# This doesn't work
myglm2(mydata, outcome, group, wgts)

(The backticks are not part of the code; they only highlight the argument in question.)

I know the weights argument here is wrong. I have tried many different ways of doing this, all unsuccessfully. The actual function will be passed to a version of purrr::map() or purrr::invoke(), which is why I want to avoid a simple do.call(). Thoughts greatly appreciated.

Ewen

1 Answer

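The issue is that glm() can recognize an expression being provided to its weights argument, but doesn't support quasiquotation, because it uses the base quote() / substitute() / eval() mechanisms instead of rlang. An unquoting operator such as !! written directly inside the glm() call is therefore never processed by rlang; base R treats it as ordinary double negation. This causes problems for nested expression arithmetic.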
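To make the difference concrete, here is a minimal sketch (not part of the original question) comparing what base quote() and rlang's expr() do with `!!`. The symbol `w` is a hypothetical stand-in for what enexpr(weights) would return inside a wrapper; the glm() call is only built here, never evaluated.

```r
library(rlang)

# Base R: `!!` is just two nested applications of logical negation.
base_call <- quote(!!wgts)
is_negation <- identical(base_call[[1]], as.name("!"))

# rlang: expr() processes `!!` and inserts the unquoted value.
w <- sym("wgts")  # hypothetical stand-in for enexpr(weights)
rlang_call <- expr(glm(outcome ~ group, weights = !!w))

# The weights argument of the composed call is now the bare symbol wgts.
weights_arg <- rlang_call[["weights"]]
```

Only the call built by expr() carries the bare column name, which is why composing the whole glm() call first and evaluating it afterwards works.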

One way to get around it is to compose the entire glm expression, then evaluate it. You can use ... to supply optional arguments.

myglm2 <- function( .data, y, x, weights, ... ) {
  myglm <- expr( glm(!!enexpr(y) ~ !!enexpr(x), data=.data, 
                      weights = !!enexpr(weights), ...) )
  eval(myglm)
}

myglm2(mydata, outcome, group)
# Call:  glm(formula = outcome ~ group, data = .data)

myglm2(mydata, outcome, group, wgts)
# Call:  glm(formula = outcome ~ group, data = .data, weights = wgts)

myglm2(mydata, outcome, group, wgts, subset=7:10)
# Call:  glm(formula = outcome ~ group, data = .data, weights = wgts, 
#     subset = ..1)
# While masked as ..1, the 7:10 is nevertheless correctly passed to glm()

To follow @lionel's suggestion, you can encapsulate the expression composition / evaluation into a standalone function:

value <- function( e ) {eval(enexpr(e), caller_env())}

myglm2 <- function( .data, y, x, weights, ... ) {
  value( glm(!!enexpr(y) ~ !!enexpr(x), data=.data, 
              weights = !!enexpr(weights), ...) )
}
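
As a rough illustration of why value() needs caller_env(), the sketch below compares it with a variant that omits the environment. The names value_local, count_rows, count_rows_local, and wrapper_df are hypothetical and exist only for this example; it assumes no global object called wrapper_df.

```r
library(rlang)

# Without an explicit environment, eval() runs in value_local()'s own frame.
value_local <- function(e) eval(enexpr(e))

# Forwarding caller_env() evaluates the expression where value() was called.
value <- function(e) eval(enexpr(e), caller_env())

# Hypothetical wrappers: `wrapper_df` exists only inside each function.
count_rows <- function(wrapper_df) value(nrow(wrapper_df))
count_rows_local <- function(wrapper_df) value_local(nrow(wrapper_df))

n_ok <- count_rows(data.frame(a = 1:3))

# The variant without caller_env() cannot see the wrapper's argument.
local_result <- tryCatch(
  count_rows_local(data.frame(a = 1:3)),
  error = function(err) "error"
)
```

The same lookup problem applies to `.data` inside myglm2(): without the caller's environment, the argument is not visible from value().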
Artem Sokolov
  • Thank you Artem, that's great and gets me some way. Can you think of a way to make arguments optional in the same framework? Along the lines of this (not working): `myglm2 <- function( .data, y, x, ...) { args = list2(...) myglm <- expr( glm(!!enexpr(y) ~ !!enexpr(x), data=.data, !!! args) ) eval_tidy(myglm) } myglm2(mydata, outcome, group, weights = wgts)` – Ewen Feb 01 '19 at 16:15
  • I would extract the expr + eval thing in a function (e.g. `value <- function(expr) eval(enexpr(expr))`). Then you can do `value(glm(!!enexpr(y) ~ !!enexpr(x)))` which is a bit nicer. No need for `eval_tidy()` because `glm()` won't handle quosures well. – Lionel Henry Feb 04 '19 at 08:12
  • Thanks for the suggestion, @lionel. `function(e) {eval(enexpr(e))}` was giving me a `cannot coerce class ‘"rlang_fake_data_pronoun"’ to a data.frame` error. I was able to get around it by using `eval_tidy(enquo(e))` instead. – Artem Sokolov Feb 04 '19 at 16:33
  • Hmm... This error is very surprising, perhaps you made a typo? By the way, just pass the dots directly, don't use `!!!enexprs(...)`. It is especially bad to splice dots captured to bare expressions rather than quosures because you lose the correct environments. If you pass the dots directly, it looks cleaner and is more robust and accurate. – Lionel Henry Feb 04 '19 at 17:16
  • @lionel: I think the `rlang_fake_data_pronoun` error is related to `.data`, whose scope is limited to `myglm2()`, so `eval()` inside `value()` sees the `rlang::.data` pronoun instead. `enquo()` addresses it effectively by capturing `myglm2`'s environment. Regarding `!!!enexprs(...)`, I agree that it's generally cleaner to pass `...` directly. However, it doesn't seem to work with `glm()` and results in an `Object wgts not found` error. – Artem Sokolov Feb 04 '19 at 17:37
  • Oops, you're totally right, I forgot to forward the caller env :s. The proper function is `value <- function(e) eval(enexpr(e), caller_env())`. As for `enexprs()`, it is still a bad idea to pass dots this way, you'll get a broken function in many cases. I would add a `weights` argument to `myglm2()` and quote/unquote it within the `glm()` call. And then pass the dots normally. – Lionel Henry Feb 04 '19 at 17:48
  • Thanks, @lionel. I updated the answer with all your suggestions. – Artem Sokolov Feb 04 '19 at 18:03