This seems to be a problem with how dplyr sets up the environment for the data.table call. The problem appears in the dplyr:::summarise_.grouped_dt function. It currently looks like
    function (.data, ..., .dots)
    {
        dots <- lazyeval::all_dots(.dots, ..., all_named = TRUE)
        for (i in seq_along(dots)) {
            if (identical(dots[[i]]$expr, quote(n()))) {
                dots[[i]]$expr <- quote(.N)
            }
        }
        list_call <- lazyeval::make_call(quote(list), dots)
        call <- substitute(dt[, list_call, by = vars], list(list_call = list_call$expr))
        env <- dt_env(.data, parent.frame())
        out <- eval(call, env)
        grouped_dt(out, drop_last(groups(.data)), copy = FALSE)
    }
    <environment: namespace:dplyr>
and if we debug that function and look at the trace when it's called, we see
where 1: summarise_.grouped_dt(.data, .dots = lazyeval::lazy_dots(...))
where 2: summarise_(.data, .dots = lazyeval::lazy_dots(...))
where 3: summarise(., sum.bad = sum(y == bad))
where 4: function_list[[k]](value)
where 5: withVisible(function_list[[k]](value))
where 6: freduce(value, `_function_list`)
where 7: `_fseq`(`_lhs`)
where 8: eval(expr, envir, enclos)
where 9: eval(quote(`_fseq`(`_lhs`)), env, env)
where 10: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
where 11 at #3: z %>% group_by(x) %>% summarise(sum.bad = sum(y == bad))
where 12: f(rnorm(100), rnorm(100) < 0, bad = FALSE)
So the important line is

    env <- dt_env(.data, parent.frame())

This sets up the environment chain that determines where the variables in the call are looked up. It just uses parent.frame(), which points to where the function was called from, but since you actually jump through a few hoops to get to this function from your summarise() call inside f(), this doesn't seem to be the right parent frame. If you instead run

    env <- dt_env(.data, parent.frame(2))

in debug mode, that seems to get at the correct parent frame. So I think the problem is the jump from summarise() to summarise_(), because this
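    library(data.table)
    library(dplyr)

    # Same as f(), but calls summarise_() directly with a quoted expression
    ff <- function(x, y, bad) {
        z <- data.table(x, y, key = "x")
        z2 <- z %>% group_by(x) %>% summarise_(.dots = list(sum.bad = quote(sum(y == bad))))
        z2
    }
    ff(rnorm(100), rnorm(100) < 0, bad = FALSE)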
seems to work. So it's really dplyr that needs to set up the correct environment. The tricky part is that the correct frame appears to differ depending on whether you call summarise() or summarise_() directly. Perhaps summarise() could arrange for its call to summarise_() to see the same parent.frame(), for example via eval(). I'd probably file this as a bug report and let Hadley decide how to fix it, but something like
    summarise <- function(.data, ...) {
        call <- match.call()
        call <- as.call(c(as.list(call)[1:2], list(.dots = as.list(call)[-(1:2)])))
        call[[1]] <- quote(summarise_)
        eval(call, envir = parent.frame())
    }
would be a "traditional" way to do it. Not sure if the lazyeval package has nicer ways to do this or not.
Tested with data.table_1.9.2 and dplyr_0.3.0.2.
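
To make the parent.frame() issue easier to see outside of dplyr, here is a minimal base-R sketch of the same mechanism. The functions inner_(), wrapper() and wrapper_fixed() are hypothetical stand-ins that I made up for illustration (roughly: summarise_(), summarise() and the match.call() rewrite above); they are not dplyr code, and the real call chain has more frames in between.

```r
# Hypothetical stand-in for the method: evaluates an expression in an
# environment taken from parent.frame(n), like dt_env(.data, parent.frame()).
inner_ <- function(expr, n = 1) {
  env <- parent.frame(n)
  eval(expr, env)
}

# Hypothetical stand-in for the NSE wrapper: it captures the expression and
# forwards it, which puts one extra frame between the user and inner_().
wrapper <- function(expr, n = 1) {
  inner_(substitute(expr), n = n)
}

# "Traditional" fix: rewrite our own call into a call to inner_() and
# evaluate it in the caller's frame, so inner_() sees the caller as its
# parent.frame(), just as if it had been called directly.
wrapper_fixed <- function(expr) {
  cl <- match.call()
  cl[[1]] <- quote(inner_)
  cl$expr <- call("quote", cl$expr)
  eval(cl, envir = parent.frame())
}

f <- function(bad) {
  list(
    direct        = inner_(quote(bad)),      # parent.frame() is f's frame
    via_wrapper_1 = tryCatch(wrapper(bad),   # parent.frame() is wrapper's frame
                             error = function(e) "not found"),
    via_wrapper_2 = wrapper(bad, n = 2),     # skip over wrapper's frame
    via_fixed     = wrapper_fixed(bad)       # call re-evaluated in f's frame
  )
}

res <- f(bad = FALSE)
str(res)
```

In this sketch, only the via_wrapper_1 case fails to find bad, because the lookup starts in wrapper()'s frame rather than in f()'s. Hard-coding parent.frame(2) helps the wrapped call but would break the direct one, which is why re-evaluating the call in the caller's frame looks like the more robust option to me.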