I just ran into some weird behavior in dplyr, where summarize kept referring to objects from a previous group.
Here is a simple reproducible example to illustrate the surprising behavior:
library(dplyr, warn.conflicts = FALSE)
tibble(x = rep(letters[1:3], times = 4),
       y = rnorm(12)) %>%
  group_by(x) %>%
  summarize(z1 = sum(y),
            z2 = {
              attr(y, "test") <- "test"
              sum(y)
            })
#> # A tibble: 3 × 3
#>   x         z1    z2
#>   <chr>  <dbl> <dbl>
#> 1 a      0.602 0.602
#> 2 b      1.22  0.602
#> 3 c     -0.310 0.602
Created on 2022-10-31 by the reprex package (v2.0.1)
I expected z1 and z2 to be identical. I don't understand why setting an attribute on the vector y means that, in later iterations, the reference to the "correct" elements of y is shadowed.
The problem can be easily fixed by using sum(.data$y) in the last line, but I would like to understand the scoping rules within the non-standard evaluation of summarize. Any pointers to helpful documentation, or explanations of why the current behavior makes sense in the tidyverse non-standard evaluation framework, are appreciated.
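For reference, here is a minimal sketch of that workaround, together with a second variant that modifies a copy instead of assigning to y. The name y_copy, the intermediate object df, and the set.seed() call are additions for illustration only. The second variant rests on my assumption that the shadowing comes from the assignment to the name y inside the braces, which I have not confirmed. Each variant runs in its own summarize call so that one cannot affect the other.

```r
library(dplyr, warn.conflicts = FALSE)

set.seed(1)  # added only so the sketch is repeatable
df <- tibble(x = rep(letters[1:3], times = 4),
             y = rnorm(12)) %>%
  group_by(x)

# Workaround from above: read the column through the .data pronoun,
# even though the attribute is still assigned to y.
res_data <- df %>%
  summarize(z1 = sum(y),
            z2 = {
              attr(y, "test") <- "test"
              sum(.data$y)
            })

# Variant (assumption, see lead-in): never assign to y itself.
# y_copy is a hypothetical local name, not part of the original example.
res_copy <- df %>%
  summarize(z1 = sum(y),
            z2 = {
              y_copy <- y
              attr(y_copy, "test") <- "test"
              sum(y_copy)
            })

# Compare the two columns in each result.
all.equal(res_data$z1, res_data$z2)
all.equal(res_copy$z1, res_copy$z2)
```

If the assumption holds, both comparisons should come back TRUE, which would point at the assignment to y rather than at the attribute itself.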
I am using R 4.1.1 with dplyr 1.0.7.