Function on column in dplyr mutate after group_by not giving correct grouped results

Question

I'm just starting to use dplyr and have been converting some of my plyr code over. I love the new syntax, but I'm having trouble getting mutate() to apply a function on a column by group, e.g.:

library(Hmisc)
library(plyr)
library(dplyr)

t1 <- ddply(mtcars, .(cyl), transform, qrt=cut2(wt, g=4))  
levels(t1$qrt)  # works: different quartiles for each group

t2 <- mtcars %>% group_by(cyl) %>% mutate(qrt=cut2(wt, g=4))  
levels(t2$qrt)   # doesn't work: uses only 4 quartiles from the first group

At first I thought that the second example was using the entire wt column instead of the cyl groups, but it actually appears to be using only the quartiles for the first group and assigning them to all the groups even when the wt falls outside the range.

Am I missing some syntax around the wt reference inside the function in mutate? I can get the ddply version to work for functions like sum(), so is there something about cut2() that is causing the problem?

I've read quite a few posts on what could be similar issues and have tried, among other things, running the dplyr version in a clean environment without plyr loaded.

Thanks for your help!

Your code works for me using the [development version](https://github.com/hadley/dplyr) of *dplyr*. — aosmith, Mar 24 '16 at 16:38

score 0 · Answer 1 · answered Jan 14 '20 at 18:50

It seems to work with the latest version of dplyr, 0.8.3. The output data frame shows that each cyl has its own set of ranges, and the min and max wt in each bin fall within the stated range.

remove.packages("dplyr") # Unnecessary, but proves that this is the latest released version
install.packages("dplyr")

packageVersion("dplyr")

# [1] ‘0.8.3’

library(Hmisc)
library(dplyr)

t2 <- mtcars %>% 
  group_by(cyl) %>% 
  mutate(qrt = cut2(wt, g=4))

t2 %>%
  group_by(cyl, qrt) %>%
  summarize(min = min(wt), max = max(wt)) %>%
  arrange(cyl, qrt)

# A tibble: 12 x 4
# Groups:   cyl [3]
#     cyl qrt           min   max
#   <dbl> <chr>       <dbl> <dbl>
# 1     4 [1.51,1.94)  1.51  1.84
# 2     4 [1.94,2.32)  1.94  2.2 
# 3     4 [2.32,3.15)  2.32  2.78
# 4     4 [3.15,3.19]  3.15  3.19
# 5     6 [2.62,2.88)  2.62  2.77
# 6     6 [2.88,3.44)  2.88  3.22
# 7     6 3.44         3.44  3.44
# 8     6 3.46         3.46  3.46
# 9     8 [3.17,3.57)  3.17  3.52
#10     8 [3.57,3.78)  3.57  3.73
#11     8 [3.78,5.25)  3.78  4.07
#12     8 [5.25,5.42]  5.25  5.42
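
For anyone stuck on an older dplyr, a possible workaround is to return character labels from the grouped call so that no factor levels need to be combined across groups. This is only a sketch under an assumption: the thread does not confirm that factor-level handling is the cause, although the `<chr>` column in the output above suggests the per-group factors end up as character anyway. The helper `quartile_label()` is a hypothetical name, and it swaps in base R `quantile()` and `cut()` for `Hmisc::cut2()`, so the bin edges will not match the ones shown above.

```r
library(dplyr)

# Hypothetical helper: label each value with its within-vector quartile bin.
# Uses base quantile() + cut() instead of Hmisc::cut2(), and returns
# character rather than factor.
quartile_label <- function(x) {
  breaks <- quantile(x, probs = seq(0, 1, by = 0.25))
  as.character(cut(x, breaks = breaks, include.lowest = TRUE))
}

t3 <- mtcars %>%
  group_by(cyl) %>%
  mutate(qrt = quartile_label(wt)) %>%
  ungroup()

# Count the distinct bins found within each cyl group
bins_per_cyl <- t3 %>%
  group_by(cyl) %>%
  summarise(n_bins = n_distinct(qrt))

bins_per_cyl
```

If the grouping is respected, each cyl should show its own bin labels and no more than four bins. Note that `cut()` stops with an error when the quartile breaks are not unique, which can happen in small or heavily tied groups; `cut2()` is more forgiving there.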
