I'm just starting to use dplyr and have been converting some of my plyr code over. I love the new syntax, but I'm having trouble getting mutate() to apply a function on a column by group, e.g.:
library(Hmisc)
library(plyr)
library(dplyr)
t1 <- ddply(mtcars, .(cyl), transform, qrt=cut2(wt, g=4))
levels(t1$qrt) # works: different quartiles for each group
t2 <- mtcars %>% group_by(cyl) %>% mutate(qrt=cut2(wt, g=4))
levels(t2$qrt) # doesn't work: uses only 4 quartiles from the first group
At first I thought that the second example was using the entire wt column instead of the cyl groups, but it actually appears to be using only the quartiles for the first group and assigning them to all the groups, even when the wt falls outside the range.
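For reference, the per-group levels I would expect can be listed without plyr or dplyr at all. This is only a minimal sketch using base split() and lapply() with the same cut2() call; the name by_group is just something I made up for the check:

```r
library(Hmisc)

# Split wt by cyl, then cut each group's weights into 4 quantile groups
# independently, keeping only the resulting factor levels.
by_group <- lapply(
  split(mtcars$wt, mtcars$cyl),
  function(w) levels(cut2(w, g = 4))
)

# One character vector of interval labels per cyl value (4, 6, 8)
by_group
```

Comparing this list against levels(t2$qrt) should show whether the dplyr result contains only the intervals from the first cyl group.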
Am I missing some syntax around the wt reference inside the function in mutate? I can get the ddply version to work for functions like sum(), so is there something about cut2() that is causing the problem?
I've read quite a few posts on what could be similar issues and have tried, among other things, running the dplyr version in a clean environment without plyr loaded.
Thanks for your help!