I'm trying to replace all my plyr calls with dplyr. There are still a few snags, and one of them is the group_by function. I assumed it acts like the second argument of ddply and performs a split, apply and combine based on the grouping variables I list, but that doesn't appear to be the case. Here is a rather trivial example.

Let's define a silly function:

mm <- function(x) return(x[1:5, ])

Now we can split the iris dataset by species and apply this function to each piece:

ddply(iris, .(Species), mm)

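This works as intended. However, when I try the same with dplyr, it doesn't work as expected: the result contains only the first five rows of the whole dataset instead of five rows per species.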

iris %>% group_by(Species) %>% mm

What am I doing wrong?

Blaszard
Maiasaura

3 Answers


As shown in ?do, you can refer to a group with . in your expression. The following will replicate your ddply output:

iris %>% group_by(Species) %>% do(.[1:5, ])

# Source: local data frame [15 x 5]
# Groups: Species
#
#    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
# 1           5.1         3.5          1.4         0.2     setosa
# 2           4.9         3.0          1.4         0.2     setosa
# 3           4.7         3.2          1.3         0.2     setosa
# 4           4.6         3.1          1.5         0.2     setosa
# 5           5.0         3.6          1.4         0.2     setosa
# 6           7.0         3.2          4.7         1.4 versicolor
# 7           6.4         3.2          4.5         1.5 versicolor
# 8           6.9         3.1          4.9         1.5 versicolor
# 9           5.5         2.3          4.0         1.3 versicolor
# 10          6.5         2.8          4.6         1.5 versicolor
# 11          6.3         3.3          6.0         2.5  virginica
# 12          5.8         2.7          5.1         1.9  virginica
# 13          7.1         3.0          5.9         2.1  virginica
# 14          6.3         2.9          5.6         1.8  virginica
# 15          6.5         3.0          5.8         2.2  virginica
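
To make the split, apply and combine behaviour of do() concrete, here is a minimal sketch that compares it with a hand-written base R version. The names via_do and via_base are made up for illustration, and mm is the function from the question.

```r
library(dplyr)

# The function from the question: first five rows of whatever it is given
mm <- function(x) x[1:5, ]

# dplyr: do() evaluates mm(.) once per group and row-binds the pieces
via_do <- iris %>% group_by(Species) %>% do(mm(.)) %>% ungroup()

# Base R: split by Species, apply mm to each piece, combine with rbind
via_base <- do.call(rbind, lapply(split(iris, iris$Species), mm))

nrow(via_do)
nrow(via_base)

# Compare the values only; row names and classes differ between the two
all.equal(as.data.frame(via_do), via_base, check.attributes = FALSE)
```

Both versions should give five rows per species; the base R result just carries row names such as setosa.1 that the dplyr result does not.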

More generally, to apply a custom function to groups with dplyr, you can do something like the following (thanks @docendodiscimus):

iris %>% group_by(Species) %>% do(mm(.))
jbaums
+1. Plus, if the OP wants to use their custom function, that works too: `iris %>% group_by(Species) %>% do(mm(.))`, or just use `iris %>% group_by(Species) %>% do(head(., 5))` – talat Jun 10 '14 at 08:44

do() has been superseded in favour of reframe().

So the up-to-date way to do this (dplyr 1.1.0 and later) is with reframe():

reframe(iris, across(everything(), ~ .x[1:5]), .by = Species) # one liner!

or using slice_head():

slice_head(iris, n = 5, by = Species)
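
If you want to keep the custom function from the question rather than inline the indexing, one possible sketch is to hand each group to it with pick(). This assumes dplyr 1.1.0 or later; via_reframe and via_slice are illustrative names only.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for reframe(), pick() and by/.by

# The function from the question
mm <- function(x) x[1:5, ]

# pick(everything()) is the current group's data without the grouping column;
# the data frame returned by mm() is unpacked into columns by reframe()
via_reframe <- reframe(iris, mm(pick(everything())), .by = Species)

# Same rows through slice_head(), for comparison
via_slice <- slice_head(iris, n = 5, by = Species)

dim(via_reframe)
dim(via_slice)
names(via_reframe)  # reframe() puts the grouping column first
```

Note that reframe() moves Species to the front, while slice_head() keeps the original column order, so compare the two by column name rather than by position.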
Mark

slice() was created for this:


library(dplyr)
iris %>% group_by(Species) %>% slice(1:5)
#> # A tibble: 15 x 5
#> # Groups:   Species [3]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>     
#>  1          5.1         3.5          1.4         0.2 setosa    
#>  2          4.9         3            1.4         0.2 setosa    
#>  3          4.7         3.2          1.3         0.2 setosa    
#>  4          4.6         3.1          1.5         0.2 setosa    
#>  5          5           3.6          1.4         0.2 setosa    
#>  6          7           3.2          4.7         1.4 versicolor
#>  7          6.4         3.2          4.5         1.5 versicolor
#>  8          6.9         3.1          4.9         1.5 versicolor
#>  9          5.5         2.3          4           1.3 versicolor
#> 10          6.5         2.8          4.6         1.5 versicolor
#> 11          6.3         3.3          6           2.5 virginica 
#> 12          5.8         2.7          5.1         1.9 virginica 
#> 13          7.1         3            5.9         2.1 virginica 
#> 14          6.3         2.9          5.6         1.8 virginica 
#> 15          6.5         3            5.8         2.2 virginica
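
One difference from the x[1:5, ] approach in the question is worth knowing when a group has fewer than five rows. The sketch below uses a small made-up data frame (small, with columns g and v, is hypothetical and not from the original post) to show it.

```r
library(dplyr)

# Hypothetical data: group "a" has only three rows, group "b" has six
small <- data.frame(g = c("a", "a", "a", "b", "b", "b", "b", "b", "b"), v = 1:9)

# Base indexing pads the three-row piece out to five rows with NA
padded <- do.call(rbind, lapply(split(small, small$g), function(x) x[1:5, ]))

# slice() ignores positions beyond the end of a group
sliced <- small %>% group_by(g) %>% slice(1:5)

nrow(padded)
nrow(sliced)
sum(is.na(padded$v))
```

So slice(1:5) behaves like "up to five rows per group", which is usually what is wanted, whereas plain indexing with 1:5 always returns exactly five rows.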
moodymudskipper