
How can I make a user-defined function work nicely with pipes and group_by? Here is a simple function:

 library(tidyverse)

 fun_head <- function(df, column) {
 column <- enquo(column)
 df %>% select(!!column) %>% head(1)
 }

The function works nicely with pipes and allows filtering by another column:

 mtcars %>% filter(cyl == 4) %>% fun_head(mpg)

     mpg
   1 22.8

However, the same pipeline does not give one row per group when used with group_by:

mtcars %>% group_by(cyl) %>% fun_head(mpg)

Adding missing grouping variables: `cyl`
# A tibble: 1 x 2
# Groups:   cyl [1]
     cyl   mpg
     <dbl> <dbl>
1     6    21

Using "do" after group_by makes it work:

 > mtcars %>% group_by(cyl) %>% do(fun_head(., mpg))
 # A tibble: 3 x 2
 # Groups:   cyl [3]
    cyl   mpg
   <dbl> <dbl>
1     4  22.8
2     6  21  
3     8  18.7

How should the function be changed so that it works uniformly with filter and group_by without needing "do"?
Or do quosures have nothing to do with the question, and does group_by simply require "do" because the function in the example takes multiple arguments?

Irakli
  • Note that `mtcars %>% group_by(cyl) %>% select(mpg) %>% head(1)` also gives you just the first row. – Phil Oct 21 '18 at 02:31

2 Answers


As you've written it, the function selects column from df and then calls head, which returns the first row of the whole data frame: head is not a tidyverse function and isn't aware of any grouping. dplyr::slice(1) takes the first row of each group, which is what you want. You can use:

 fun_head <- function(df, column) {
 column <- enquo(column)
 df %>% slice(1) %>% select(!!column)
 }

 mtcars %>% group_by(cyl) %>% fun_head(mpg)

# # A tibble: 3 x 2
# # Groups:   cyl [3]
#     cyl   mpg
#   <dbl> <dbl>
# 1     4  22.8
# 2     6  21  
# 3     8  18.7
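
The same idea can be sketched with newer tidyverse idioms, assuming dplyr 1.0 or later (for slice_head()) and rlang 0.4 or later (for the embrace operator `{{ }}`, which stands in for the enquo() + !! pair). The name fun_head_embrace is hypothetical and used only for this illustration:

```r
library(dplyr)

# Hypothetical variant of fun_head: `{{ column }}` replaces enquo() + !!,
# and slice_head(n = 1) keeps the first row of each group (or of the whole
# data frame when it is not grouped).
fun_head_embrace <- function(df, column) {
  df %>%
    slice_head(n = 1) %>%
    select({{ column }})
}

# Ungrouped input: a single row
ungrouped <- mtcars %>% filter(cyl == 4) %>% fun_head_embrace(mpg)

# Grouped input: one row per value of cyl
grouped <- mtcars %>% group_by(cyl) %>% fun_head_embrace(mpg)
```

As with the version above, select() on a grouped data frame keeps the grouping column, so `grouped` contains both cyl and mpg.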
IceCreamToucan
  • Does this approach happen to work here only because of 'head', or is it more general? The answer given by Artem Sokolov suggests that "do" is needed with group_by – Irakli Oct 21 '18 at 02:42
  • There are some other functions like `head`, which have tidyverse equivalents, like `slice`. But there are some which don't, so sometimes you need to use `do`. As mentioned in the other answer "`do` is the connector that allows you to apply *arbitrary* functions to each group", emphasis mine. – IceCreamToucan Oct 21 '18 at 02:47

This is independent of quosures. Here's the same issue in the absence of non-standard evaluation in fun_head():

fun_head <- function(df) {df %>% select(mpg) %>% head(1)}
mtcars %>% group_by( cyl ) %>% fun_head()
# Adding missing grouping variables: `cyl`
# # A tibble: 1 x 2
# # Groups:   cyl [1]
#     cyl   mpg
#   <dbl> <dbl>
# 1     6    21

As explained in other questions here and here, do is the connector that allows you to apply arbitrary functions to each group. dplyr verbs such as mutate and filter don't require do because they handle grouped data frames internally as special cases (see, e.g., the implementation of mutate). If you want your own function to emulate this behavior, you need to distinguish between grouped and ungrouped data frames:

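fun_head2 <- function( df )
{
  # is_grouped_df() is used instead of !is.null(groups(df)): recent dplyr
  # versions return an empty list, not NULL, from groups() on ungrouped data
  if( is_grouped_df(df) )
    df %>% do( fun_head2(.) )       # grouped: recurse on each group's rows
  else
    df %>% select(mpg) %>% head(1)  # ungrouped: first row of the mpg column
}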

mtcars %>% group_by(cyl) %>% fun_head2()
# # A tibble: 3 x 2
# # Groups:   cyl [3]
#     cyl   mpg
#   <dbl> <dbl>
# 1     4  22.8
# 2     6  21  
# 3     8  18.7
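
For readers on newer dplyr versions, where do() is marked as superseded, a roughly equivalent connector is group_modify(). The sketch below assumes dplyr 0.8.1 or later and reuses the two-argument fun_head from the question unchanged; the variable name per_group is only illustrative:

```r
library(dplyr)

# The original two-argument function from the question, unchanged
fun_head <- function(df, column) {
  column <- enquo(column)
  df %>% select(!!column) %>% head(1)
}

# group_modify() calls the lambda once per group; `.x` holds that group's
# rows without the grouping column, and the grouping column is added back
# to the combined result.
per_group <- mtcars %>%
  group_by(cyl) %>%
  group_modify(~ fun_head(.x, mpg)) %>%
  ungroup()
```

Like do(), this applies an arbitrary function to each group without the function itself having to know about grouping.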

EDIT: I want to point out that another alternative to group_by + do is to use tidyr::nest + purrr::map instead. Going back to your original function definition that takes two arguments:

fhead <- function(.df, .var) { .df %>% select(!!ensym(.var)) %>% head(1) }

The following two chains are equivalent (up to an ordering of rows, since group_by sorts by the grouping variable and nest doesn't):

# Option 1: group_by + do
mtcars %>% group_by(cyl) %>% do( fhead(., mpg) ) %>% ungroup

# Option 2: nest + map
mtcars %>% nest(-cyl) %>% mutate_at( "data", map, fhead, "mpg" ) %>% unnest
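
With tidyr 1.0 or later, the unnamed nest(-cyl) and the bare unnest emit deprecation warnings, so the nest + map chain may be easier to maintain in the spelled-out form sketched below. This assumes tidyr >= 1.0 and dplyr >= 1.0; first_row_of is a hypothetical helper that takes the column name as a string, used here instead of fhead to keep the example independent of how the argument is captured:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical helper: first row of one column, with the column given by name
first_row_of <- function(.df, .var) {
  .df %>% select(all_of(.var)) %>% head(1)
}

nested_result <- mtcars %>%
  nest(data = -cyl) %>%                             # one row per cyl, rest in `data`
  mutate(data = map(data, first_row_of, "mpg")) %>% # apply the helper to each group
  unnest(data)                                      # back to a flat table
```

As noted above, the rows come out in order of first appearance of cyl rather than sorted; adding arrange(cyl) gives the same order as the group_by version.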
Artem Sokolov