How can I write a tidyverse-friendly function that respects group_by() earlier in the pipe?

Question

I've started working on writing functions to make table generation quicker, but want to make the function respect earlier grouping choices made by the user in the pipe.

Example data:

df<-data.frame(ID=c("A","B","C","A","C","D","A","C","E","B","C","A"),
           Year=c(1,1,1,2,2,2,3,3,3,4,4,4),
           Credits=c(1,3,4,5,6,7,2,1,1,6,1,2),
           Major=c("GS","GS","LA","GS","GS","LA","GS","LA","LA","GS","LA","LA"),
           Status=c("green","blue","green","blue","green","blue","green","blue","green","blue","green","blue"),
           Group=c("Art","Music","Science","Art","Music","Science","Art","Music","Science","Art","Music","Science"))

The following is the function I'm working on, and it requires/accepts a variable to define cohorts, a credit variable, and a term variable.

table_headsfte_cohorts<-function(.data,cohortvar,credits,term){


  cohortvar<-rlang::ensym(cohortvar)
  credits<-rlang::ensym(credits)
  term<-rlang::ensym(term)


  .data%>%
    group_by(!!term,Pidm)%>%
    group_by(!!term,!!cohortvar,group_cols())%>%
    mutate(on3=1)%>%
    mutate(`Headcount`=sum(on3),
          `FTE`=round(sum(na.omit(!!credits))/15,1))%>%
    mutate(Variable=paste0(cohortvar))%>%
    mutate(Category=!!cohortvar)%>%
    select(-!!cohortvar)%>%
    select(Variable,Category,Headcount,FTE,group_cols())
}

For a user that may be interested in using additional grouping variables beyond the cohort variable they choose, I am hoping that the end result function would allow usage as follows:

df2<-df%>%
 group_by(Status,Group)%>%
 table_headsfte_cohorts(Major,Credits,Year)

The desired end result would be a table that respects and preserves the levels of the two grouping variables in the group_by statement above in addition to the cohortvar and term columns coming from the table_headsfte_cohorts() arguments.

I need to generate this same table, but for a wide range of grouping variables, and varying numbers of grouping variables, so flexibility would be very helpful.

Edit:

The following seems to get close, by at least allowing multiple grouping variables. This isn't quite what I'm hoping for, as I'd prefer that the additional grouping arguments are read from up the pipe:

 table_headsfte_cohorts<-function(.data,cohortvar,credits,term,...){

  grps<-enquos(...)

  cohortvar<-rlang::ensym(cohortvar)
  credits<-rlang::ensym(credits)
  term<-rlang::ensym(term)


  .data%>%
      group_by(!!term,!!cohortvar,!!! grps)%>%
     mutate(on3=1)%>%
     mutate(`Headcount`=sum(on3),
          `FTE`=round(sum(na.omit(!!credits))/15,1))%>%
     mutate(Variable=paste0(cohortvar))%>%
     mutate(Category=!!cohortvar)%>%
     select(-!!cohortvar)%>%
     select(Variable,Category,Headcount,FTE,!!!grps)

}

Using the above, I can successfully run:

fdfout<-fdf%>%
table_headsfte_cohorts(Major, Credits, Year), getting:

and I can also pass the other variables to the function to serve as additional grouping variables:

fdfout_alt<-fdf%>%
  table_headsfte_cohorts(Major,Credits,Year,Status,Group)

yielding the desired result:

Unfortunately, when I use

fdf_no<-fdf%>%
  group_by(Status, Group)%>%
  table_headsfte_cohorts(Major, Credits, Year)

I get:

This output would likely confuse someone using my function, as their group_by() line seems to do nothing.

[See here](https://stackoverflow.com/q/5963269/5325862) on making a reproducible example that is easier for folks to help with. It's hard to know exactly what you're trying to do and where you might be getting stuck without sample data that recreates your setup — camille, Feb 22 '21 at 19:17
@camille Thank you for the suggestion. I've updated the question and hopefully made it more clear. If you have a chance to look and find it's still not clear please let me know what might be missing. — Pake, Feb 22 '21 at 21:16

score 2 · Accepted Answer · answered Feb 22 '21 at 22:16

I added some lines that will merge both, the existing grouping variables and the new grouping variables inside the dots into one character vector. We can get the existing grouping variablers with group_vars. To merge old and new together we have to get the expression get_expr of the quoted grouping variables and turn them into strings. We can use !!! syms to evaluate and all_of to select the grouping variables.

Is this what you had in mind?

table_headsfte_cohorts <- function(.data, cohortvar, credits, term, ...){
  
  new_grps <- enquos(...)
  new_grps <- purrr::map_chr(new_grps, ~ as.character(rlang::get_expr(.x)))
  ex_grps  <- group_vars(.data)
  grp_vars <- c(ex_grps, new_grps)

  cohortvar<-rlang::ensym(cohortvar)
  credits<-rlang::ensym(credits)
  term<-rlang::ensym(term)
  
  
  .data%>%
    group_by(!! term,
             !! cohortvar,
             !!! syms(grp_vars))%>%
    mutate(on3 = 1) %>%
    mutate(`Headcount`= sum(on3),
           `FTE`= round(sum(na.omit(!!credits))/15,1))%>%
    mutate(Variable=paste0(cohortvar))%>%
    mutate(Category=!!cohortvar)%>%
    select(-!!cohortvar)%>%
    select(Variable,Category,Headcount,FTE, all_of(grp_vars))
  
}

df %>%
  group_by(Status, Group) %>%
  table_headsfte_cohorts(Major, Credits, Year)

#> Adding missing grouping variables: `Major`
#> Adding missing grouping variables: `Year`, `Major`
#> # A tibble: 12 x 8
#> # Groups:   Year, Major, Status, Group [12]
#>     Year Major Variable Category Headcount   FTE Status Group  
#>    <dbl> <chr> <chr>    <chr>        <dbl> <dbl> <chr>  <chr>  
#>  1     1 GS    Major    GS               1   0.1 green  Art    
#>  2     1 GS    Major    GS               1   0.2 blue   Music  
#>  3     1 LA    Major    LA               1   0.3 green  Science
#>  4     2 GS    Major    GS               1   0.3 blue   Art    
#>  5     2 GS    Major    GS               1   0.4 green  Music  
#>  6     2 LA    Major    LA               1   0.5 blue   Science
#>  7     3 GS    Major    GS               1   0.1 green  Art    
#>  8     3 LA    Major    LA               1   0.1 blue   Music  
#>  9     3 LA    Major    LA               1   0.1 green  Science
#> 10     4 GS    Major    GS               1   0.4 blue   Art    
#> 11     4 LA    Major    LA               1   0.1 green  Music  
#> 12     4 LA    Major    LA               1   0.1 blue   Science

This functions as I'd like, and preserves the `group_by`. I don't need the `...` to accept more grouping variables. I ended up taking `ex_grps <- group_vars(.data)` and changing it to `grp_vars<-group_vars(.data)`, omitting the `new_grps` in my use case. — Pake, Feb 23 '21 at 16:27

How can I write a tidyverse-friendly function that respects group_by() earlier in the pipe?

1 Answers1