Extending this answer given by @G. Grothendieck, how can I pass more than one grouping variable to dplyr inside a function?

Let's say I have this data:

# Data
set.seed(1)
dfx <- data.frame(nLive = sample(x = 10, size = 40, replace = TRUE),
                  nDead = sample(x = 3, size = 40, replace = TRUE),              
                  areaA = c(rep("A", 20), rep("B", 20)),
                  areaB = rep( c( rep("yes", 10), rep("no", 10)), 2),
                  year = rep(c(2000,2002,2004,2006,2008),4)
                  )

I want to group by year, and possibly up to 2 other variables.

G. Grothendieck's example works perfectly for specifying 1 index:

UnFun <- function(dat, index) {
  dat %>%
    group_by(year) %>%
    regroup(list(index)) %>%
    summarise(n = n() ) 
}

> UnFun(dfx, "areaA")
Source: local data frame [2 x 2]    
  areaA  n
1     A 20
2     B 20

> UnFun(dfx, "areaB")
Source: local data frame [2 x 2]    
  areaB  n
1    no 20
2   yes 20

But when I try to group by both (or year alone), I get errors or wrong answers:

> UnFun(dfx, list("areaA", "areaB"))
Error: cannot convert to symbol (SYMSXP)

> UnFun(dfx, c("areaA", "areaB"))
Source: local data frame [2 x 2]

  areaA  n
1     A 20
2     B 20

> UnFun(dfx, NULL)
Error: cannot convert to symbol (SYMSXP)

Any tips on how to correctly specify 0, 1, or 2 additional grouping variables?

Thanks, R Community!

NotYourIPhone Siri
  • The second comment to the answer you linked to shows two ways to do this that seem to work. Looks like you need to use `...` instead of `index` in your function or use `regroup(as.list(index))` instead of `regroup(list(index))`. – aosmith Aug 14 '14 at 18:48
  • @aosmith, thanks for your help! That was an easy fix :) – NotYourIPhone Siri Aug 14 '14 at 20:41
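
A small base R sketch of the difference between the two calls mentioned in the comment above. It only shows what each call builds, not how `regroup` behaves internally; presumably the one-element list from `list(index)` is why only the first column was used for grouping.

```r
index <- c("areaA", "areaB")

# list() wraps the whole character vector as a single list element
str(list(index))

# as.list() splits the vector into one list element per column name
str(as.list(index))

length(list(index))     # 1
length(as.list(index))  # 2
```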

1 Answer

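This works, grouping by year plus whatever columns are named in index:

library(dplyr)

UnFun <- function(dat, index) {
  dat %>%
    # c() here builds a list: the symbol year followed by each name in index
    group_by_(.dots = c(quote(year), index)) %>%
    tally()
}
UnFun(dfx, c("areaA", "areaB"))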
NotYourIPhone Siri
  • That does not seem to work. I see a result that groups by year and areaA, but not areaB. I think you want `dfx %>% group_by_(.dots=c("year",other_vars)) %>% tally` or even simpler `dfx %>% count_(vars=c("year",other_vars))` – Frank Nov 05 '15 at 18:26
  • No problem. I think you should edit your answer into the best/clearest version possible. No need to show old, wrong parts or write "Correction", "Edit", etc. – Frank Nov 06 '15 at 17:04
  • @Frank - I don't understand why the ".dots" part is necessary in the group_by_ function? Also, thanks for the heads-up on the tally function - new to me – NotYourIPhone Siri Nov 06 '15 at 17:07
  • The usual reference for the `.dots` part is here: https://cran.r-project.org/web/packages/dplyr/vignettes/nse.html I haven't read it myself and just use it blindly :) – Frank Nov 06 '15 at 17:19
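
For readers on a current dplyr, where the underscore verbs such as `group_by_` are deprecated, here is a minimal sketch of the same idea. It assumes dplyr >= 1.0.0 (for `across()` and `all_of()`); the function name `count_by` is hypothetical, and the data frame is a cut-down copy of the question's data with only the grouping columns.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0

# Cut-down version of the question's data: grouping columns only, 40 rows
dfx <- data.frame(
  areaA = c(rep("A", 20), rep("B", 20)),
  areaB = rep(c(rep("yes", 10), rep("no", 10)), 2),
  year  = rep(c(2000, 2002, 2004, 2006, 2008), 8)
)

# count_by is a hypothetical name; index is a character vector of
# extra grouping columns, or NULL to group by year alone
count_by <- function(dat, index = NULL) {
  dat %>%
    group_by(across(all_of(c("year", index)))) %>%
    summarise(n = n(), .groups = "drop")
}

by_year  <- count_by(dfx)                       # 5 rows, n = 8 each
by_one   <- count_by(dfx, "areaA")              # 10 rows, n = 4 each
by_two   <- count_by(dfx, c("areaA", "areaB"))  # 20 rows, n = 2 each
```

Because `c("year", NULL)` is just `"year"`, the 0, 1 and 2 extra-variable cases all go through the same code path.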