How to extract one specific group in dplyr

Question

Given a grouped tbl, can I extract one or a few groups? Such a function can be useful when prototyping code, e.g.:

mtcars %>%
  group_by(cyl) %>%
  select_first_n_groups(2) %>%
  do({'complicated expression'})

Surely, one can do an explicit filter before grouping, but that can be cumbersome.

http://stackoverflow.com/questions/22182442/dplyr-how-to-apply-do-on-result-of-group-by — KFB, Oct 22 '14 at 08:58
In data.table, you could use `setDT(mtcars)[, .SD[.GRP %in% 1:2],by=cyl]` — akrun, Oct 22 '14 at 09:02
@akrun With the approach I had, I seem to get the right outcome, but with warning messages: `mtcars %>% mutate(cyl = as.factor(cyl)) %>% group_by(cyl) %>% filter(cyl == levels(cyl)[c(1,3)])`. I feel funny about this. Any idea? – jazzurro, Oct 22 '14 at 09:13
@jazzurro Using your code, I am not getting any warning, though. I use the devel version of dplyr. Even this `mtcars %>% group_by(cyl) %>% filter(cyl %in% c(4,6))` works. But I guess the OP does not want to use `filter`. – akrun, Oct 22 '14 at 09:17
@akrun Do you have the most updated version? I think mine is 0.3 which I installed after the official release. I will download the latest and see if I still see error messages. — jazzurro, Oct 22 '14 at 09:20
@akrun Thanks for that. `filter` may be an easy way here. But if that is not what the OP wants, `do` then? – jazzurro, Oct 22 '14 at 09:27
@jazzurro Perhaps, `do`. Without much details about what the OP wants to ultimately do, it is all guess. — akrun, Oct 22 '14 at 09:30
@akrun, @jazzurro It's very simple. For example, given `by_cyl <- group_by(mtcars, cyl)`, how do I get the `n`th group? — Rosen Matev, Oct 22 '14 at 09:38
@Rosen Matev This is easier as I mentioned above in `data.table` ie. `n <- 3; setDT(by_cyl)[,.SD[.GRP==n], by=cyl]` — akrun, Oct 22 '14 at 09:42

Holger Brandl · Answer 1 · 2020-02-20T09:57:36.173

With a bit of dplyr along with some nesting/unnesting (supported by the tidyr package), you could establish a small helper to get the first (or any) group:

library(dplyr); library(tidyr)
# Note: defining `first` here masks dplyr::first()
first = function(x) x %>% nest %>% ungroup %>% slice(1) %>% unnest(data)
mtcars %>% group_by(cyl) %>% first()

By adjusting the slicing you could also extract the nth or any range of groups by index, but typically the first or the last is what most users want.
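
As a rough illustration of "adjusting the slicing", the helper can take the group positions as an argument. This is only a sketch: `nth_groups` and `idx` are made-up names, and the positions refer to the row order of the nested table, which is worth checking on your own data before relying on it.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not part of dplyr/tidyr): keep the groups found at
# positions `idx` of the nested table.
nth_groups <- function(x, idx) {
  x %>%
    nest() %>%        # one row per group; other columns go into list-column `data`
    ungroup() %>%
    slice(idx) %>%    # select groups by position
    unnest(data)      # expand the kept groups back into rows
}

res <- mtcars %>% group_by(cyl) %>% nth_groups(2:3)
```

The result is ungrouped, just like the output of the one-group helper above.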

The name is inspired by functional APIs, which all call it first (see the standard libraries of e.g. Kotlin, Python, Scala, Java, Spark).

Edit: Faster Version

A more scalable version (>50x faster on large datasets) that avoids nesting would be:

first_group = function(x) x %>%
    select(group_cols()) %>%
    distinct %>%
    ungroup %>%
    slice(1) %>%
    { semi_join(x, .)}

Another positive side effect of this improved version is that it fails if no grouping is present in x.

I like the "Faster Version". I made the obvious extension to get the nth group, rather than only the first, as I want to look into the middle of my data. I get verbosity `Joining, by = "date"` that I'd like to suppress. Is there a way to do that? — Liam, Jan 24 '23 at 16:10
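
One possible way to address the comment above, sketched under assumptions: `nth_group` and `n` are hypothetical names, and passing the grouping columns explicitly via `by = group_vars(x)` is what keeps `semi_join()` from printing the `Joining, by = ...` message. Because an explicit `by` changes how the ungrouped case behaves, the sketch checks for grouping itself.

```r
library(dplyr)

# Hypothetical generalisation of first_group(): `n` can be one position or several.
nth_group <- function(x, n) {
  stopifnot(is_grouped_df(x))   # keep the "fails without grouping" behaviour
  keys <- x %>%
    select(group_cols()) %>%
    distinct() %>%              # one row per group, in order of first appearance
    ungroup() %>%
    slice(n)
  # Naming the join columns explicitly suppresses the "Joining, by = ..." message.
  semi_join(x, keys, by = group_vars(x))
}

res <- mtcars %>% group_by(cyl) %>% nth_group(2)
```

Note that `distinct()` keeps the keys in the order they first appear in the data, so positions here need not match the sorted group order that `group_by()` reports. Wrapping the original call in `suppressMessages()` would be another option.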

G. Grothendieck · Accepted Answer · 2014-10-22T13:46:07.697

Try this, where groups is a vector of group numbers. Here 1:2 means the first two groups:

select_groups <- function(data, groups, ...) 
   data[sort(unlist(attr(data, "indices")[ groups ])) + 1, ]

mtcars %>% group_by(cyl) %>% select_groups(1:2)

The selected rows appear in the original order. If you prefer that the rows appear in the order in which the groups are specified (e.g. in the above example, the rows of the first group followed by the rows of the second group), then remove the sort.

edited Oct 22 '14 at 13:46

answered Oct 22 '14 at 11:53

Thanks. This works with the following caveats. First, only `data.frame` backend is supported. Second, using `data[...]` does the grouping again. However, since the use case suggests that a small number of groups is selected, this should not be a problem. I'll accept the answer as it appears that `dplyr` does not have such built-in functionality. – Rosen Matev Oct 24 '14 at 08:20
@Grothendieck Is your solution still valid today? Is there something in `dplyr` that directly does that? – ℕʘʘḆḽḘ Jan 11 '17 at 14:00
It seemed to work with dplyr 0.5 (the most recent version on CRAN) when I pasted the code in the answer into R. It gave the rows having cyl = 4 or = 6 (the first two groups) as expected. If it does not work for you then try it again after restarting R from a vanilla state. – G. Grothendieck Jan 11 '17 at 14:39
As of 2019, I had to modify the function: `select_groups <- function(dd, gr, ...) dd[sort(unlist(attr(dd, "groups")$.rows[ gr ])), ]` – Bastien Mar 12 '19 at 13:09
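
Since the internal attribute layout has changed between dplyr versions, a variant that relies on the exported accessor `group_rows()` instead of `attr()` may be less fragile. This is a sketch rather than a tested replacement for the answer; `select_groups2` is a made-up name, and it assumes a dplyr version that exports `group_rows()` (0.8 or later).

```r
library(dplyr)

# Hypothetical variant of select_groups() that avoids reading attributes directly.
select_groups2 <- function(data, groups) {
  rows <- group_rows(data)[groups]   # one integer vector of row numbers per group
  data[sort(unlist(rows)), ]         # sort() keeps the original row order
}

res <- mtcars %>% group_by(cyl) %>% select_groups2(1:2)
```

As in the answer, dropping `sort()` returns the rows group by group instead of in their original order.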

score 1 · Answer 3 · answered Aug 10 '23 at 23:40

I know this is an old question, but I came across it while looking for something similar and realised that this has become much easier since dplyr 1.0, so I thought others might also be looking.

You can simply group and filter based on cur_group_id(). If you know the grouping you are after, you could also use cur_group(), although it is arguably just as easy to filter on the values you want. I can imagine the two being useful in combination if you have a heavily grouped data frame and just want the first group with a confirmed match in a category or two. You would need to be pedantic about what the group order is, though, as in my current example.

library(dplyr)

 starwars %>% group_by(homeworld, species) %>% filter(cur_group_id() == 1)
#> # A tibble: 3 x 14
#> # Groups:   homeworld, species [1]
#>   name  height  mass hair_color skin_color eye_color birth_year sex   gender
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
#> 1 Leia~    150    49 brown      light      brown             19 fema~ femin~
#> 2 Bail~    191    NA black      tan        brown             67 male  mascu~
#> 3 Raym~    188    79 brown      light      brown             NA male  mascu~
#> # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>

 starwars %>% group_by(homeworld, species, eye_color) %>% 
   filter(grepl("Tatooine Human",paste(cur_group(), collapse = " ") )) %>% 
   filter(cur_group_id() == 1)
#> # A tibble: 5 x 14
#> # Groups:   homeworld, species, eye_color [1]
#>   name  height  mass hair_color skin_color eye_color birth_year sex   gender
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
#> 1 Luke~    172    77 blond      fair       blue            19   male  mascu~
#> 2 Owen~    178   120 brown, gr~ light      blue            52   male  mascu~
#> 3 Beru~    165    75 brown      light      blue            47   fema~ femin~
#> 4 Anak~    188    84 blond      fair       blue            41.9 male  mascu~
#> 5 Clie~    183    NA brown      fair       blue            82   male  mascu~
#> # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>

Created on 2023-08-11 by the reprex package (v0.3.0)
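
For completeness, the `select_first_n_groups()` imagined in the question could be sketched on top of the same idea. The function below is hypothetical (it is not part of dplyr) and assumes dplyr 1.0 or later, where `cur_group_id()` is available.

```r
library(dplyr)

# Hypothetical helper named after the function sketched in the question.
select_first_n_groups <- function(data, n) {
  # cur_group_id() yields one id per group, so the comparison keeps or drops
  # each group as a whole.
  filter(data, cur_group_id() <= n)
}

res <- mtcars %>% group_by(cyl) %>% select_first_n_groups(2)
```

The group ids follow the order of the grouping keys as `group_by()` reports them, so for `cyl` the first two groups are 4 and 6.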
