I have this data:

df <- data.frame(
  node = c("A", "B", "A", "A", "A", "B", "A", "A", "A", "B", "B", "B", "B"),
  left = c("ab", "ab", "ab", "ab", "cc", "xx", "cc", "ab", "zz", "xx", "xx", "zz", "zz")
)

I want to count frequencies and proportions per group and then keep only some of the rows in each group. For example, given this small dataset, I want the rows with the two highest Freq_left values per node. How can that be done? So far I can only extract the row with the maximum Freq_left value per node, not the top two:

library(dplyr)

df %>%
  group_by(node, left) %>%
  # summarise
  summarise(
    Freq_left = n(),
    Prop_left = round(Freq_left/nrow(.)*100, 4)
    ) %>%
  slice_max(Freq_left)
# A tibble: 2 × 4
# Groups:   node [2]
  node  left  Freq_left Prop_left
  <chr> <chr>     <int>     <dbl>
1 A     ab            4      30.8
2 B     xx            3      23.1

Expected output:

  node  left  Freq_left Prop_left
  <chr> <chr>     <int>     <dbl>
  A     ab            4     30.8 
  A     cc            2     15.4 
  B     xx            3     23.1 
  B     zz            2     15.4
Chris Ruehlemann
  • `slice_max` takes an `n` argument: [Getting the top values by group](https://stackoverflow.com/questions/27766054/getting-the-top-values-by-group) (`top_n` is superseded) – Henrik Dec 05 '21 at 17:23

1 Answer

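You could use dplyr::slice_max with its n argument. (Thanks to @PaulSmith for pointing out that dplyr::top_n is superseded in favor of dplyr::slice_max.) Ordering by Prop_left selects the same rows here as ordering by Freq_left, because each proportion is just the count divided by the same total: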

library(dplyr)

df %>%
  group_by(node, left) %>%
  # summarise
  summarise(
    Freq_left = n(),
    Prop_left = round(Freq_left/nrow(.)*100, 4)
  ) %>%
  slice_max(order_by = Prop_left, n = 2)
#> `summarise()` has grouped output by 'node'. You can override using the `.groups` argument.
#> # A tibble: 4 × 4
#> # Groups:   node [2]
#>   node  left  Freq_left Prop_left
#>   <chr> <chr>     <int>     <dbl>
#> 1 A     ab            4      30.8
#> 2 A     cc            2      15.4
#> 3 B     xx            3      23.1
#> 4 B     zz            2      15.4
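
As a variation on the pipeline above, here is a minimal sketch that does the counting with dplyr::count and then groups by node explicitly before slicing, which avoids the `summarise()` grouping message. The name `top2` and the choice of `with_ties = FALSE` are assumptions made for illustration; drop `with_ties = FALSE` if tied rows should all be kept.

```r
library(dplyr)

df <- data.frame(
  node = c("A", "B", "A", "A", "A", "B", "A", "A", "A", "B", "B", "B", "B"),
  left = c("ab", "ab", "ab", "ab", "cc", "xx", "cc", "ab", "zz", "xx", "xx", "zz", "zz")
)

top2 <- df %>%
  # one row per node/left combination, with the count stored in Freq_left
  count(node, left, name = "Freq_left") %>%
  # proportion relative to all rows (the counts sum to nrow(df))
  mutate(Prop_left = round(Freq_left / sum(Freq_left) * 100, 4)) %>%
  # make the grouping used for slicing explicit
  group_by(node) %>%
  # keep at most two rows per node; with_ties = FALSE caps it at exactly n (assumption)
  slice_max(order_by = Freq_left, n = 2, with_ties = FALSE) %>%
  ungroup()

top2
```

With the sample data this selects the same four rows as the output above (A/ab, A/cc, B/xx, B/zz).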
stefan
  • Since `top_n` is now superseded, I would suggest using `slice_max(order_by = Prop_left, n = 2)`. – PaulS Dec 05 '21 at 17:26