Groupby multiple columns, sort based on one column then select top n in R

Question

Given a dataframe df as follows:

df <- structure(list(city = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("bj", "sh"
), class = "factor"), district = structure(c(1L, 1L, 1L, 1L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L), .Label = c("cy", 
"hd", "hp", "pd"), class = "factor"), value = c(5L, 6L, 3L, 4L, 
2L, 4L, 5L, 2L, 5L, 7L, 8L, 8L, 9L, 9L, 6L, 3L, 2L)), class = "data.frame", row.names = c(NA, 
-17L))

I need to groupby city and district, then sort value column in descending way, and take top 2 for each groups.

The expected result will like this:

city district value
bj  cy  6       
bj  cy  5       
bj  hd  5       
bj  hd  4       
sh  hp  8       
sh  hp  8       
sh  pd  9       
sh  pd  9

How could I do that in R? Thanks.

score 1 · Accepted Answer · answered Nov 23 '20 at 07:43

Does this work:

library(dplyr)
df %>% group_by(city, district) %>% slice_max(value,n = 2)
# A tibble: 8 x 3
# Groups:   city, district [4]
  city  district value
  <fct> <fct>    <int>
1 bj    cy           6
2 bj    cy           5
3 bj    hd           5
4 bj    hd           4
5 sh    hp           8
6 sh    hp           8
7 sh    pd           9
8 sh    pd           9

score 1 · Answer 2 · answered Nov 23 '20 at 09:29

use data.table

library(data.table)
dt <- as.data.table(df)
dt[order(value, decreasing = T), value[1:2], by = c("city", "district")]
#>    city district V1
#> 1:   sh       pd  9
#> 2:   sh       pd  9
#> 3:   sh       hp  8
#> 4:   sh       hp  8
#> 5:   bj       cy  6
#> 6:   bj       cy  5
#> 7:   bj       hd  5
#> 8:   bj       hd  4

^{Created on 2020-11-23 by the reprex package (v0.3.0)}

score 0 · Answer 3 · answered Nov 23 '20 at 08:01

0

Another option based @Karthik S's solution:

dd %>% 
  group_by(city, district) %>% 
  arrange(desc(value)) %>%
  slice_max(value, n = 2) %>%
  ungroup()

answered Nov 23 '20 at 08:01

ah bon

9,293
12
65
148

Groupby multiple columns, sort based on one column then select top n in R

3 Answers3

Linked

Related