0

Given a dataframe df as follows:

df <- structure(list(city = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("bj", "sh"
), class = "factor"), district = structure(c(1L, 1L, 1L, 1L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L), .Label = c("cy", 
"hd", "hp", "pd"), class = "factor"), value = c(5L, 6L, 3L, 4L, 
2L, 4L, 5L, 2L, 5L, 7L, 8L, 8L, 9L, 9L, 6L, 3L, 2L)), class = "data.frame", row.names = c(NA, 
-17L))

I need to groupby city and district, then sort value column in descending way, and take top 2 for each groups.

The expected result will like this:

city district value
bj  cy  6       
bj  cy  5       
bj  hd  5       
bj  hd  4       
sh  hp  8       
sh  hp  8       
sh  pd  9       
sh  pd  9   

How could I do that in R? Thanks.

ah bon
  • 9,293
  • 12
  • 65
  • 148

3 Answers3

1

Does this work:

library(dplyr)
df %>% group_by(city, district) %>% slice_max(value,n = 2)
# A tibble: 8 x 3
# Groups:   city, district [4]
  city  district value
  <fct> <fct>    <int>
1 bj    cy           6
2 bj    cy           5
3 bj    hd           5
4 bj    hd           4
5 sh    hp           8
6 sh    hp           8
7 sh    pd           9
8 sh    pd           9
Karthik S
  • 11,348
  • 2
  • 11
  • 25
1

use data.table

library(data.table)
dt <- as.data.table(df)
dt[order(value, decreasing = T), value[1:2], by = c("city", "district")]
#>    city district V1
#> 1:   sh       pd  9
#> 2:   sh       pd  9
#> 3:   sh       hp  8
#> 4:   sh       hp  8
#> 5:   bj       cy  6
#> 6:   bj       cy  5
#> 7:   bj       hd  5
#> 8:   bj       hd  4

Created on 2020-11-23 by the reprex package (v0.3.0)

Yuriy Saraykin
  • 8,390
  • 1
  • 7
  • 14
0

Another option based @Karthik S's solution:

dd %>% 
  group_by(city, district) %>% 
  arrange(desc(value)) %>%
  slice_max(value, n = 2) %>%
  ungroup()
ah bon
  • 9,293
  • 12
  • 65
  • 148