EDIT: It looks like the `order(order())` solution from "R: Rank-function with two variables and ties.method random" still works.

zed = structure(list(sim_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), 
    group_id = c(225400, 225400, 225400, 225400, 225401, 225401, 
    225401, 225401, 225402, 225402, 225402, 225402, 225403, 225403, 
    225403, 225403, 225404, 225404, 225404, 225404, 225405, 225405, 
    225405, 225405), feed_id = c(18658, 18708, 18721, 18716, 
    18743, 18570, 18583, 18702, 18694, 18624, 18643, 18689, 18645, 
    18718, 18588, 18706, 18564, 18710, 18648, 18699, 18660, 18647, 
    18701, 18732), points = c(9, 4, 4, 0, 9, 3, 3, 3, 9, 6, 3, 
    0, 5, 5, 4, 1, 7, 5, 3, 1, 6, 5, 4, 1), goal_diff = c(7, 
    -1, 1, -7, 4, 0, -2, -2, 4, -1, 1, -4, 1, 2, -1, -2, 1, 0, 
    0, -1, 1, 1, 1, -3)), row.names = c(NA, -24L), class = c("tbl_df", 
"tbl", "data.frame"))


> head(zed, 8)
# A tibble: 8 x 5
  sim_id group_id feed_id points goal_diff
   <int>    <dbl>   <dbl>  <dbl>     <dbl>
1      1   225400   18658      9         7
2      1   225400   18708      4        -1
3      1   225400   18721      4         1
4      1   225400   18716      0        -7
5      1   225401   18743      9         4
6      1   225401   18570      3         0
7      1   225401   18583      3        -2
8      1   225401   18702      3        -2

We have this data for a major soccer tournament, and we need to rank teams within each group_id. Currently we rank them like this:

library(dplyr)

zed %>% 
  group_by(sim_id, group_id) %>% 
  mutate(place = rank(-points, ties.method = 'random'))

This ranks teams on points and breaks ties randomly. We would like to rank teams (feed_id identifies the team) within each group first by points, then break ties on goal_diff, and only after that break any remaining ties randomly. What is the best way to do this? A solution within a dplyr chain is preferred, but we are open to other approaches as well.
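
To make the idea concrete, here is a minimal base R sketch of the `order(order())` trick mentioned in the edit at the top, applied to a single group. The vectors are re-typed from group 225401 of the sample data, and the `runif()` key and `set.seed()` call are assumptions added only to illustrate a random last-resort tie-break.

```r
# One group's values, re-typed from group 225401 in the sample data
points    <- c(9, 3, 3, 3)
goal_diff <- c(4, 0, -2, -2)

set.seed(1)  # assumption: only here so the random tie-break is reproducible

# Inner order(): sort by points (descending), then goal_diff (descending),
# then a random key that only matters when both are tied.
# Outer order(): inverts that sort permutation, which yields each row's rank.
place <- order(order(-points, -goal_diff, runif(length(points))))
place
```

The first two teams always get places 1 and 2; the last two share places 3 and 4 in an order that depends on the random key.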

EDIT: expected output for the top 8 rows would be:

> head(zed, 8)
# A tibble: 8 x 6
  sim_id group_id feed_id points goal_diff  place
   <int>    <dbl>   <dbl>  <dbl>     <dbl> 
1      1   225400   18658      9         7      1
2      1   225400   18708      4        -1      3
3      1   225400   18721      4         1      2
4      1   225400   18716      0        -7      4
5      1   225401   18743      9         4      1
6      1   225401   18570      3         0      2
7      1   225401   18583      3        -2   3or4
8      1   225401   18702      3        -2   3or4
  • In group 225400, goal_diff breaks the tie and feed_id == 18721 gets 2nd place.
  • In group 225401, goal_diff breaks the tie and feed_id == 18570 gets 2nd place. 3rd and 4th place are assigned randomly because the remaining two teams are tied on both points and goal_diff.
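
A hedged sketch of how the expected output above might be produced inside a dplyr chain, again using the `order(order())` approach. The object names `zed_small` and `ranked` are hypothetical, and `zed_small` simply re-types the first two groups so the block runs on its own; this is one possible approach, not necessarily the best one.

```r
library(dplyr)

# First two groups of the sample data (hypothetical name: zed_small)
zed_small <- tibble(
  sim_id    = 1L,
  group_id  = rep(c(225400, 225401), each = 4),
  feed_id   = c(18658, 18708, 18721, 18716, 18743, 18570, 18583, 18702),
  points    = c(9, 4, 4, 0, 9, 3, 3, 3),
  goal_diff = c(7, -1, 1, -7, 4, 0, -2, -2)
)

ranked <- zed_small %>%
  group_by(sim_id, group_id) %>%
  # Rank by points, then goal_diff, then a random key for any remaining ties
  mutate(place = order(order(-points, -goal_diff, runif(n())))) %>%
  ungroup()

ranked
```

Rows 1 to 6 should get places 1, 3, 2, 4, 1, 2 as in the expected output, while rows 7 and 8 receive 3 and 4 in random order.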
Canovice
  • yep give me one moment - expected output added! – Canovice Jun 21 '21 at 17:21
  • `mutate(place = dense_rank(-points))` doesn't incorporate the `goal_diff` column, which we need to use to break ties – Canovice Jun 21 '21 at 17:24
  • Maybe you meant `zed %>% group_by(sim_id, group_id) %>% mutate(place = rank(-points, ties.method = 'random'), place2 = rank(-goal_diff, ties.method = 'random')) %>% group_by(goal_diff, .add = TRUE) %>% mutate(place = if(n() > 1) place2 else place) %>% ungroup %>% mutate(place2 = NULL)` – akrun Jun 21 '21 at 17:34
