Dplyr: Filter a pairwise grouped dataset keeping only one row from each pair by condition

Question

I have a dataset like the one below:

game                  teams              score
Liverpool - Leeds     Liverpool          2
Liverpool - Leeds     Leeds              1
Sheffield - Norwich   Sheffield          0
Sheffield - Norwich   Norwich            1
Arsenal - Newcastle   Arsenal            3
Arsenal - Newcastle   Newcastle          2

My purpose is to keep only the rows corresponding to the winning team. My desired outpout would look like this:

game                  teams              score
Liverpool - Leeds     Liverpool          2
Sheffield - Norwich   Norwich            1
Arsenal - Newcastle   Arsenal            3

I guess I would need to group the rows by game firstly, but once done I'm not being able to move forward successfully.

score 4 · Answer 1 · answered Jul 15 '20 at 16:45

One possible dplyr solution would be

df %>% 
  group_by(game) %>% 
  filter(score == max(score))

# A tibble: 6 x 3
# Groups:   game [5]
#   game                teams     score
#   <chr>               <chr>     <int>
# 1 Liverpool - Leeds   Liverpool     2
# 2 Sheffield - Norwich Norwich       1
# 3 Arsenal - Newcastle Arsenal       3

With this solution, you will get both teams in case of a draw.

Allan Cameron · Answer 2 · 2020-07-15T18:11:45.497

2

You can try:

df %>% 
  group_by(game) %>%
   summarize(teams = teams[which.max(score)], score = score[which.max(score)])
#> # A tibble: 3 x 3
#>   game                teams     score
#>   <chr>               <chr>     <int>
#> 1 Arsenal - Newcastle Arsenal       3
#> 2 Liverpool - Leeds   Liverpool     2
#> 3 Sheffield - Norwich Norwich       1

I'm not sure how you plan to handle draws though...

edited Jul 15 '20 at 18:11

answered Jul 15 '20 at 16:33

Allan Cameron

147,086
7
49
87

MarBlo · Answer 3 · 2020-07-15T16:41:21.310

I have added 2 further hypothetical plays, from which one is a tie. In this case both are named with my solution

library(tidyverse)

df <- read.table(text = 
"
game                  teams           score
Liverpool-Leeds     Liverpool          2
Liverpool-Leeds     Leeds              1
Sheffield-Norwich   Sheffield          0
Sheffield-Norwich   Norwich            1
Arsenal-Newcastle   Arsenal            3
Arsenal-Newcastle   Newcastle          2
Tottnham-Bvb        Tottenham         5
Tottnham-Bvb        BvB               6
Bayer-Real          Bayern            3
Bayer-Real          real              3
", header =T)

df
#>                 game     teams score
#> 1    Liverpool-Leeds Liverpool     2
#> 2    Liverpool-Leeds     Leeds     1
#> 3  Sheffield-Norwich Sheffield     0
#> 4  Sheffield-Norwich   Norwich     1
#> 5  Arsenal-Newcastle   Arsenal     3
#> 6  Arsenal-Newcastle Newcastle     2
#> 7       Tottnham-Bvb Tottenham     5
#> 8       Tottnham-Bvb       BvB     6
#> 9         Bayer-Real    Bayern     3
#> 10        Bayer-Real      real     3


df %>% 
  group_by(game) %>% 
  top_n(1)
#> Selecting by score
#> # A tibble: 6 x 3
#> # Groups:   game [5]
#>   game              teams     score
#>   <fct>             <fct>     <int>
#> 1 Liverpool-Leeds   Liverpool     2
#> 2 Sheffield-Norwich Norwich       1
#> 3 Arsenal-Newcastle Arsenal       3
#> 4 Tottnham-Bvb      BvB           6
#> 5 Bayer-Real        Bayern        3
#> 6 Bayer-Real        real          3

score 2 · Accepted Answer · answered Jul 15 '20 at 17:21

We can use slice_max

library(dplyr)
df %>%
     group_by(game) %>%
     slice_max(score)
# A tibble: 6 x 3
# Groups:   game [5]
#  game              teams     score
#  <chr>             <chr>     <int>
#1 Arsenal-Newcastle Arsenal       3
#2 Bayer-Real        Bayern        3
#3 Bayer-Real        real          3
#4 Liverpool-Leeds   Liverpool     2
#5 Sheffield-Norwich Norwich       1
#6 Tottnham-Bvb      BvB           6

data

df1 <- structure(list(game = c("Liverpool-Leeds", "Liverpool-Leeds", 
"Sheffield-Norwich", "Sheffield-Norwich", "Arsenal-Newcastle", 
"Arsenal-Newcastle", "Tottnham-Bvb", "Tottnham-Bvb", "Bayer-Real", 
"Bayer-Real"), teams = c("Liverpool", "Leeds", "Sheffield", "Norwich", 
"Arsenal", "Newcastle", "Tottenham", "BvB", "Bayern", "real"), 
    score = c(2L, 1L, 0L, 1L, 3L, 2L, 5L, 6L, 3L, 3L)),
    class = "data.frame", row.names = c(NA, 
-10L))

Dplyr: Filter a pairwise grouped dataset keeping only one row from each pair by condition

4 Answers4

data