3

I have a dataset like the one below:

game                  teams              score
Liverpool - Leeds     Liverpool          2
Liverpool - Leeds     Leeds              1
Sheffield - Norwich   Sheffield          0
Sheffield - Norwich   Norwich            1
Arsenal - Newcastle   Arsenal            3
Arsenal - Newcastle   Newcastle          2

My purpose is to keep only the rows corresponding to the winning team. My desired outpout would look like this:

game                  teams              score
Liverpool - Leeds     Liverpool          2
Sheffield - Norwich   Norwich            1
Arsenal - Newcastle   Arsenal            3

I guess I would need to group the rows by game firstly, but once done I'm not being able to move forward successfully.

Ric S
  • 9,073
  • 3
  • 25
  • 51
teogj
  • 289
  • 1
  • 11

4 Answers4

4

One possible dplyr solution would be

df %>% 
  group_by(game) %>% 
  filter(score == max(score))

# A tibble: 6 x 3
# Groups:   game [5]
#   game                teams     score
#   <chr>               <chr>     <int>
# 1 Liverpool - Leeds   Liverpool     2
# 2 Sheffield - Norwich Norwich       1
# 3 Arsenal - Newcastle Arsenal       3

With this solution, you will get both teams in case of a draw.

Ric S
  • 9,073
  • 3
  • 25
  • 51
2

You can try:

df %>% 
  group_by(game) %>%
   summarize(teams = teams[which.max(score)], score = score[which.max(score)])
#> # A tibble: 3 x 3
#>   game                teams     score
#>   <chr>               <chr>     <int>
#> 1 Arsenal - Newcastle Arsenal       3
#> 2 Liverpool - Leeds   Liverpool     2
#> 3 Sheffield - Norwich Norwich       1

I'm not sure how you plan to handle draws though...

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
2

I have added 2 further hypothetical plays, from which one is a tie. In this case both are named with my solution

library(tidyverse)

df <- read.table(text = 
"
game                  teams           score
Liverpool-Leeds     Liverpool          2
Liverpool-Leeds     Leeds              1
Sheffield-Norwich   Sheffield          0
Sheffield-Norwich   Norwich            1
Arsenal-Newcastle   Arsenal            3
Arsenal-Newcastle   Newcastle          2
Tottnham-Bvb        Tottenham         5
Tottnham-Bvb        BvB               6
Bayer-Real          Bayern            3
Bayer-Real          real              3
", header =T)

df
#>                 game     teams score
#> 1    Liverpool-Leeds Liverpool     2
#> 2    Liverpool-Leeds     Leeds     1
#> 3  Sheffield-Norwich Sheffield     0
#> 4  Sheffield-Norwich   Norwich     1
#> 5  Arsenal-Newcastle   Arsenal     3
#> 6  Arsenal-Newcastle Newcastle     2
#> 7       Tottnham-Bvb Tottenham     5
#> 8       Tottnham-Bvb       BvB     6
#> 9         Bayer-Real    Bayern     3
#> 10        Bayer-Real      real     3


df %>% 
  group_by(game) %>% 
  top_n(1)
#> Selecting by score
#> # A tibble: 6 x 3
#> # Groups:   game [5]
#>   game              teams     score
#>   <fct>             <fct>     <int>
#> 1 Liverpool-Leeds   Liverpool     2
#> 2 Sheffield-Norwich Norwich       1
#> 3 Arsenal-Newcastle Arsenal       3
#> 4 Tottnham-Bvb      BvB           6
#> 5 Bayer-Real        Bayern        3
#> 6 Bayer-Real        real          3
MarBlo
  • 4,195
  • 1
  • 13
  • 27
2

We can use slice_max

library(dplyr)
df %>%
     group_by(game) %>%
     slice_max(score)
# A tibble: 6 x 3
# Groups:   game [5]
#  game              teams     score
#  <chr>             <chr>     <int>
#1 Arsenal-Newcastle Arsenal       3
#2 Bayer-Real        Bayern        3
#3 Bayer-Real        real          3
#4 Liverpool-Leeds   Liverpool     2
#5 Sheffield-Norwich Norwich       1
#6 Tottnham-Bvb      BvB           6

data

df1 <- structure(list(game = c("Liverpool-Leeds", "Liverpool-Leeds", 
"Sheffield-Norwich", "Sheffield-Norwich", "Arsenal-Newcastle", 
"Arsenal-Newcastle", "Tottnham-Bvb", "Tottnham-Bvb", "Bayer-Real", 
"Bayer-Real"), teams = c("Liverpool", "Leeds", "Sheffield", "Norwich", 
"Arsenal", "Newcastle", "Tottenham", "BvB", "Bayern", "real"), 
    score = c(2L, 1L, 0L, 1L, 3L, 2L, 5L, 6L, 3L, 3L)),
    class = "data.frame", row.names = c(NA, 
-10L))
akrun
  • 874,273
  • 37
  • 540
  • 662