Select multiple max values in dplyr

Question

So I have a dataframe as follows:

date          val1  val2
01-01-2020      28    29
01-02-2020      24    23
01-01-2021     192   192
01-02-2021     132   142
01-03-2021     192   131
01-01-2022      21    29
01-02-2022      22    29

All I want to do is determine the highest val1 for each YEAR found in date. I attempt to do so as follows. The problem however is there is an instance of a tie in 2021.

I would like to return both the date and the value, and in the instance of a tie, return both dates:

df %>% 
  mutate(year = year(date)) %>% 
  group_by(year) %>% 
  mutate(val1 = max(val1)) %>% 
  ungroup(year) %>% 
  distinct(year, .keep_all=TRUE) %>% 
  select(date, val1)

The above will only select one of the dates, but I need both....

Final expected output:

date          val1
01-01-2020      28
01-01-2021     192
01-03-2021     192
01-02-2022      22

Actually probably this one, but looking at the dplyr solutions. Either way they might help: https://stackoverflow.com/questions/24558328/select-the-row-with-the-maximum-value-in-each-group — , May 31 '22 at 18:20

score 2 · Answer 1 · answered May 31 '22 at 18:13

A possible solution:

library(dplyr)
library(lubridate)

df %>% 
  group_by(date %>% dmy %>% year) %>% 
  slice_max(val1) %>% 
  ungroup %>% 
  select(date, val1)

#> # A tibble: 4 × 2
#>   date        val1
#>   <chr>      <int>
#> 1 01-01-2020    28
#> 2 01-01-2021   192
#> 3 01-03-2021   192
#> 4 01-02-2022    22

score 0 · Answer 2 · answered May 31 '22 at 18:13

0

Something like this should probably work:

library(dplyr)
df %>%
  group_by(tmp = stringr::word(date, 3, sep = '-')) %>%
  slice_max(val1) %>%
  select(-tmp)

answered May 31 '22 at 18:13

Baraliuh

2,009
5
11

Select multiple max values in dplyr

2 Answers2