0

So I have a dataframe as follows:

date          val1  val2
01-01-2020      28    29
01-02-2020      24    23
01-01-2021     192   192
01-02-2021     132   142
01-03-2021     192   131
01-01-2022      21    29
01-02-2022      22    29

All I want to do is determine the highest val1 for each YEAR found in date. I attempt to do so as follows. The problem however is there is an instance of a tie in 2021.

I would like to return both the date and the value, and in the instance of a tie, return both dates:

df %>% 
  mutate(year = year(date)) %>% 
  group_by(year) %>% 
  mutate(val1 = max(val1)) %>% 
  ungroup(year) %>% 
  distinct(year, .keep_all=TRUE) %>% 
  select(date, val1)

The above will only select one of the dates, but I need both....

Final expected output:

date          val1
01-01-2020      28
01-01-2021     192
01-03-2021     192
01-02-2022      22
John Thomas
  • 1,075
  • 9
  • 32
  • Actually probably this one, but looking at the dplyr solutions. Either way they might help: https://stackoverflow.com/questions/24558328/select-the-row-with-the-maximum-value-in-each-group –  May 31 '22 at 18:20

2 Answers2

2

A possible solution:

library(dplyr)
library(lubridate)

df %>% 
  group_by(date %>% dmy %>% year) %>% 
  slice_max(val1) %>% 
  ungroup %>% 
  select(date, val1)

#> # A tibble: 4 × 2
#>   date        val1
#>   <chr>      <int>
#> 1 01-01-2020    28
#> 2 01-01-2021   192
#> 3 01-03-2021   192
#> 4 01-02-2022    22
PaulS
  • 21,159
  • 2
  • 9
  • 26
0

Something like this should probably work:

library(dplyr)
df %>%
  group_by(tmp = stringr::word(date, 3, sep = '-')) %>%
  slice_max(val1) %>%
  select(-tmp)
Baraliuh
  • 2,009
  • 5
  • 11