It's unclear to me if you want the highest number of cases for each month, or the highest number of cases for each country for each month. I went with the second, but included instructions if you would prefer to do the first.
The short version of my answer is- you were on the right path! I'm not sure what happened- maybe you had the right thing, but it was hard to tell if the dataset was being filtered, without looking at the reduced number of rows which R displays.
# since you didn't provide a sample dataset, we download Our World in Data's COVID-19 dataset and transform it into something similar
country_date_cases <- read_csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/cases_deaths/new_cases.csv") |>
pivot_longer(-date, names_to = "Country.Region", values_to = "Cases") |>
filter(year(date) == 2020 & between(month(date), 1, 4))
# example dataset
# A tibble: 29,274 × 3
date Country.Region Cases
<date> <chr> <dbl>
1 2020-01-03 World 0
2 2020-01-03 Afghanistan 0
3 2020-01-03 Africa 0
4 2020-01-03 Albania 0
5 2020-01-03 Algeria 0
6 2020-01-03 American Samoa 0
7 2020-01-03 Andorra 0
8 2020-01-03 Angola 0
9 2020-01-03 Anguilla 0
10 2020-01-03 Antigua and Barbuda 0
# ℹ 29,264 more rows
country_date_cases |>
mutate(Month = month(date)) |>
arrange(Month, desc(Cases)) |>
distinct(Country.Region, Month, .keep_all = TRUE) # just using Month gets the largest value for each month, which in my dataset, which includes the world total, is obviously the world total. If we include Country.Region in there too, we get the largest value for each country in each month.
# A tibble: 984 × 4
date Country.Region Cases Month
<date> <chr> <dbl> <dbl>
1 2020-01-31 World 2015 1
2 2020-01-31 Asia 2010 1
3 2020-01-31 Upper middle income 1990 1
4 2020-01-31 China 1984 1
5 2020-01-31 High income 20 1
6 2020-01-27 Thailand 7 1
7 2020-01-25 North America 5 1
8 2020-01-25 United States 5 1
9 2020-01-26 Europe 5 1
10 2020-01-26 European Union 5 1
# ℹ 974 more rows
# ℹ Use `print(n = ...)` to see more rows
# so this is equivalent to slice_max. However, with that said, as Seth has already commented, slice_max is the preferred option, as it is shorter, and more readable
country_date_cases |>
mutate(Month = month(date)) |>
slice_max(Cases, n = 1, by = c(Country.Region, Month), with_ties = FALSE) # with slice_max, we also have the option of including ties, which I didn't do here. If you wanted the Country.Region with the largest number of cases in each month, you could change this to `by = Month`