How to subset a dataframe based on the max of a variable?

Question

I have a dataset that looks something like the table below with multiple lines corresponding to each month, the count of how many sightings of a certain bird there are (n), and in what region that sighting occurred.

MONTH	REGION	n
1	North	12
1	South	45
2	West	34
2	South	23
2	East	32
3	North	11
... and so on.

What I am looking to do is to create a separate dataset that extracts the region with the most sightings per month, so the goal is something that looks like this:

MONTH	REGION	n
1	South	45
2	West	34
3	North	11

So far I have tried different combinations of piping, such as df %>% group_by(MONTH) %>% max(n), but none has given the desired result.

score 0 · Answer 1 · answered Feb 18 '23 at 00:36

You could group by MONTH and keep the row with the largest n in each group:

library(dplyr)
# your original data
df <- data.frame(
  MONTH = c(1, 1, 2, 2, 2, 3),
  REGION = c("North", "South", "West", "South", "East", "North"),
  n = c(12, 45, 34, 23, 32, 11)
)


# extract the regions with most sightings 
df_max <- df %>%
  group_by(MONTH) %>%
  slice(which.max(n))

#result
df_max
# A tibble: 3 × 3
# Groups:   MONTH [3]
  MONTH REGION     n
  <dbl> <chr>  <dbl>
1     1 South     45
2     2 West      34
3     3 North     11
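
One thing to watch for: which.max() returns only the first maximum, so if two regions tie for the top count in a month, only one of them is kept. If you would rather control that explicitly, slice_max() (available in dplyr 1.0.0 and later) might be an option. Below is a minimal sketch; df_ties, all_max and one_max are made-up names, and the tied data is invented for illustration, not taken from the question.

```r
library(dplyr)

# Hypothetical data with a tie in month 1 (not from the question)
df_ties <- data.frame(
  MONTH  = c(1, 1, 2, 2),
  REGION = c("North", "South", "West", "East"),
  n      = c(45, 45, 34, 32)
)

# Keep every region tied for the monthly maximum
# (with_ties = TRUE is the default for slice_max)
all_max <- df_ties %>%
  group_by(MONTH) %>%
  slice_max(n, n = 1) %>%
  ungroup()

# Keep exactly one row per month, similar to slice(which.max(n))
one_max <- df_ties %>%
  group_by(MONTH) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup()
```

Here all_max should contain both tied regions for month 1, while one_max has a single row per month. Which version you want depends on whether a tie should count as two "top" regions or one.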
