Using dplyr group_by and summarise, how to keep a variable occurring at the maximum of another variable?

Question

For example, using the airquality data, I want to calculate the maximum temperature for each month, and then keep the day on which this maximum temperature occurred.

library(dplyr)
# Maximum temperature per month
airqualitymax <- airquality %>% 
    group_by(Month) %>% 
    summarise(maxtemp = max(Temp))
# Day of the month on which the max occurred
airquality %>% 
    left_join(airqualitymax, by = "Month") %>%
    filter(Temp == maxtemp)

It turns out that the day is not unique here, but supposing it were, is there a way to select the day on which the maximum occurs directly in summarise()?

Do you mean `... summarise(maxtemp = max(Temp), day = Day[which.max(Temp)])`? — talat, Jul 28 '16 at 11:57

akrun · Accepted Answer · 2016-07-28T12:14:39.583

We can use slice to keep the row that has the maximum 'Temp' for each 'Month':

airquality %>% 
   group_by(Month) %>% 
   slice(which.max(Temp))
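
As a hedged illustration of how ties behave: `which.max()` returns the position of the first maximum only, so when several days share the monthly maximum, `slice(which.max(Temp))` keeps the earliest of them. The sketch below compares that with keeping all tied rows; the names `first_max` and `all_max` are only illustrative.

```r
library(dplyr)

# One row per month: the first row at which Temp reaches its maximum
first_max <- airquality %>%
  group_by(Month) %>%
  slice(which.max(Temp)) %>%
  ungroup()

# All rows tied for the monthly maximum, for comparison
all_max <- airquality %>%
  group_by(Month) %>%
  filter(Temp == max(Temp)) %>%
  ungroup()

nrow(first_max)  # one row per month
nrow(all_max)    # larger than nrow(first_max) when days are tied
```

If every tied day is wanted, the `filter(Temp == max(Temp))` form also replaces the join used in the question.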

An option that may be faster is to arrange 'Temp' in descending order and take the first observation of each group (or arrange in ascending order and take the last one with slice(n())):

airquality %>%
  group_by(Month) %>%
  arrange(desc(Temp)) %>%
  slice(1L)
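
Assuming dplyr 1.0.0 or later is installed (an assumption, since this answer predates that release), `slice_max()` expresses the same idea in one step. This is a minimal sketch, not a replacement for the code above; `top_by_month` is an illustrative name.

```r
library(dplyr)

# Keep exactly one row per month, the one with the highest Temp.
# with_ties = FALSE returns a single row even when several days are tied;
# the default with_ties = TRUE would return every tied row instead.
top_by_month <- airquality %>%
  group_by(Month) %>%
  slice_max(Temp, n = 1, with_ties = FALSE) %>%
  ungroup()

top_by_month
```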

edited Jul 28 '16 at 12:14

answered Jul 28 '16 at 12:05

Great, I accept your answer. I also like the `summarise(day = Day[which.max(Temp)])` solution provided by @docendo discimus above, because summarise gives a nice short data frame. – Paul Rougieux Jul 28 '16 at 12:26
@PaulRougieux I thought you wanted the entire row. – akrun Jul 28 '16 at 12:32
Both ways are fine. I wasn't clear about this in my question. In the real data, I'm looking at the year in which the maximum consumption occurred for each country. At the moment, I'm exploring the dataset and the more information the better, so I'll use your solution, with the entire row. Then, when I implement a function to do this, I might use the other data frame, with a shorter row containing only country, max(consumption) and year. – Paul Rougieux Jul 28 '16 at 12:39
@PaulRougieux You can append `%>% select(columnsofinterest)` to the above solution to subset the columns – akrun Jul 28 '16 at 12:49
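
The two "short data frame" routes discussed in these comments can be sketched side by side on `airquality`. The names `short` and `rows` are illustrative, and the column choice in `select()` is an assumption about which columns are of interest.

```r
library(dplyr)

# Route 1: summarise directly, picking the Day at the position of the max Temp
short <- airquality %>%
  group_by(Month) %>%
  summarise(maxtemp = max(Temp), day = Day[which.max(Temp)])

# Route 2: keep the whole row with slice(), then subset the columns
rows <- airquality %>%
  group_by(Month) %>%
  slice(which.max(Temp)) %>%
  ungroup() %>%
  select(Month, Temp, Day)

short
rows
```

Both use `which.max()`, so they should agree on the day reported for each month; they differ only in column names and in whether the other columns were available along the way.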
