For example, using the airquality data, I want to calculate the maximum temperature for each month and then keep the day on which that maximum occurred.

library(dplyr)
# Maximum temperature per month
airqualitymax <- airquality %>% 
    group_by(Month) %>% 
    summarise(maxtemp = max(Temp))
# Day of the month on which the max occurred
airquality %>% 
    left_join(airqualitymax, by = "Month") %>%
    filter(Temp == maxtemp) 

It turns out that the day is not unique, but suppose it were. Is there a way to select the day on which the maximum occurs directly in summarise()?

Paul Rougieux

1 Answer

We can use slice to keep the row that has the maximum 'Temp' for each 'Month':

airquality %>% 
   group_by(Month) %>% 
   slice(which.max(Temp))

Another option, which may be faster, is to arrange 'Temp' in descending order and take the first observation (or arrange in ascending order and take the last one with slice(n())):

airquality %>%
  group_by(Month) %>%
  arrange(desc(Temp)) %>%
  slice(1L)
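
As a further sketch that is not part of the original answer: assuming dplyr 1.0.0 or later is installed, slice_max() expresses the same idea in a single step. The name max_rows is only illustrative.

```r
library(dplyr)

# One row per month: the row holding the highest Temp.
# with_ties = FALSE keeps a single row per group even when the maximum is tied.
max_rows <- airquality %>%
  group_by(Month) %>%
  slice_max(Temp, n = 1, with_ties = FALSE) %>%
  ungroup()
```

With the default with_ties = TRUE, every row that shares the maximum is kept, which matches the join-and-filter result in the question.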
akrun
  • Great, I accept your answer. I also like the `summarise(day = Day[which.max(Temp)])` solution provided by @docendo discimus above, because summarise gives a nice short data frame. – Paul Rougieux Jul 28 '16 at 12:26
  • @PaulRougieux I thought you wanted the entire row. – akrun Jul 28 '16 at 12:32
  • Both ways are fine. I wasn't clear about this in my question. In the real data, I'm looking at the year in which the maximum consumption occurred for each country. At the moment I'm exploring the dataset, and the more information the better, so I'll use your solution with the entire row. Later, when I implement a function to do this, I might use the shorter data frame containing only country, max(consumption) and year. – Paul Rougieux Jul 28 '16 at 12:39
  • @PaulRougieux You can add `%>% select(columnsofinterest)` to the above solution to subset the columns. – akrun Jul 28 '16 at 12:49
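
The two variants discussed in the comments can be sketched as follows. This is a minimal illustration on the airquality data; the result names max_summary and max_selected and the chosen columns are assumptions made for the example.

```r
library(dplyr)

# Short result: one row per month with the maximum and the day it occurred.
# which.max() returns the index of the first maximum, so a tie resolves
# to the first matching row within the month.
max_summary <- airquality %>%
  group_by(Month) %>%
  summarise(maxtemp = max(Temp), day = Day[which.max(Temp)])

# Full-row variant, narrowed afterwards to the columns of interest.
max_selected <- airquality %>%
  group_by(Month) %>%
  slice(which.max(Temp)) %>%
  ungroup() %>%
  select(Month, Day, Temp)
```

Both results have one row per month. The first is convenient inside a function, while the second makes it easy to keep extra columns while exploring.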