
Example data:

year <- c(1990, 1991)
January <- c(1, 1)
February <- c(0, 3)

df <- data.frame(year, January, February)
  year January February
1 1990       1        0
2 1991       1        3

I want to get a new data frame with the maximum temperature for each year and the month in which it occurred, i.e. this:

max_temp <- c(1,3)
month <- c("January", "February")

new_df <- data.frame(year, month, max_temp)
  year    month max_temp
1 1990  January        1
2 1991 February        3

However, I have data for 400 years and each year has 1,100 months, so it's important that this runs reasonably quickly.
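Because the real data is wide (1,100 columns per year), one option that skips reshaping entirely is to work on the month columns as a matrix in base R. This is a minimal sketch, assuming every column except `year` is a numeric month column. It compares each cell with its row maximum, so tied months all come back as separate rows.

```r
# Example data from the question
df <- data.frame(year = c(1990, 1991),
                 January = c(1, 1),
                 February = c(0, 3))

# Numeric matrix of the month columns (assumes all non-year columns are months)
m <- as.matrix(df[, -1])

# Row-wise maximum across the month columns
row_max <- do.call(pmax, df[, -1, drop = FALSE])

# Every (row, column) cell equal to its row's maximum; ties give several cells
hits <- which(m == row_max, arr.ind = TRUE)

new_df <- data.frame(year     = df$year[hits[, "row"]],
                     month    = colnames(m)[hits[, "col"]],
                     max_temp = m[hits])

# which() returns cells in column-major order, so sort by year
new_df <- new_df[order(new_df$year), ]
```

The comparison `m == row_max` works because `row_max` has one value per row and R recycles it down each column.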

I've melted the original data frame and grouped the data by year:

library(reshape2)  # melt()
library(dplyr)     # %>%, group_by(), summarize()

melted <- melt(df, id.vars = "year")
new_frame <- melted %>%
  group_by(year) %>%
  summarize(max_temp = max(value))
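The melted pipeline above can keep the month if it filters rows instead of summarizing them. This is a sketch, assuming the reshape2 and dplyr packages; `melt()` names the long columns `variable` and `value`, which are renamed at the end. Rows equal to the yearly maximum are all kept, so ties survive.

```r
library(reshape2)  # melt()
library(dplyr)     # group_by(), filter(), transmute()

df <- data.frame(year = c(1990, 1991),
                 January = c(1, 1),
                 February = c(0, 3))

# Long format: one row per (year, month) in columns `variable` and `value`
melted <- melt(df, id.vars = "year")

new_frame <- melted %>%
  group_by(year) %>%
  # keep every row equal to its year's maximum, so tied months are retained
  filter(value == max(value)) %>%
  ungroup() %>%
  transmute(year, month = as.character(variable), max_temp = value)
```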

But I haven't figured out how to get the corresponding month. Is there an efficient, idiomatic way to do this in R?

Question edited by Frank; asked by MZimbric.
  • If there are ties, do you want to keep both? By the way, if you really care about speed, see http://stackoverflow.com/questions/31852294/how-to-speed-up-subset-by-groups/31854111#31854111 – Frank May 01 '17 at 14:40
  • I want to keep both if there are ties. – MZimbric May 01 '17 at 14:49
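For the speed concern raised in the comments, a common data.table idiom is to collect the matching row numbers per group with `.I` and then subset once. This is a sketch assuming the data.table package; it is not necessarily the method in the linked answer. Keeping every row equal to the group maximum preserves ties, as the OP requested.

```r
library(data.table)

df <- data.frame(year = c(1990, 1991),
                 January = c(1, 1),
                 February = c(0, 3))

# Long format with explicit column names; months stored as character
long <- melt(as.data.table(df), id.vars = "year",
             variable.name = "month", value.name = "max_temp",
             variable.factor = FALSE)

# .I holds row numbers of `long`; keep all rows hitting their year's maximum
idx <- long[, .I[max_temp == max(max_temp)], by = year]$V1
new_dt <- long[idx]
```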

1 Answer

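library(tidyverse)

# Reshape to long format (one row per year/month), then keep every row
# whose temperature equals that year's maximum, so ties are retained
df_new <- df %>% gather(month, temp, January:February) %>%
      group_by(year) %>% filter(temp == max(temp))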
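`gather()` is superseded by `pivot_longer()` in tidyr 1.0.0 and later. The sketch below is the same answer written with `pivot_longer()`, assuming tidyr >= 1.0.0. Selecting the month columns with `-year` (every column except `year`) avoids spelling out all 1,100 month names.

```r
library(tidyr)  # pivot_longer() needs tidyr >= 1.0.0
library(dplyr)

df <- data.frame(year = c(1990, 1991),
                 January = c(1, 1),
                 February = c(0, 3))

df_new <- df %>%
  # -year selects every column except year as a month column
  pivot_longer(-year, names_to = "month", values_to = "temp") %>%
  group_by(year) %>%
  filter(temp == max(temp)) %>%  # tied months all pass the filter
  ungroup()
```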
Answered by udden2903.
  • It's substantially faster to do `df %>% gather(month, temp, January:February) %>% group_by(year) %>% arrange(desc(temp)) %>% slice(1)` for medium to large data frames (though this will return only one row per group, even if there are ties). – eipi10 May 01 '17 at 14:44
  • @epi `arrange(desc(temp)) %>% distinct(year, .keep_all = TRUE)` maybe. Oh, nvm, OP wants to keep ties, so neither of these work. – Frank May 01 '17 at 14:49
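As the comments note, `slice(1)` and `distinct()` drop ties. In dplyr 1.0.0 and later, `slice_max()` with `with_ties = TRUE` (its default) returns every row tied for the group maximum. This sketch assumes dplyr >= 1.0.0 and adds a hypothetical third year, 1992, with a January/February tie purely to show that both tied rows are kept.

```r
library(tidyr)
library(dplyr)  # slice_max() needs dplyr >= 1.0.0

# Question data plus a hypothetical year 1992 with a tie, for illustration only
df <- data.frame(year = c(1990, 1991, 1992),
                 January = c(1, 1, 2),
                 February = c(0, 3, 2))

df_top <- df %>%
  gather(month, temp, January:February) %>%
  group_by(year) %>%
  # n = 1 with with_ties = TRUE keeps all rows tied for the maximum
  slice_max(temp, n = 1, with_ties = TRUE) %>%
  ungroup()
```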