Select a value for based on a highest value in another column

Question

I don't understand why I can't find a solution for this, since I feel that this is a pretty basic question. Need to ask for help, then. I want to rearrange airquality dataset by month with maximum temp value for each month. In addition I want to find the corresponding day for each monthly maximum temperature. What is the laziest (code-wise) way to do this?

I have tried following without a success:

require(reshape2)
names(airquality) <- tolower(names(airquality))
mm <- melt(airquality, id.vars = c("month", "day"), meas = c("temp"))

dcast(mm, month + day ~ variable, max)
aggregate(formula = temp ~ month + day, data = airquality, FUN = max)

I am after something like this:

month day temp
5     7    89
...

score 8 · Accepted Answer · answered May 22 '12 at 15:56

There was quite a discussion a while back about whether being lazy is good or not. Anwyay, this is short and natural to write and read (and is fast for large data so you don't need to change or optimize it later) :

require(data.table)
DT=as.data.table(airquality)

DT[,.SD[which.max(Temp)],by=Month]

     Month Ozone Solar.R Wind Temp Day
[1,]     5    45     252 14.9   81  29
[2,]     6    NA     259 10.9   93  11
[3,]     7    97     267  6.3   92   8
[4,]     8    76     203  9.7   97  28
[5,]     9    73     183  2.8   93   3

.SD is the subset of the data for each group, and you just want the row from it with the largest Temp, iiuc. If you need the row number then that can be added.

Or to get all the rows where the max is tied :

DT[,.SD[Temp==max(Temp)],by=Month]

     Month Ozone Solar.R Wind Temp Day
[1,]     5    45     252 14.9   81  29
[2,]     6    NA     259 10.9   93  11
[3,]     7    97     267  6.3   92   8
[4,]     7    97     272  5.7   92   9
[5,]     8    76     203  9.7   97  28
[6,]     9    73     183  2.8   93   3
[7,]     9    91     189  4.6   93   4

Thanks guys! I haven't used data.table package previously, so it's about a time. "Answer accepted" because this one was the most complete (and slightly shorter than Chritoph_J's). All of you would have deserved it, though (+1 for everyone). — Mikko, May 22 '12 at 16:17
I wonder if there is a more concise solution in any language. I'm guessing not. — Ben Rollert, Apr 15 '14 at 14:52

score 5 · Answer 2 · answered May 23 '12 at 02:03

Another approach with plyr

require(reshape2)
names(airquality) <- tolower(names(airquality))
mm <- melt(airquality, id.vars = c("month", "day"), meas = c("temp"), value.name = 'temp')

library(plyr)

ddply(mm, .(month), subset, subset = temp == max(temp), select = -variable)

Gives

  month day temp
1     5  29   81
2     6  11   93
3     7   8   92
4     7   9   92
5     8  28   97
6     9   3   93
7     9   4   93

Or, even simpler

require(reshape2)
require(plyr)
names(airquality) <- tolower(names(airquality))
ddply(airquality, .(month), subset, 
  subset = temp == max(temp), select = c(month, day, temp) )

score 2 · Answer 3 · answered May 22 '12 at 15:44

how about with plyr?

max.func <- function(df) {
   max.temp <- max(df$temp)

   return(data.frame(day = df$Day[df$Temp==max.temp],
                     temp = max.temp))
}

ddply(airquality, .(Month), max.func)

As you can see, the max temperature for the month happens on more than one day. If you want different behavior, the function is easy enough to adjust.

score 2 · Answer 4 · edited May 23 '17 at 11:46

2

Or if you want to use the data.table package (for instance, if speed is an issue and the data set is large or if you prefer the syntax):

library(data.table)
DT <- data.table(airquality)
DT[, list(maxTemp=max(Temp), dayMaxTemp=.SD[max(Temp)==Temp, Day]), by="Month"]

If you want to know what the .SD stands for, have a look here: SO

edited May 23 '17 at 11:46

Community

1
1

answered May 22 '12 at 15:55

Christoph_J

6,804
8
44
58

I think you beat me on that one :) – Matt Dowle May 22 '12 at 15:59
Time-wise, yes, by two seconds (time to celebrate ;-) But your `which.max` trick is neat...so +1 for that! – Christoph_J May 22 '12 at 16:01

Select a value for based on a highest value in another column

4 Answers4

Linked