I have a rather large data frame, and I need a good way (explained below) to extract the indices of the rows that hold the maximum value of a given field within each label. To make this concrete, here is an example 10-row data frame:
      value label
1  5.531637     D
2  5.826498     A
3  8.866210     A
4  1.387978     C
5  8.128505     C
6  7.391311     B
7  1.829392     A
8  4.373273     D
9  7.380244     A
10 6.157304     D
To reproduce it:
d <- structure(list(value = c(5.531637, 5.826498, 8.86621, 1.387978, 8.128505,
                              7.391311, 1.829392, 4.373273, 7.380244, 6.157304),
                    label = c("D", "A", "A", "C", "C", "B", "A", "D", "A", "D")),
               .Names = c("value", "label"), class = "data.frame", row.names = c(NA, -10L))
To find the index of the row with the maximum value for each label, I currently use the following code:
idx <- sapply(split(seq_len(nrow(d)), d$label), function(x) {
  x[which.max(d[x, "value"])]
})
This gives:
 A  B  C  D
 3  6  5 10
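To pin down the expected result independently of the split/sapply code, here is a minimal sketch of the same computation written with order() and duplicated(). The names o and idx_sorted are only illustrative, I have not benchmarked it on the full data, and it assumes that on ties the first row with the maximum should win (which is also what which.max returns).

```r
# Same example data as above, rebuilt so the snippet runs on its own.
d <- data.frame(
  value = c(5.531637, 5.826498, 8.86621, 1.387978, 8.128505,
            7.391311, 1.829392, 4.373273, 7.380244, 6.157304),
  label = c("D", "A", "A", "C", "C", "B", "A", "D", "A", "D"),
  stringsAsFactors = FALSE
)

# Row indices sorted by label, then by decreasing value within each label.
o <- order(d$label, -d$value)

# The first row of each label in that ordering holds the label's maximum.
idx_sorted <- o[!duplicated(d$label[o])]
names(idx_sorted) <- d$label[idx_sorted]
idx_sorted
```

On the example data this should give the same named vector as above (3, 6, 5, 10 for A, B, C, D), so any faster or more elegant approach can be checked against either version.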
I have also played around with ddply but have yet to find a better way to do this. By "better" I mean faster (ddply is pretty slow, and what I currently use is not far behind) as well as more elegant, since the above solution seems way too wordy to me.