I have a very large data frame in which one column contains an ID and another contains a value associated with that ID. Each ID occurs multiple times with differing values, and I want to keep the maximum value for each ID and discard the rest. Here is a reproducible example using the quakes dataset in R:
data <- as.data.frame(quakes)
## Grab unique station IDs (column 5, "stations")
uni <- unique(data[, 5])
## Create output matrix: one row per unique station ID
output <- matrix(NA_real_, nrow = length(uni), ncol = 2)
colnames(output) <- c("Station ID", "Max Mag")
## Go through each station ID and record the maximum magnitude
for (i in seq_len(nrow(output))) {
  ## Rows belonging to the current station ID
  sub.data <- data[which(data[, 5] == uni[i]), ]
  ## Put station ID in column 1
  output[i, 1] <- uni[i]
  ## Put biggest magnitude (column 4, "mag") in column 2
  output[i, 2] <- max(sub.data[, 4])
}
My real data frames have hundreds of thousands of rows, so this loop is slow. Is there a quicker way to do this?
Any help much appreciated!
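
For comparison, here is a minimal sketch of the same per-ID maximum written as a single grouped operation with base R's tapply instead of a loop. The names max.mag and output2 are made up for this illustration, and the result comes back sorted by ID rather than in order of first appearance, so the rows may be ordered differently from the loop's output. Whether it is fast enough on the real data is untested here.

```r
# Sketch only: per-ID maximum without an explicit loop, base R only
data <- as.data.frame(quakes)

# Split column 4 (mag) by column 5 (stations) and take the max of each group
max.mag <- tapply(data[, 4], data[, 5], max)

# Assemble a two-column matrix shaped like the loop's output
# (names(max.mag) holds the IDs as character strings, so convert them back)
output2 <- cbind("Station ID" = as.numeric(names(max.mag)),
                 "Max Mag"    = as.vector(max.mag))

head(output2)
```

With real data, the two column indices would be replaced by the ID and value columns of that data frame.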