I have a data.frame with this structure:

id time var1 var2 var3
1     2    4    5    6
1     4    8   51    7
1     1    9   17   38
2    12    8    9   21
2    15   25    6   23

For each id, I want to keep the row that contains the minimum time. In the example, it would be this:

id time var1 var2 var3   
1     1    9   17   38
2    12    8    9   21

I think that the aggregate function would be useful, but I'm not sure how to use it.

zx8754

3 Answers

Your title may be misleading, since you really just want to keep the row with the minimum time for every id. Try this:

library(dplyr)
df %>%
    group_by(id) %>%
    arrange(id, time) %>%
    filter(row_number() == 1)
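
The same sort-then-take-first idea can also be written without dplyr. This is only a minimal base R sketch, assuming the example data from the question; the names sorted and first_rows are made up for illustration:

```r
# Example data copied from the question
df <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  time = c(2, 4, 1, 12, 15),
  var1 = c(4, 8, 9, 8, 25),
  var2 = c(5, 51, 17, 9, 6),
  var3 = c(6, 7, 38, 21, 23)
)

# Sort by id, then by time, so each id's earliest row comes first
sorted <- df[order(df$id, df$time), ]

# duplicated() is FALSE for the first occurrence of each id,
# so this keeps exactly one row per id
first_rows <- sorted[!duplicated(sorted$id), ]
rownames(first_rows) <- NULL
first_rows
```

Like the filter(row_number() == 1) version, this keeps a single row per id even when several rows share the minimum time.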
Nick Becker

We can use by, do.call, and the ever useful which.min function to get what we need:

do.call('rbind', by(df, df$id, function(x) x[which.min(x$time), ]))

#   id time var1 var2 var3
# 1  1    1    9   17   38
# 2  2   12    8    9   21

And if you suspect there may be more than one minimum value per id, you can eschew the which.min function and use which(x$time == min(x$time)):

do.call('rbind', by(df, df$id, function(x) x[which(x$time == min(x$time)), ]))

#   id time var1 var2 var3
# 1  1    1    9   17   38
# 2  2   12    8    9   21
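
The example data has no ties, so both calls print the same result. To see where they differ, here is a small sketch on made-up data (the tied data frame is hypothetical, not from the question) in which id 1 has two rows at the minimum time:

```r
# Hypothetical data: id 1 has two rows tied at the minimum time
tied <- data.frame(
  id   = c(1, 1, 1, 2),
  time = c(1, 1, 3, 5),
  var1 = c(10, 20, 30, 40)
)

# which.min() returns only the first index of the minimum
first_only <- do.call(rbind, by(tied, tied$id, function(x) {
  x[which.min(x$time), ]
}))

# which(x == min(x)) returns every index that attains the minimum
all_ties <- do.call(rbind, by(tied, tied$id, function(x) {
  x[which(x$time == min(x$time)), ]
}))

nrow(first_only)  # one row per id
nrow(all_ties)    # id 1 contributes both of its tied rows
```

Which one you want depends on whether tied rows should all be kept or collapsed to a single row per id.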

Data

df <- structure(list(id = c(1L, 1L, 1L, 2L, 2L), 
time = c(2L, 4L, 1L, 12L, 15L), 
var1 = c(4L, 8L, 9L, 8L, 25L), 
var2 = c(5L, 51L, 17L, 9L, 6L), 
var3 = c(6L, 7L, 38L, 21L, 23L)), 
.Names = c("id", "time", "var1", "var2", "var3"), 
class = "data.frame", row.names = c(NA, -5L))
bouncyball

  • dplyr, using the slice function:

    library(dplyr)
    df %>% 
        group_by(id) %>% 
        slice(which.min(time))
    

    Output:

    Source: local data frame [2 x 5]
    Groups: id [2]
    
         id  time  var1  var2  var3
      <dbl> <dbl> <dbl> <dbl> <int>
    1     1     1     9    17    38
    2     2    12     8     9    21
    

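  • sqldf. This relies on SQLite, sqldf's default backend, which fills the non-aggregated columns from the row that holds MIN(time):

    library(sqldf)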
    sqldf('SELECT id, MIN(time) time, var1, var2, var3
                            FROM df 
                            GROUP BY id')
    

    Output:

      id time var1 var2 var3
    1  1    1    9   17   38
    2  2   12    8    9   21
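
The aggregate function mentioned in the question can follow the same "minimum per group" logic as the SQL query, but it only returns the grouping column and the minimum, so the other columns have to be joined back. A minimal base R sketch, assuming the question's example data; min_times and result are illustrative names:

```r
# Example data copied from the question
df <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  time = c(2, 4, 1, 12, 15),
  var1 = c(4, 8, 9, 8, 25),
  var2 = c(5, 51, 17, 9, 6),
  var3 = c(6, 7, 38, 21, 23)
)

# Minimum time per id; the result has only the columns id and time
min_times <- aggregate(time ~ id, data = df, FUN = min)

# Join back on both id and time to recover var1, var2 and var3
result <- merge(df, min_times, by = c("id", "time"))
result
```

Because the join matches on time as well as id, rows tied at the minimum time would all be kept.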
    
    mpalanco