Aggregate by one variable but adding other variables

Question

I have a data.frame with this structure:

id time var1 var2 var3
1     2    4    5    6
1     4    8   51    7
1     1    9   17   38
2    12    8    9   21
2    15   25    6   23

For every id, I want the row that contains the minimum time. In the example, that would be:

id time var1 var2 var3   
1     1    9   17   38
2    12    8    9   21

I think that the aggregate function would be useful, but I'm not sure how to use it.
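Since aggregate was the first idea, here is a minimal base-R sketch of how it could be used, assuming the data is in df. aggregate() only returns the grouping column and the summarized column, so the per-id minimum time is merged back onto df to recover var1 to var3. The names min_times and result are illustrative.

```r
# Example data from the question
df <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  time = c(2, 4, 1, 12, 15),
  var1 = c(4, 8, 9, 8, 25),
  var2 = c(5, 51, 17, 9, 6),
  var3 = c(6, 7, 38, 21, 23)
)

# Step 1: aggregate() computes the minimum time for each id
min_times <- aggregate(time ~ id, data = df, FUN = min)

# Step 2: merge() joins on both id and time, recovering the full rows;
# if two rows tie for the minimum time, both are kept
result <- merge(df, min_times, by = c("id", "time"))
result
```

Because the join is on the value of the minimum, this approach keeps every tied row rather than picking one.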

Do you have a tie-breaker? Or is one minimum value guaranteed? — Pierre L, Aug 24 '16 at 20:30

score 1 · Answer 1 · answered Aug 24 '16 at 20:04


Your title may be misleading, since you really just want to keep the row with the minimum time for every id. Try this:

library(dplyr)
df %>%
    group_by(id) %>%
    arrange(id, time) %>%          # sort by time within each id
    filter(row_number() == 1)      # keep the first (earliest) row per id
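For readers without dplyr, the same sort-then-take-first logic can be sketched in base R: order() plays the role of arrange(id, time), and !duplicated() keeps the first row of each id, like filter(row_number() == 1). The names sorted and first_rows are illustrative. Since order() leaves ties in their original order, a tied minimum resolves to the row that appears first in the data.

```r
# Example data from the question
df <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  time = c(2, 4, 1, 12, 15),
  var1 = c(4, 8, 9, 8, 25),
  var2 = c(5, 51, 17, 9, 6),
  var3 = c(6, 7, 38, 21, 23)
)

# Sort by id, then by time within id
sorted <- df[order(df$id, df$time), ]

# Keep only the first row seen for each id, i.e. its minimum time
first_rows <- sorted[!duplicated(sorted$id), ]
first_rows
```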

Nick Becker

bouncyball · Answer 2 · answered Aug 24 '16 at 20:34

We can use by, do.call and the ever-useful which.min function to get what we need:

do.call('rbind', by(df, df$id, function(x) x[which.min(x$time), ]))

#   id time var1 var2 var3
# 1  1    1    9   17   38
# 2  2   12    8    9   21

If there may be more than one row with the minimum time per id, note that which.min returns only the first match. To keep all tied rows, use which(x$time == min(x$time)) instead:

do.call('rbind', by(df, df$id, function(x) x[which(x$time == min(x$time)), ]))

#   id time var1 var2 var3
# 1  1    1    9   17   38
# 2  2   12    8    9   21
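The difference between the two versions comes down to how ties are handled. A small sketch on a made-up vector (times is illustrative, not from the question's data) shows it:

```r
times <- c(3, 1, 5, 1)

# which.min() returns only the index of the first minimum
first_min <- which.min(times)
first_min   # 2

# which() with an equality test returns every index holding the minimum
all_min <- which(times == min(times))
all_min     # 2 4
```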

Data

df <- structure(list(id = c(1L, 1L, 1L, 2L, 2L),
                     time = c(2L, 4L, 1L, 12L, 15L),
                     var1 = c(4L, 8L, 9L, 8L, 25L),
                     var2 = c(5L, 51L, 17L, 9L, 6L),
                     var3 = c(6L, 7L, 38L, 21L, 23L)),
                .Names = c("id", "time", "var1", "var2", "var3"),
                class = "data.frame", row.names = c(NA, -5L))

mpalanco · Answer 3 · answered Aug 25 '16 at 06:45

dplyr, using slice

library(dplyr)
df %>% 
    group_by(id) %>% 
    slice(which.min(time))

Output:

Source: local data frame [2 x 5]
Groups: id [2]

     id  time  var1  var2  var3
  <dbl> <dbl> <dbl> <dbl> <int>
1     1     1     9    17    38
2     2    12     8     9    21

sqldf

library(sqldf)
sqldf('SELECT id, MIN(time) time, var1, var2, var3
       FROM df
       GROUP BY id')

Output:

  id time var1 var2 var3
1  1    1    9   17   38
2  2   12    8    9   21
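The sqldf query works because SQLite, sqldf's default backend, fills the bare columns var1 to var3 in a query with a single MIN() from the row that holds the minimum. Many other SQL databases reject non-aggregated columns that are not in GROUP BY, or do not guarantee which row they come from. A base-R sketch that does not rely on that behavior uses ave() to attach each id's minimum time to every row and then keeps the matching rows; group_min and result are illustrative names.

```r
# Example data from the question
df <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  time = c(2, 4, 1, 12, 15),
  var1 = c(4, 8, 9, 8, 25),
  var2 = c(5, 51, 17, 9, 6),
  var3 = c(6, 7, 38, 21, 23)
)

# ave() returns, for every row, the minimum time within that row's id
group_min <- ave(df$time, df$id, FUN = min)

# Keep rows whose time equals their group's minimum (ties are all kept)
result <- df[df$time == group_min, ]
result
```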
