How to take all records with max value for each group

Question

I have a data table as below:

   user                time follow_group
1:    1 2017-09-01 00:01:01            1
2:    1 2017-09-01 00:01:20            1
3:    1 2017-09-01 00:03:01            1
4:    1 2017-09-01 00:10:01            2
5:    1 2017-09-01 00:11:01            2
6:    2 2017-09-01 00:01:03            1
7:    2 2017-09-01 00:01:08            1
8:    2 2017-09-01 00:03:01            1

From this, I want to take all the records with the highest follow_group for each user.

So what I did was:

data[max(follow_group), , by = list(user)]

But this returned an error:

Error in `[.data.table`(data, max(follow_group),  : 
  'by' or 'keyby' is supplied but not j

Any help is appreciated. Thanks.

Seems related to [How to select the row with the maximum value in each group](https://stackoverflow.com/questions/24558328/how-to-select-the-row-with-the-maximum-value-in-each-group) or [Subset by group with data.table](https://stackoverflow.com/questions/16573995/subset-by-group-with-data-table) — Henrik, Oct 18 '17 at 14:28

acylam · Accepted Answer · 2017-10-18T15:19:13.427

You can do this with data.table:

library(data.table)
setDT(df)[, .SD[follow_group == max(follow_group)], by = user]

or this with dplyr:

library(dplyr)
df %>%
  group_by(user) %>%
  filter(follow_group == max(follow_group))

Result:

   user                time follow_group
1:    1 2017-09-01 00:10:01            2
2:    1 2017-09-01 00:11:01            2
3:    2 2017-09-01 00:01:03            1
4:    2 2017-09-01 00:01:08            1
5:    2 2017-09-01 00:03:01            1

# A tibble: 5 x 3
# Groups:   user [2]
   user                time follow_group
  <int>               <chr>        <int>
1     1 2017-09-01 00:10:01            2
2     1 2017-09-01 00:11:01            2
3     2 2017-09-01 00:01:03            1
4     2 2017-09-01 00:01:08            1
5     2 2017-09-01 00:03:01            1

Data:

df = structure(list(user = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), time = c("2017-09-01 00:01:01", 
"2017-09-01 00:01:20", "2017-09-01 00:03:01", "2017-09-01 00:10:01", 
"2017-09-01 00:11:01", "2017-09-01 00:01:03", "2017-09-01 00:01:08", 
"2017-09-01 00:03:01"), follow_group = c(1L, 1L, 1L, 2L, 2L, 
1L, 1L, 1L)), class = "data.frame", .Names = c("user", "time", 
"follow_group"), row.names = c(NA, -8L))
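
As for the error in the question: `by` only groups the evaluation of `j`, so a call that supplies `by` with an empty `j` is rejected. The `i` argument is evaluated once on the whole table, not per user, so `max(follow_group)` there is read as a plain row number. The sketch below shows this on the same sample data, rebuilt as `dt` so that it runs on its own. It also shows the `.I` idiom, a common alternative to `.SD` that is not part of the answer above. The names `dt`, `row_two`, `idx` and `res` are only illustrative.

```r
library(data.table)

# Same sample data as above, rebuilt so this snippet is self-contained
dt <- data.table(
  user = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  time = c("2017-09-01 00:01:01", "2017-09-01 00:01:20",
           "2017-09-01 00:03:01", "2017-09-01 00:10:01",
           "2017-09-01 00:11:01", "2017-09-01 00:01:03",
           "2017-09-01 00:01:08", "2017-09-01 00:03:01"),
  follow_group = c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L)
)

# `i` is evaluated once on the whole table: max(follow_group) is 2 here,
# so this selects row number 2 instead of filtering per user
row_two <- dt[max(follow_group)]

# .I holds the row numbers of the current group within the full table;
# collect those at each user's maximum, then subset once
idx <- dt[, .I[follow_group == max(follow_group)], by = user]$V1
res <- dt[idx]
```

On large tables the `.I` form is often reported to be faster than `.SD`, since the per-group subsets are never built, but that is worth benchmarking on your own data.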

@user7648269 This gives maximum for each `user`, not _all rows_ that are maximums for each `user` — acylam, Oct 18 '17 at 15:18
Yes, then take the maximum for each user and join back to the main table, per Arun's answer: https://stackoverflow.com/a/31854111 — Frank, Oct 18 '17 at 17:52
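
A minimal sketch of the join-back idea from the comment above, assuming the same sample data. It is not necessarily the code used in the linked answer, and the names `dt`, `maxes` and `res` are only illustrative.

```r
library(data.table)

# Same sample data as in the answer, rebuilt so this snippet is self-contained
dt <- data.table(
  user = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  time = c("2017-09-01 00:01:01", "2017-09-01 00:01:20",
           "2017-09-01 00:03:01", "2017-09-01 00:10:01",
           "2017-09-01 00:11:01", "2017-09-01 00:01:03",
           "2017-09-01 00:01:08", "2017-09-01 00:03:01"),
  follow_group = c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L)
)

# Step 1: one row per user holding that user's maximum follow_group
maxes <- dt[, .(follow_group = max(follow_group)), by = user]

# Step 2: join back to the main table; every row of dt whose
# (user, follow_group) pair matches a row of maxes is kept, ties included
res <- dt[maxes, on = c("user", "follow_group")]
```

Because the join matches on both columns, all rows tied at the maximum come back, which is what the question asks for.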
