Find rows in dataframe with maximum values grouped by values in another column

Question

I would like to solve this problem in R without using SQL.

How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?

Sure, I could use sqldf to do it, but there must be a cool apply method in R to do it, too?

I don't know what "distinct" does, but `x[which.max(x$var),]` does the other part. — Frank, May 18 '13 at 18:28

G. Grothendieck · Accepted Answer · 2014-10-16T12:25:53.823

Setup data. First read in the data:

Lines <- "id  home  datetime  player   resource
1   10   04/03/2009  john    399 
2   11   04/03/2009  juliet  244
5   12   04/03/2009  borat   555
3   10   03/03/2009  john    300
4   11   03/03/2009  juliet  200
6   12   03/03/2009  borat   500
7   13   24/12/2008  borat   600
8   13   01/01/2009  borat   700
"
DF <- read.table(text = Lines, header = TRUE)
DF$datetime <- as.Date(DF$datetime, format = "%d/%m/%Y")

1) base - by. There are many ways to process this using various packages, but here we will show a base solution first:

> do.call("rbind", by(DF, DF$home, function(x) x[which.max(x$datetime), ]))
   id home   datetime player resource
10  1   10 2009-03-04   john      399
11  2   11 2009-03-04 juliet      244
12  5   12 2009-03-04  borat      555
13  8   13 2009-01-01  borat      700
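For readers unfamiliar with `by`, the one-liner above can be unrolled into three explicit steps. This is only a sketch of what the call does, not part of the original answer: it rebuilds the same sample data inline so it runs on its own, and the names `pieces`, `latest` and `result` are made up for illustration.

```r
# Same sample data as in the answer, rebuilt inline so the snippet runs alone.
DF <- data.frame(
  id = c(1, 2, 5, 3, 4, 6, 7, 8),
  home = c(10, 11, 12, 10, 11, 12, 13, 13),
  datetime = as.Date(c("2009-03-04", "2009-03-04", "2009-03-04", "2009-03-03",
                       "2009-03-03", "2009-03-03", "2008-12-24", "2009-01-01")),
  player = c("john", "juliet", "borat", "john", "juliet", "borat", "borat", "borat"),
  resource = c(399, 244, 555, 300, 200, 500, 600, 700)
)

# Step 1: one data frame per value of home.
pieces <- split(DF, DF$home)

# Step 2: within each piece keep the row with the latest datetime.
# which.max returns the first maximum only, so tied rows are dropped.
latest <- lapply(pieces, function(x) x[which.max(x$datetime), ])

# Step 3: stack the one-row pieces back into a single data frame.
result <- do.call("rbind", latest)
result
```

Because `which.max` stops at the first maximum, each home contributes exactly one row even when several rows share the latest date.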

1a) base - ave. A variation that also uses only base R:

FUN <- function(x) which.max(x) == seq_along(x)
is.max <- ave(xtfrm(DF$datetime), DF$home, FUN = FUN) == 1
DF[is.max, ]
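The `ave` version above keeps one row per group. If all rows tied for the maximum should be kept instead, one possible variation is to compare each value with its group maximum. This is a sketch under that assumption, using small made-up data with a deliberate tie; `d`, `group_max` and `with_ties` are illustrative names.

```r
# Made-up data: home 10 has two rows sharing the latest date.
DF <- data.frame(
  id = 1:5,
  home = c(10, 10, 11, 11, 11),
  datetime = as.Date(c("2009-03-04", "2009-03-04", "2009-03-03",
                       "2009-03-04", "2009-03-01"))
)

# xtfrm turns the dates into plain numbers so ave can work on them.
d <- xtfrm(DF$datetime)

# ave repeats each group's maximum on every row of that group.
group_max <- ave(d, DF$home, FUN = max)

# Keep rows equal to their group's maximum; tied rows are all kept.
with_ties <- DF[d == group_max, ]
with_ties
```

Here both rows of home 10 survive, together with the single latest row of home 11.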

2) sqldf. Here it is using sqldf, just in case:

> library(sqldf)
> sqldf("select id, home, max(datetime) datetime, player, resource 
+        from DF 
+        group by home")
  id home   datetime player resource
1  1   10 2009-03-04   john      399
2  2   11 2009-03-04 juliet      244
3  5   12 2009-03-04  borat      555
4  8   13 2009-01-01  borat      700
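The query above works because sqldf uses SQLite by default, and SQLite fills the other selected columns from the row that holds `max(datetime)`. Other database backends may not behave that way. A more portable pattern is to compute the group maxima first and join them back to the full table. A rough base R sketch of that idea, using `aggregate` and `merge` (not part of the original answer; `latest` and `result` are illustrative names):

```r
# Same sample data as in the answer, rebuilt inline so the snippet runs alone.
DF <- data.frame(
  id = c(1, 2, 5, 3, 4, 6, 7, 8),
  home = c(10, 11, 12, 10, 11, 12, 13, 13),
  datetime = as.Date(c("2009-03-04", "2009-03-04", "2009-03-04", "2009-03-03",
                       "2009-03-03", "2009-03-03", "2008-12-24", "2009-01-01")),
  player = c("john", "juliet", "borat", "john", "juliet", "borat", "borat", "borat"),
  resource = c(399, 244, 555, 300, 200, 500, 600, 700)
)

# Latest datetime per home (the "group by" step).
latest <- aggregate(datetime ~ home, data = DF, FUN = max)

# Join back on the shared columns (home and datetime) to recover full rows.
result <- merge(latest, DF)

# Restore the original column order and sort by home.
result <- result[order(result$home), names(DF)]
result
```

Unlike `which.max`, this join keeps every row tied for the latest datetime within a home.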

score 1 · Answer 2 · edited Jan 18 '17 at 08:00

I do not use SQL either, so I would do it this way.

1)

df <- read.table("your file", "your options") # I leave this to you

2)

in_group <- df$group_column == "desired_group"
row_with_max_value <- which(in_group)[which.max(df$values[in_group])]

"row_with_max_value" contains the row number of your data frame (df) at which the column "values" (df$values) reaches its maximum within the group "desired_group" of "group_column". If "group_column" is not of type character, remove the quotes around "desired_group" and write the value in the corresponding format.

If you need the value, then

df$values[row_with_max_value]

Probably it is not the most elegant way, but you do not need SQL and it works (at least for me ;)
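To get the row for every group at once rather than for one desired group, the same lookup can be repeated per group. This is a minimal sketch on made-up data; the column names follow this answer, while `rows_by_group` and `rows_with_max` are illustrative names.

```r
# Hypothetical data; column names follow the answer above.
df <- data.frame(
  group_column = c("a", "b", "a", "b", "c"),
  values = c(1, 5, 3, 2, 4)
)

# Row indices belonging to each group.
rows_by_group <- split(seq_len(nrow(df)), df$group_column)

# For each group, the row index of its largest value (the first one on ties).
rows_with_max <- sapply(rows_by_group, function(i) i[which.max(df$values[i])])

# The full rows holding each group's maximum.
df[rows_with_max, ]
```

`rows_with_max` is a named vector with one row number per group, so it can be used to index `df` directly.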
