Removing duplicated rows from data frame with respect to specific columns and date in R

Question

I want to remove the rows where columns a and b have the same values. Furthermore, the remaining unique rows should carry the latest date (column c) among the duplicates. For example:

> a <- c(rep("A", 3), rep("B", 3), rep("C",2))
> b <- c(1,1,2,4,1,1,2,2)
> c <- c("2016-10-01", "2016-10-02", "2016-10-03", "2016-10-04", "2016-10-04", "2016-10-05", "2016-10-06", "2016-10-07")
> df <-data.frame(a,b,c)
> df
  a b          c
1 A 1 2016-10-01
2 A 1 2016-10-02
3 A 2 2016-10-03
4 B 4 2016-10-04
5 B 1 2016-10-04
6 B 1 2016-10-05
7 C 2 2016-10-06
8 C 2 2016-10-07

I want to get the following data frame:

  a b          c
1 A 1 2016-10-02
3 A 2 2016-10-03
4 B 4 2016-10-04
5 B 1 2016-10-05
6 C 2 2016-10-07

Here is what I have tried so far:

> df[!(duplicated(df$a, df$b)| 
+         duplicated(df$a, df$b, fromLast=TRUE)),]
  a b          c
1 A 1 2016-10-01
2 A 1 2016-10-02
3 A 2 2016-10-03
4 B 4 2016-10-04
5 B 1 2016-10-04
6 B 1 2016-10-05

This question doesn't seem to have anything to do with statistics, so you'd be better off asking at Stack Overflow. That said, I think what you're looking for is `df[!duplicated(df[c("a", "b")], fromLast = T), ]` - it assumes your data is already sorted by column `c`, at least within any given `a,b` combination. — Gregor Thomas, Oct 24 '16 at 21:35
Please don't cross post. This will be on topic on [SO], so we will migrate it for you, if you wait. You can also delete this. — gung - Reinstate Monica, Oct 24 '16 at 23:49

score 1 · Answer 1 · answered Oct 25 '16 at 05:48

You would first want to sort and then do the selection.

df <- df[ order(df[['c']]), ]
small_df <- df[ !duplicated(df[c('a','b')], fromLast=TRUE), ]
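
A minimal, self-contained sketch of the sort-then-select approach above, rebuilt on the question's data. It assumes column `c` is converted with `as.Date` so that ordering is chronological; the names `sorted` and `latest` are illustrative and not part of the original answer.

```r
# Rebuild the example data from the question, with c as a real Date column
# (the as.Date conversion is an assumption, not in the original post).
df <- data.frame(
  a = c(rep("A", 3), rep("B", 3), rep("C", 2)),
  b = c(1, 1, 2, 4, 1, 1, 2, 2),
  c = as.Date(c("2016-10-01", "2016-10-02", "2016-10-03", "2016-10-04",
                "2016-10-04", "2016-10-05", "2016-10-06", "2016-10-07"))
)

# Sort by date so the latest row of each (a, b) group comes last.
sorted <- df[order(df$c), ]

# fromLast = TRUE flags every occurrence except the last one as a duplicate,
# so negating it keeps the most recent row per (a, b) combination.
latest <- sorted[!duplicated(sorted[c("a", "b")], fromLast = TRUE), ]
print(latest)
```

Sorting first matters because `duplicated(..., fromLast = TRUE)` keeps the last row of each `a`, `b` group in whatever order the rows currently have.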

IRTFM

score 0 · Answer 2 · answered Oct 25 '16 at 06:18

You can simply use `aggregate` to take the maximum date within each `a`, `b` group:

df$c <- as.character(df$c)
aggregate(c~a+b, df, max)

  a b          c
1 A 1 2016-10-02
2 B 1 2016-10-05
3 A 2 2016-10-03
4 C 2 2016-10-07
5 B 4 2016-10-04
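
A hedged sketch of the same `aggregate` call on a freshly built copy of the question's data, assuming `c` is stored as ISO 8601 character strings (`stringsAsFactors = FALSE`), for which the lexicographic `max` coincides with the latest date. The name `latest` is illustrative.

```r
# Rebuild the example data; dates stay as "YYYY-MM-DD" character strings
# (an assumption), so comparing them as text matches chronological order.
df <- data.frame(
  a = c(rep("A", 3), rep("B", 3), rep("C", 2)),
  b = c(1, 1, 2, 4, 1, 1, 2, 2),
  c = c("2016-10-01", "2016-10-02", "2016-10-03", "2016-10-04",
        "2016-10-04", "2016-10-05", "2016-10-06", "2016-10-07"),
  stringsAsFactors = FALSE
)

# One row per (a, b) combination, holding the largest (latest) date string.
latest <- aggregate(c ~ a + b, data = df, FUN = max)
print(latest)
```

Note that the result contains only the grouping columns and `c`; any other columns in the data frame would not be carried over by this call.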

Sandipan Dey
