I want to remove all rows that duplicate the values in columns a and b, keeping for each (a, b) pair only the row with the latest date in column c. I was thinking of sorting the data frame by column c and then removing all duplicates of (a, b). It is my understanding that the function "duplicated" processes rows in a specific order.

For example:

    > a <- c(rep("A", 3), rep("B", 3), rep("C",2))
    > b <- c(1,1,2,4,1,1,2,2)
    > c <- c("2016-10-01", "2016-10-02", "2016-10-03", "2016-10-04", "2016-10-04", "2016-10-05", "2016-10-06", "2016-10-07")
    > df <-data.frame(a,b,c)
    > df
      a b          c
    1 A 1 2016-10-01
    2 A 1 2016-10-02
    3 A 2 2016-10-03
    4 B 4 2016-10-04
    5 B 1 2016-10-04
    6 B 1 2016-10-05
    7 C 2 2016-10-06
    8 C 2 2016-10-07

I want to get the following dataframe as a result:

      a b          c
    1 A 1 2016-10-02
    2 A 2 2016-10-03
    3 B 4 2016-10-04
    4 B 1 2016-10-05
    5 C 2 2016-10-07

1 Answer

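Yes, duplicated processes rows in a specific order: top to bottom, so by default the first occurrence of each combination is kept and every later one is flagged as a duplicate. To start from the bottom and keep the last occurrence instead, use fromLast=TRUE. This returns the latest date as long as the rows are sorted by c within each (a, b) pair, as they are in your example.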

    > df[!duplicated( df[,1:2], fromLast=TRUE ), ]
      a b          c
    2 A 1 2016-10-02
    3 A 2 2016-10-03
    4 B 4 2016-10-04
    6 B 1 2016-10-05
    8 C 2 2016-10-07
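
If the rows are not already ordered by date, the same idea still works after sorting first. The following is a minimal sketch, not a definitive implementation: the shuffled row order and the variable names `shuffled`, `sorted`, and `result` are assumptions made for illustration, and the data is rebuilt from the question.

```r
# Rebuild the example data from the question.
df <- data.frame(
  a = c(rep("A", 3), rep("B", 3), rep("C", 2)),
  b = c(1, 1, 2, 4, 1, 1, 2, 2),
  c = c("2016-10-01", "2016-10-02", "2016-10-03", "2016-10-04",
        "2016-10-04", "2016-10-05", "2016-10-06", "2016-10-07"),
  stringsAsFactors = FALSE
)

# Assumption for illustration: the rows arrive in arbitrary order.
shuffled <- df[c(8, 2, 6, 1, 4, 3, 7, 5), ]

# Sort by date so the latest row of each (a, b) pair comes last.
sorted <- shuffled[order(as.Date(shuffled$c)), ]

# Keep only the last occurrence of each (a, b) pair.
result <- sorted[!duplicated(sorted[, c("a", "b")], fromLast = TRUE), ]

# Optional: put the rows back in a readable order and reset the row names.
result <- result[order(result$a, result$c), ]
rownames(result) <- NULL
result
```

Selecting the key columns by name (`c("a", "b")`) rather than by position (`1:2`) keeps the code working if the column order changes.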
Elvis