I want to remove duplicate rows, where a row counts as a duplicate if its values in columns a and b match those of another row. For each such group of rows, the one that is kept should be the row with the latest date in column c. I was thinking of sorting the data frame by column c and then removing all duplicates with respect to a and b. It is my understanding that the function "duplicated" processes rows in a specific order, so the sort order would determine which row of each group is kept.
For example:
> a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
> b <- c(1, 1, 2, 4, 1, 1, 2, 2)
> c <- c("2016-10-01", "2016-10-02", "2016-10-03", "2016-10-04", "2016-10-04", "2016-10-05", "2016-10-06", "2016-10-07")
> df <- data.frame(a, b, c)
> df
  a b          c
1 A 1 2016-10-01
2 A 1 2016-10-02
3 A 2 2016-10-03
4 B 4 2016-10-04
5 B 1 2016-10-04
6 B 1 2016-10-05
7 C 2 2016-10-06
8 C 2 2016-10-07
I want to get the following dataframe as a result:
  a b          c
1 A 1 2016-10-02
2 A 2 2016-10-03
3 B 4 2016-10-04
4 B 1 2016-10-05
5 C 2 2016-10-07
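The sort-then-deduplicate idea described above could be sketched roughly as follows. This is only one possible approach using base R, not necessarily the best one. It assumes column c can be converted with as.Date, and the names sorted and result are chosen here purely for illustration.

```r
# Example data from the question; c is converted to Date so that
# sorting is chronological rather than dependent on string/factor order
df <- data.frame(
  a = c(rep("A", 3), rep("B", 3), rep("C", 2)),
  b = c(1, 1, 2, 4, 1, 1, 2, 2),
  c = as.Date(c("2016-10-01", "2016-10-02", "2016-10-03", "2016-10-04",
                "2016-10-04", "2016-10-05", "2016-10-06", "2016-10-07"))
)

# Sort ascending by date, so the latest row of each (a, b) pair comes last
sorted <- df[order(df$c), ]

# duplicated() flags every occurrence after the first one it meets; with
# fromLast = TRUE it scans from the end, so the last (latest) row of each
# (a, b) pair is the one that is NOT flagged and therefore kept
result <- sorted[!duplicated(sorted[, c("a", "b")], fromLast = TRUE), ]

# Renumber the rows 1..n
rownames(result) <- NULL
result
```

On the example data this keeps five rows, one per distinct (a, b) pair, each with the latest date for that pair. Sorting in descending order and calling duplicated without fromLast would keep the same rows, but in reversed order.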