I have a dataframe df like this:
ID CreationDate(d.m.Y) ...
x 12.10.2015 ...
y 09.05.2015 ...
x 18.10.2015 ...
... ... ...
I am aware of the duplicated() and unique() functions, so I could find out which IDs are duplicated by calling df$ID[duplicated(df$ID)]. I could then easily remove those rows from the table, but what I want to do is keep exactly one row per ID: the one that was created last (has a "bigger" CreationDate).
In the example given, I would like to have the first row removed because its CreationDate is earlier than the third row's. In this case it is also the first occurrence, but that is not guaranteed in the real data.
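To make the desired result concrete, here is a minimal base R sketch of one possible approach. It assumes that CreationDate is stored as a character string in d.m.Y format and that only the two columns shown matter; the names created, sorted and result are made up for illustration. The idea is to sort the rows newest first, so that duplicated() flags the older rows instead of the newer ones.

```r
# Example data mirroring the table above (only the two columns shown)
df <- data.frame(
  ID = c("x", "y", "x"),
  CreationDate = c("12.10.2015", "09.05.2015", "18.10.2015"),
  stringsAsFactors = FALSE
)

# Parse the d.m.Y strings into Date objects so they compare chronologically
# (comparing the raw strings would sort by day of month first)
created <- as.Date(df$CreationDate, format = "%d.%m.%Y")

# Reorder the rows so the most recent CreationDate comes first
sorted <- df[order(created, decreasing = TRUE), ]

# duplicated() marks every occurrence of an ID after the first one,
# so dropping the marked rows keeps only the newest row per ID
result <- sorted[!duplicated(sorted$ID), ]
```

Note that result comes out ordered by date rather than in the original row order. If two rows with the same ID also share the same CreationDate, this sketch simply keeps whichever one sorts first.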
I am lost and do not have an idea how to solve this. I would be really grateful for help or any advice! Thanks in advance!