Remove certain duplicate row - keep only the entry with the latest creation date

Question

I have a dataframe df like this:

ID     CreationDate(d.m.Y)    ...
x      12.10.2015             ...
y      09.05.2015             ...
x      18.10.2015             ...
...    ...                    ...

I am aware of the duplicated() and unique() functions, so I could find out which IDs are duplicated by calling df$ID[duplicated(df$ID)]. I could then easily remove those rows from the table, but what I want is to keep exactly one row per ID: the one that was created last (the one with the latest CreationDate).

In the example given I would like to have the first row removed because its CreationDate is earlier than the third row's. In this case it is also the first occurrence, but that is not guaranteed in the real data.

I am lost and do not have an idea how to solve this. I would be really grateful for help or any advice! Thanks in advance!

Try `unique(setDT(df)[order(-as.Date(CreationDate, '%d.%m.%Y'))], by ='ID')` — akrun, Dec 11 '15 at 19:17
So, if there are 5 duplicates, you only want to remove the last one, keeping the other 4? — Gregor Thomas, Dec 11 '15 at 19:17
If there were 5 duplicates I would only want to keep the entry with the latest CreationDate. — tristndev, Dec 11 '15 at 19:18
Because keeping the first record only is exactly what `df[!duplicated(df$ID), ]` does, so if you can first sort by creation date then you probably just want the default `duplicated` behavior. — Gregor Thomas, Dec 11 '15 at 19:20
Great - I didn't have that idea. I was already looking into applying strange formations of `sapply`... Thank you!! Feel free to post your comments as answers. — tristndev, Dec 11 '15 at 19:21
I don't see how my question was already asked in that other thread. — tristndev, Dec 11 '15 at 19:53
I think it is a duplicate although not the one that was cited: http://stackoverflow.com/questions/15641924/remove-all-duplicates-except-last-instance/15643072#15643072 (which seems sufficiently congruent if you just omit the `!`) — IRTFM, Dec 11 '15 at 20:46
Yes, 42's suggestion is a better match. I don't know of a way to switch the one marked as duplicate other than re-open then re-close. — Gregor Thomas, Dec 11 '15 at 23:44
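
The approach suggested in the comments (sort by creation date, newest first, then drop every later occurrence of an ID) could be sketched in base R roughly as follows. This is only a minimal illustration: the sample data is rebuilt from the three rows shown in the question, and the helper names `created`, `sorted` and `latest` are made up for this example.

```r
# Sample data rebuilt from the rows shown in the question.
df <- data.frame(
  ID = c("x", "y", "x"),
  CreationDate = c("12.10.2015", "09.05.2015", "18.10.2015"),
  stringsAsFactors = FALSE
)

# Parse the d.m.Y strings into Date objects so they compare chronologically
# rather than as text.
created <- as.Date(df$CreationDate, format = "%d.%m.%Y")

# Sort newest first; duplicated() then flags every occurrence of an ID
# after the first one, which are exactly the older rows.
sorted <- df[order(created, decreasing = TRUE), ]
latest <- sorted[!duplicated(sorted$ID), ]

print(latest)
```

Because the rows are sorted before `duplicated()` is applied, the result does not depend on where the newest entry sits in the original data.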
