Remove duplicated row in column and keep last row in R

Question

Hello, I have a data frame (df) such as:

COL1 COL2 COL3           COL4
NA   NA   Sp_canis_lupus 10
3    8    Sp_canis_lupus 10
3    8    Sp_canis_lupus 10

How can I remove the rows that are duplicated in COL3 and keep only the last one?

Here I should get:

COL1 COL2 COL3           COL4
3    8    Sp_canis_lupus 10

Thank you very much for your help

score 4 · Accepted Answer · answered Aug 15 '20 at 10:33

You could also solve this with aggregate, like below:

aggregate(. ~ COL3, data = df, FUN = tail, 1)

Or another way in dplyr:

library(dplyr)

df %>%
  group_by(COL3) %>%
  slice(n())

This assumes, of course, that you are only after duplicates in COL3. Otherwise you will need to rephrase the problem, as the example is not complex enough to show any other case.
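
One thing to watch with the aggregate version: the formula interface defaults to na.action = na.omit, so rows containing NA are dropped before grouping and you get the last complete row of each group rather than the literal last row. A minimal sketch of the difference, where the second group (Sp_felis_catus) is made up for illustration and is not in the question:

```r
# Data from the question plus a hypothetical second group whose last row
# contains NA (added only to show the difference).
df <- data.frame(
  COL1 = c(NA, 3, 3, 1, NA),
  COL2 = c(NA, 8, 8, 2, NA),
  COL3 = c(rep("Sp_canis_lupus", 3), rep("Sp_felis_catus", 2)),
  COL4 = c(10, 10, 10, 7, 7)
)

# Default na.action = na.omit: incomplete rows are removed first, so each
# group returns its last complete row.
last_complete <- aggregate(. ~ COL3, data = df, FUN = tail, 1)

# na.action = na.pass keeps incomplete rows, so each group returns its
# literal last row, NA values included.
last_any <- aggregate(. ~ COL3, data = df, FUN = tail, 1, na.action = na.pass)

print(last_complete)
print(last_any)
```

Note that aggregate also moves the grouping column (COL3) to the front of the result.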

score 2 · Answer 2 · Carlos S Traynor · 2020-08-15T10:28:37.120

Using dplyr:

library(dplyr)

df %>%
  group_by(COL3) %>%
  filter(row_number() == n())

Upvote if it helps, thanks!
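
For anyone who wants to try this on the question's data, here is a small reproducible sketch. It assumes dplyr is installed; the slice_tail() variant additionally assumes dplyr 1.0.0 or later, so it would not be available in the 0.8.3 version mentioned in the comments.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  COL1 = c(NA, 3, 3),
  COL2 = c(NA, 8, 8),
  COL3 = rep("Sp_canis_lupus", 3),
  COL4 = c(10, 10, 10)
)

# Keep the row whose position within its COL3 group equals the group size,
# i.e. the last row of each group.
last_rows <- df %>%
  group_by(COL3) %>%
  filter(row_number() == n()) %>%
  ungroup()

# Same idea with slice_tail() (dplyr >= 1.0.0)
last_rows_alt <- df %>%
  group_by(COL3) %>%
  slice_tail(n = 1) %>%
  ungroup()

print(last_rows)
print(last_rows_alt)
```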

edited Aug 15 '20 at 10:28

answered Aug 15 '20 at 10:24

Where is the information about the fact that I want to keep only the last row? – chippycentra Aug 15 '20 at 10:26
In `row_number() == n()` you keep the last row of each level of COL3. Is that not what you were looking for? – Carlos S Traynor Aug 15 '20 at 10:29
But I get the message `x 'row_number' object not found` – chippycentra Aug 15 '20 at 10:30
Sorry, I missed the `()`. I have updated the comment now. – Carlos S Traynor Aug 15 '20 at 10:30
I get the same error message ... – chippycentra Aug 15 '20 at 10:33
This will work better than Roman's answer. I think Roman's answer needs to use `x[!duplicated(x$COL3), ]` instead. – Carlos S Traynor Aug 15 '20 at 10:35
What version of dplyr are you using? I have dplyr_0.8.3. – Carlos S Traynor Aug 15 '20 at 10:38
Maybe try `dplyr::row_number()`. – Carlos S Traynor Aug 15 '20 at 10:39
Somehow, this solution of using `filter(row_number()==n())` is faster with my 3m-row dataframe than the accepted solution of using `slice(n())` above. Both work fine though. – Faustin Gashakamba Oct 06 '22 at 07:25

score 1 · Answer 3 · answered Aug 15 '20 at 10:23

Use duplicated to find duplicates - and then select those that are not duplicated, i.e. x[!duplicated(x), ]. You may need to make the statement a bit more elaborate given that you have NAs in there.
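
As a sketch of what the more elaborate statement could look like on the question's data: duplicated() applied to the whole data frame compares entire rows and keeps the first occurrence, so the NA row survives. Restricting the check to COL3 and scanning from the end with fromLast = TRUE keeps the last row of each COL3 value instead. The variable names below are only for illustration.

```r
# Example data from the question
df <- data.frame(
  COL1 = c(NA, 3, 3),
  COL2 = c(NA, 8, 8),
  COL3 = rep("Sp_canis_lupus", 3),
  COL4 = c(10, 10, 10)
)

# Whole-row check: row 1 (with NA) differs from rows 2 and 3, so two rows remain.
first_unique <- df[!duplicated(df), ]

# COL3-only check, scanning from the end: every occurrence except the last
# one is flagged as a duplicate, so only the last row per COL3 value remains.
last_rows <- df[!duplicated(df$COL3, fromLast = TRUE), ]

print(first_unique)
print(last_rows)
```

Unlike the grouped approaches, this keeps the original row order and column order.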

Roman Luštrik

69,533
24
154
197

Remove duplicated row in column and keep last row in R

3 Answers3