-1

Hello, I have a data frame (df) such as:

COL1 COL2 COL3           COL4
NA   NA   Sp_canis_lupus 10
3    8    Sp_canis_lupus 10
3    8    Sp_canis_lupus 10 

How can I remove rows that are duplicated in COL3 and keep only the last one?

Here I should get:

COL1 COL2 COL3           COL4
3    8    Sp_canis_lupus 10 

Thank you very much for your help

Ruben Helsloot
chippycentra

3 Answers

4

You could also solve this with aggregate, like below:

aggregate(. ~ COL3, data = df, FUN = tail, 1)

Or another way in dplyr:

library(dplyr)

df %>%
  group_by(COL3) %>%
  slice(n())

This of course assumes that you're only after duplicates in COL3 - otherwise you'll need to rephrase the problem (as the example doesn't seem to be particularly complex).
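As a rough, reproducible sketch of the aggregate approach, the block below rebuilds the question's data by hand (the names df and res are only illustrative). One thing to be aware of: the formula interface of aggregate drops rows containing NA by default (na.action = na.omit), so tail picks the last complete row per group, and the grouping column comes first in the result.

```r
# Example data from the question, rebuilt by hand (assumed layout)
df <- data.frame(
  COL1 = c(NA, 3, 3),
  COL2 = c(NA, 8, 8),
  COL3 = rep("Sp_canis_lupus", 3),
  COL4 = rep(10, 3)
)

# Rows with NA are removed first (default na.action = na.omit),
# then tail(x, 1) keeps the last remaining value of each column per COL3 group
res <- aggregate(. ~ COL3, data = df, FUN = tail, 1)
print(res)
```

If the last row of a group may itself contain NA and should still be kept, this default would not be what you want; the dplyr version does not drop NA rows.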

arg0naut91
2

Using dplyr:

library(dplyr)

df %>%
  group_by(COL3) %>%
  filter(row_number() == n())

Upvote if it helps thanks!

1

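Use duplicated() to find duplicates, then select the rows that are not duplicated, i.e. x[!duplicated(x), ]. Note that this compares whole rows and keeps the first occurrence; to keep the last row per COL3 value, check only that column and pass fromLast = TRUE. You may need to make the statement a bit more elaborate given that you have NAs in there.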
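A minimal sketch of the duplicated() approach adapted to the question (keep the last row for each COL3 value) might look like this. The data frame is rebuilt by hand from the question, and the names df and last_rows are only illustrative.

```r
# Example data from the question, rebuilt by hand (assumed layout)
df <- data.frame(
  COL1 = c(NA, 3, 3),
  COL2 = c(NA, 8, 8),
  COL3 = rep("Sp_canis_lupus", 3),
  COL4 = rep(10, 3)
)

# fromLast = TRUE scans from the bottom, so the last occurrence of each
# COL3 value is the one that is NOT flagged as a duplicate
last_rows <- df[!duplicated(df$COL3, fromLast = TRUE), ]
print(last_rows)
```

Because only COL3 is passed to duplicated(), the NA values in COL1 and COL2 play no role in deciding which rows are kept.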

Roman Luštrik