How to remove the second duplicate

Question

I have a dataset like this:

df <- structure(list(group = c("1", "2", "3", "4", "5", "6", "7", "8"),
                     id    = c("1", "1", "2", "2", "3", "3", "3", "3"),
                     year  = c("2015", "2016", "2015", "2016", "2015", "2016", "2017", "2017"),
                     value = c("23", "24", "72", "78", "33", "38", "28", "29")),
                .Names = c("group", "id", "year", "value"),
                class = "data.frame", row.names = c(NA, -8L))

As you can see, there are two rows for id=3 with the year 2017 (groups 7 and 8). How could I remove the second row with the year equal to 2017? Even though the value in group 8 is different from that in group 7, I still do not want to keep it.

This data is just an example. The original data is much larger than this one. It has lots of ids that have two rows in their maximum year. I need to remove the second row.

Does anyone know how to do it? Thank you so much in advance.

Best, Olivia

The result should be like this:

df2 <- structure(list(group = c("1", "2", "3", "4", "5", "6", "7"),
                      id    = c("1", "1", "2", "2", "3", "3", "3"),
                      year  = c("2015", "2016", "2015", "2016", "2015", "2016", "2017"),
                      value = c("23", "24", "72", "78", "33", "38", "28")),
                 .Names = c("group", "id", "year", "value"),
                 class = "data.frame", row.names = c(NA, -7L))

Thank you @Ritchie, it works! Your code also removes any additional duplicates and keeps only the first one. Thank you so much! - Olivia Wang, Feb 03 '23 at 00:26
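
For reference, here is a minimal base R sketch of one way to get the expected result. It is not necessarily the code @Ritchie posted (that code is not shown here). It assumes that "duplicate" means a repeated id/year pair and that the row to keep is whichever comes first in the current row order. The names keep_first, result, is_max_year and strict are only illustrative.

```r
# Example data from the question, rebuilt so the snippet runs on its own.
df <- data.frame(
  group = c("1", "2", "3", "4", "5", "6", "7", "8"),
  id    = c("1", "1", "2", "2", "3", "3", "3", "3"),
  year  = c("2015", "2016", "2015", "2016", "2015", "2016", "2017", "2017"),
  value = c("23", "24", "72", "78", "33", "38", "28", "29"),
  stringsAsFactors = FALSE
)

# duplicated() flags the second and any later occurrence of each
# (id, year) pair, so negating it keeps only the first row per pair.
keep_first <- !duplicated(df[c("id", "year")])
result <- df[keep_first, ]
rownames(result) <- NULL  # renumber rows 1..n

# Stricter variant (assumption: repeats should only be dropped when
# they fall in the maximum year of their id; other repeats are kept).
yr <- as.integer(df$year)
is_max_year <- yr == ave(yr, df$id, FUN = max)
strict <- df[!(duplicated(df[c("id", "year")]) & is_max_year), ]
rownames(strict) <- NULL
```

On the example data both variants drop only the group 8 row. The value column plays no part in the comparison, so a row is removed even when its value differs from the one that is kept. If the real data is not already in the order you want, sort it before calling duplicated(), because the row order decides which one counts as "first".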
