I have a dataset like this:
df <- structure(list(group = c("1", "2", "3", "4", "5", "6", "7", "8"),
id = c("1", "1", "2","2", "3", "3","3", "3"),
year = c("2015", "2016","2015","2016","2015","2016","2017","2017"),
value =c("23","24","72","78","33","38","28","29")),
.Names = c("group", "id", "year", "value"), class = "data.frame",row.names = c(NA, -8L))
As you can see, id 3 has two rows for the year 2017 (groups 7 and 8). How can I remove the second of these rows? The value in group 8 differs from the value in group 7, but I still do not want to keep it.
This data is just an example; the original dataset is much larger, and many ids have two rows for their maximum year. In each case I need to remove the second row.
Does anyone know how to do it? Thank you so much in advance.
Best, Olivia
The result should be like this:
df2 <- structure(list(group = c("1", "2", "3", "4", "5", "6", "7"),
id = c("1", "1", "2","2", "3", "3","3"),
year = c("2015", "2016","2015","2016","2015","2016","2017"),
value =c("23","24","72","78","33","38","28")),
.Names = c("group", "id", "year", "value"), class = "data.frame",row.names = c(NA, -7L))
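To make the rule concrete, here is a minimal base R sketch of one way it might be expressed. It is not a definitive solution. It assumes that "second row" means any repeat of an (id, year) pair after its first occurrence in the existing row order, and that year can safely be converted to a number. The names yr, is_max_year, is_repeat and result are only illustrative.

```r
# Rebuild the example data (all columns are character, as in the question)
df <- data.frame(
  group = c("1", "2", "3", "4", "5", "6", "7", "8"),
  id    = c("1", "1", "2", "2", "3", "3", "3", "3"),
  year  = c("2015", "2016", "2015", "2016", "2015", "2016", "2017", "2017"),
  value = c("23", "24", "72", "78", "33", "38", "28", "29"),
  stringsAsFactors = FALSE
)

# Assumption: years are stored as text, so convert before comparing
yr <- as.numeric(df$year)

# TRUE where a row's year is the latest year recorded for its id
is_max_year <- yr == ave(yr, df$id, FUN = max)

# TRUE for every repeat of an (id, year) pair after its first occurrence
is_repeat <- duplicated(df[c("id", "year")])

# Drop only the repeated rows that fall in an id's maximum year
result <- df[!(is_max_year & is_repeat), ]
rownames(result) <- NULL
```

On the example data this drops only the group 8 row. If repeated (id, year) pairs should be removed in every year rather than only the maximum one, the is_max_year condition could be left out.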