I am working with a dataset in R and would like to keep the ID numbers where there is more than 1 year of data available.
Using the sample data below as a reference, I would like to keep the rows where the ID number is 1 or 2 (since they have more than one year of observed data), but remove the rows with ID number 3 (since its data was only observed in one year).
How can I do this easily in R? My idea was to loop over the rows and create a dummy variable that marks where the condition I need is met: a 1 where the ID number is the same as in the previous row (a difference of 0) and the year differs from the previous row's year (a difference other than 0). This would allow me to identify the ID numbers I need to keep.
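A minimal base R sketch of that loop idea, assuming the rows are already sorted by ID and then Year (as in the sample data) and that the columns are named ID and Year rather than referenced by position. The loop starts at row 2, because row 1 has no previous row to compare with; the variable names dum, keep_ids and result are illustrative only.

```r
# Sample data reconstructed from the dput() output in the question
# (a plain data.frame; a tibble behaves the same way with $ indexing)
agg_data_condensed <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 3),
  Year = c(2005, 2006, 2007, 2005, 2006, 2006, 2008)
)

# Assumes rows are sorted by ID, then Year, so each ID's rows are contiguous.
# Preallocate the dummy with zeros; row 1 has no previous row and stays 0.
dum <- integer(nrow(agg_data_condensed))
for (i in seq_len(nrow(agg_data_condensed))[-1]) {
  same_id  <- agg_data_condensed$ID[i] == agg_data_condensed$ID[i - 1]
  new_year <- agg_data_condensed$Year[i] != agg_data_condensed$Year[i - 1]
  if (same_id && new_year) dum[i] <- 1L
}
agg_data_condensed$dum <- dum

# An ID with at least one year change has more than one distinct year
keep_ids <- unique(agg_data_condensed$ID[agg_data_condensed$dum == 1])
result <- agg_data_condensed[agg_data_condensed$ID %in% keep_ids, ]
```

On the sample data this flags rows 2, 3 and 5, so keep_ids is c(1, 2) and ID 3 is dropped.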
for(i in 1:nrow(agg_data_condensed)){
  agg_data_condensed = mutate(dum = case_when((agg_data_condensed[i,1]-agg_data_condensed[i-1,1] == 0) & (agg_data_condensed[i,7]-agg_data_condensed[i-1,7] != 0) ~ 1 ))
}
However, this does not give me what I want. Instead, it throws this error:
Error in UseMethod("mutate") : no applicable method for 'mutate' applied to an object of class "c('double', 'numeric')"
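A likely cause: mutate() expects the data frame as its first argument (mutate(.data, ...)) and dispatches on it, but the loop never passes agg_data_condensed in. Also, when i is 1, i - 1 refers to row 0. Since mutate() already works on whole columns, the loop can be dropped. The sketch below is one possible vectorized version of the same dummy using dplyr's lag(); it assumes the columns are named ID and Year and that sorting by ID, then Year, is acceptable.

```r
library(dplyr)

# Sample data reconstructed from the dput() output in the question
agg_data_condensed <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 3),
  Year = c(2005, 2006, 2007, 2005, 2006, 2006, 2008)
)

flagged <- agg_data_condensed %>%
  arrange(ID, Year) %>%
  # lag() shifts a column down by one row, so each row is compared with the
  # previous one; row 1 gets NA, which coalesce() turns into FALSE
  mutate(dum = as.integer(coalesce(ID == lag(ID) & Year != lag(Year), FALSE)))

# Keep every row of an ID that has at least one flagged year change
result <- flagged %>%
  group_by(ID) %>%
  filter(any(dum == 1)) %>%
  ungroup()
```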
Any help would be greatly appreciated!
Edit: here is the output of dput():
structure(list(ID = c(1, 1, 1, 2, 2, 2, 3), Year = c(2005, 2006,
2007, 2005, 2006, 2006, 2008)), row.names = c(NA, -7L), class = c("tbl_df",
"tbl", "data.frame"))
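With this data, a dummy variable is not strictly needed: the condition "more than one year of data" can be tested directly by counting distinct years per ID. A minimal sketch, assuming the columns are named ID and Year as in the dput() output, with a base R equivalent using ave() for comparison:

```r
library(dplyr)

# Sample data reconstructed from the dput() output above
agg_data_condensed <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 3),
  Year = c(2005, 2006, 2007, 2005, 2006, 2006, 2008)
)

# dplyr: keep every row whose ID appears in more than one distinct year
result <- agg_data_condensed %>%
  group_by(ID) %>%
  filter(n_distinct(Year) > 1) %>%
  ungroup()

# Base R: ave() returns, for each row, the number of distinct years of its ID
n_years <- ave(agg_data_condensed$Year, agg_data_condensed$ID,
               FUN = function(y) length(unique(y)))
result_base <- agg_data_condensed[n_years > 1, ]
```

Counting distinct years (rather than rows) matters here: ID 2 has three rows but only two distinct years (2005 and 2006), and it is still kept, while ID 3 has a single year and is removed.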