I have a dataframe that looks like the this:
I want to keep only the consecutive years in each group, such as the following figure where the year of 2005 in group A and year of 2009 and 2011 in group B are deleted.
I created a column of the year difference by using df['year_diff']=df.groupby(['group'])['Year'].diff()
, and then only kept the rows where the year difference was equal to 1.
However, this method will also delete the first row in each consecutive year group since the year difference of the first row will be NAN. For example, the year of 2000 will be deleted from group 2000-2005. Is there a way that I can do to avoid this problem?