I have a dataframe that looks like this:

enter image description here

There are several groups in the dataframe, and each group contains two or more runs of consecutive years. For example, in this case, group A contains 2000-2005 and 2010-2015. What I want to do is keep only the run of consecutive years that contains the target year. For group A, that run is 2010-2015. Group B will be deleted, since none of its runs of consecutive years contains the target year 2006. The result should look as follows:

enter image description here

I have been thinking about this problem for a few days but haven't made any progress. Any help would be greatly appreciated. Thanks a lot for your guidance.

Sincerely

Wilson

yihao ren

1 Answer

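As with your previous question, you only need to add one more condition, built with any and transform. Here s1 keeps the runs of consecutive years that are longer than a single row, and s2 keeps the runs that contain the target year: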

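run = df.Year.diff().ne(1).cumsum()  # the run id increases whenever the year gap is not 1
s1 = df.g.groupby([df.g, run]).transform('count').ne(1)  # True for runs longer than one row
s2 = df.Year.eq(df['Target Year']).groupby([df.g, run]).transform('any')  # True if the run contains the target year
df[s1 & s2]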
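
Since the original screenshots are not available, here is a minimal, self-contained sketch of the same approach on made-up data. The sample values (including target year 2012 for group A and the years chosen for group B) and the helper name keep_run_with_target are assumptions for illustration only, not the asker's actual data. It also assumes the rows are already sorted by group and year.

```python
import pandas as pd


def keep_run_with_target(df):
    """Keep only the run of consecutive years that contains the target year.

    Hypothetical helper; assumes columns 'g', 'Year', 'Target Year'
    and rows sorted by group and then by year.
    """
    # The run id increases whenever the gap to the previous year is not 1.
    run = df["Year"].diff().ne(1).cumsum()
    keys = [df["g"], run]
    # True for rows whose run has more than one year.
    longer_than_one = df["Year"].groupby(keys).transform("count").ne(1)
    # True for rows whose run contains the target year.
    has_target = df["Year"].eq(df["Target Year"]).groupby(keys).transform("any")
    return df[longer_than_one & has_target]


# Made-up sample data (assumption): A has 2000-2005 and 2010-2015,
# B has 2000-2002 and 2008-2010 and never reaches its target year 2006.
df = pd.DataFrame({
    "g": ["A"] * 12 + ["B"] * 6,
    "Year": list(range(2000, 2006)) + list(range(2010, 2016))
            + list(range(2000, 2003)) + list(range(2008, 2011)),
    "Target Year": [2012] * 12 + [2006] * 6,
})

result = keep_run_with_target(df)
print(result)
```

With this sample, only the 2010-2015 rows of group A should remain, and group B should be dropped entirely.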
BENY