I want to select rows/groups that with year 2000 and 2001 and need to have both year 2000 and 2001 in a data set like below:
ID,year,age
810006862,2000,49
810006862,2001,
810006862,2002,
810006862,2003,52
810023112,2000,27
810023112,2004,28
810023112,2005,29
810023112,2006,30
810033622,2000,24
810033622,2001,25
I've tried the following codes, but both came back with empty data frame.
df1411 = df.groupby('ID').filter(lambda x: set(x['year']) == {'2000', '2001'})
df[df.groupby('ID')['year'].transform(lambda x: set(x.values.tolist()) == {'2000','2001'})]
These two codes below pick up one extra group that has only year 2000. I'd like to have group with both year 2000 and 2001.
df[df['year'].isin({2000, 2001})]
df.loc[df.year.isin(['2000', '2001'])]
The result I want is somewhat like this:
ID,year,age
810006862,2000,49
810006862,2001,
810033622,2000,24
810033622,2001,25