
I have a DataFrame and would like to drop certain rows within each group. Here is the data:

import pandas as pd

data = {'GROUP': ['A','A','A','B','B','B','B','C','C','C','C','C'],
        'DATE': ['202101','202102','202103','201907','201908','201909',
                 '201910','202003','202004','202005','202006','202007']}
df = pd.DataFrame(data, columns=['GROUP', 'DATE'])
         
   GROUP    DATE
0      A  202101
1      A  202102
2      A  202103
3      B  201907
4      B  201908
5      B  201909
6      B  201910
7      C  202003
8      C  202004
9      C  202005
10     C  202006
11     C  202007

I would like to drop all the rows after the second date in each group. In other words, I would like to produce the following:

  GROUP    DATE
0     A  202101
1     A  202102
3     B  201907
4     B  201908
7     C  202003
8     C  202004
mozway
wild west

2 Answers


Use GroupBy.head:

df.groupby('GROUP').head(2)

Output:

  GROUP    DATE
0     A  202101
1     A  202102
3     B  201907
4     B  201908
7     C  202003
8     C  202004
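
Note that GroupBy.head keeps rows in the order they already appear, so it returns the first two dates per group only when the frame is sorted by DATE within each group, as it is in the question. The following is a minimal sketch, assuming the rows might arrive unsorted; the shuffled frame and the names `shuffled` and `result` are illustrative and not part of the original question. Because DATE holds fixed-width 'YYYYMM' strings, sorting them as text also sorts them chronologically.

```python
import pandas as pd

data = {'GROUP': ['A','A','A','B','B','B','B','C','C','C','C','C'],
        'DATE': ['202101','202102','202103','201907','201908','201909',
                 '201910','202003','202004','202005','202006','202007']}
df = pd.DataFrame(data, columns=['GROUP', 'DATE'])

# Shuffle the rows to simulate unsorted input (illustrative assumption).
shuffled = df.sample(frac=1, random_state=0)

# Sort by group and date first, then keep the first two rows of each group.
# The original index labels are preserved.
result = shuffled.sort_values(['GROUP', 'DATE']).groupby('GROUP').head(2)
print(result)
```

Sorting before grouping makes the result independent of the input row order.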
ThePyGuy
jezrael

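Group the DataFrame by GROUP and apply a function that slices the first two DATE values of each group. This returns only the GROUP and DATE columns and renumbers the index from 0, so the original row labels are not kept.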

>>> df.groupby(['GROUP'])['DATE'].apply(lambda x: x[:2]).droplevel(-1).reset_index()

  GROUP    DATE
0     A  202101
1     A  202102
2     B  201907
3     B  201908
4     C  202003
5     C  202004
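
If the original index and any additional columns should be kept, one possible alternative (not part of the answer above) is a boolean mask built with GroupBy.cumcount, which numbers the rows of each group starting at 0. A minimal sketch, assuming the rows are already in the desired order within each group; the names `mask` and `result` are illustrative:

```python
import pandas as pd

data = {'GROUP': ['A','A','A','B','B','B','B','C','C','C','C','C'],
        'DATE': ['202101','202102','202103','201907','201908','201909',
                 '201910','202003','202004','202005','202006','202007']}
df = pd.DataFrame(data, columns=['GROUP', 'DATE'])

# cumcount() gives each row its 0-based position within its group.
mask = df.groupby('GROUP').cumcount() < 2

# Keep only the rows at positions 0 and 1 of every group.
result = df[mask]
print(result)
```

Changing the comparison (for example `>= 2`) would select the rows to drop instead of the rows to keep.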
ThePyGuy