I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'No':  [0,0,0,1,1,2,2],
                   'date':['2020-01-15','2019-12-16','2021-03-01', '2018-05-19', '2016-04-08', '2020-01-02', '2020-03-07']})
df.date = pd.to_datetime(df.date)


    No  date
0   0   2020-01-15
1   0   2019-12-16
2   0   2021-03-01
3   1   2018-05-19
4   1   2016-04-08
5   2   2020-01-02
6   2   2020-03-07

For each unique value in the No column, I want to drop that group's rows if all of its date values are earlier than 2020-01-01, i.e. I want to drop the rows with indices 3 and 4.
Is it possible to do this without a for loop?

Ani

1 Answer

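Use groupby and transform to get each group's latest date, then keep only the rows whose group maximum is on or after the cutoff: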

>>> df[df.groupby('No')['date'].transform('max')>='2020-01-01']
   No       date
0   0 2020-01-15
1   0 2019-12-16
2   0 2021-03-01
5   2 2020-01-02
6   2 2020-03-07
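
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea, split into steps. The names cutoff, group_max, kept and kept_alt are just illustrative and not from the question; the groupby(...).filter(...) variant is an alternative I'm adding for comparison, and it is often slower than transform when there are many groups.

```python
import pandas as pd

df = pd.DataFrame({
    'No': [0, 0, 0, 1, 1, 2, 2],
    'date': pd.to_datetime(['2020-01-15', '2019-12-16', '2021-03-01',
                            '2018-05-19', '2016-04-08',
                            '2020-01-02', '2020-03-07']),
})

# Cutoff taken from the question; the variable name is illustrative.
cutoff = pd.Timestamp('2020-01-01')

# Step 1: broadcast each group's latest date back onto every row of that group.
group_max = df.groupby('No')['date'].transform('max')

# Step 2: keep a row only if its group has at least one date on or after the cutoff.
kept = df[group_max >= cutoff]

# Alternative: drop whole groups with a predicate instead of a boolean mask.
kept_alt = df.groupby('No').filter(lambda g: (g['date'] >= cutoff).any())

print(kept)
```

Both variants keep the original index, so the rows for No == 1 (indices 3 and 4) are simply missing from the result.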
ignoring_gravity