How to drop rows for each value in a column using a condition?

Question

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'No':  [0,0,0,1,1,2,2],
                   'date':['2020-01-15','2019-12-16','2021-03-01', '2018-05-19', '2016-04-08', '2020-01-02', '2020-03-07']})
df['date'] = pd.to_datetime(df['date'])


    No  date
0   0   2020-01-15
1   0   2019-12-16
2   0   2021-03-01
3   1   2018-05-19
4   1   2016-04-08
5   2   2020-01-02
6   2   2020-03-07

For each unique value in the No column, I want to drop all of its rows if every date in that group is earlier than 2020-01-01. In this example that means dropping the rows with indices 3 and 4.
Is it possible to do this without a for loop?

ignoring_gravity · Accepted Answer · 2021-05-04T12:47:05.790


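Use groupby and transform to put each group's latest date on every row of that group, then keep only the rows whose group maximum is on or after 2020-01-01: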

>>> df[df.groupby('No')['date'].transform('max')>='2020-01-01']
   No       date
0   0 2020-01-15
1   0 2019-12-16
2   0 2021-03-01
5   2 2020-01-02
6   2 2020-03-07
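
To make the one-liner easier to follow, here is a minimal sketch that splits it into steps, using the dataframe from the question. The names `cutoff`, `group_max`, `keep`, `result`, and `result_filter` are only illustrative and do not appear in the original answer. The last step shows `groupby(...).filter(...)` as an alternative; it should give the same rows here, though it is likely to be slower on data with many groups because it calls a Python function once per group.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    'No': [0, 0, 0, 1, 1, 2, 2],
    'date': pd.to_datetime(['2020-01-15', '2019-12-16', '2021-03-01',
                            '2018-05-19', '2016-04-08',
                            '2020-01-02', '2020-03-07']),
})

# Illustrative name: the cutoff date from the question
cutoff = pd.Timestamp('2020-01-01')

# Step 1: latest date per 'No' group, repeated on every row of that group
group_max = df.groupby('No')['date'].transform('max')

# Step 2: True where the row's group has at least one date on or after the cutoff
keep = group_max >= cutoff

# Step 3: boolean indexing keeps the rows of the surviving groups
result = df[keep]

# Alternative (a different technique from the answer): drop whole groups with filter
result_filter = df.groupby('No').filter(lambda g: g['date'].max() >= cutoff)

print(result)
```

Because `transform` returns a Series aligned with the original index, the mask can be used directly for indexing and the original row labels are preserved.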

edited May 04 '21 at 12:47

answered May 04 '21 at 12:39


This is a short and very nice solution. Thank you! – Ani May 04 '21 at 12:57
