Python Dataframe delete rows which have more than 10 occurrences of a specific value

Question

I have the following data (just a quick picture, numbers continue):

import pandas as pd
df = pd.DataFrame({'Flight Day': ['2018-10-01', '2018-10-01','2018-10-01', '2018-10-01', '2018-10-02','2018-10-02', '2018-10-02', '2018-10-02', '2018-10-03','2018-10-03','2018-10-03','2018-10-03'], 
               'Flight Number': ['CA1336', 'CA1332', 'CA1472', 'CA1473', 'CA1355', 'CA1331', 'CA1666', 'CA1665', 'CA1366', 'CA1337', 'CA1489', 'CA1667'],
               'STD Departure': [10, 15, 8, 11,10, 15, 8, 14,10, 15, 8, 11], 
               'Bircher': [2, 4, 8, 4,3, 2, 3, 1,5, 5, 2, 1],
               'Carac': [2, 4, 8, 4,2, 2, 3, 4,2, 5, 2, 1]})

I only want to keep the last 10 occurrences of the same "Flight Number" and "STD Departure" and delete the rows before the last 10 occurrences. For example, delete the rows for LX1336 (Flight Number) 10 (STD Departure) after before I have the last 10 rows that already qualify, hence have LX1336 (Flight Number) 10 (STD Departure). I have all flight numbers in a CSV document and STD Departure always go from 0 to 23 if that helps. Is there a short and crisp code for this problem? Thank you very much for your help!

Flight Day Flight Number STD Departure Bircher Carac 2018-10-01 00:00:00 LX1336 10 2 2018-10-01 00:00:00 LX1337 15 1 2018-10-01 00:00:00 LX1472 8 1 2018-10-01 00:00:00 LX1473 10 2 2018-10-02 00:00:00 LX1336 12 1 2018-10-02 00:00:00 LX1337 16 1 — MHanu, Dec 05 '18 at 09:21
please provide a [minimal](https://stackoverflow.com/help/mcve) example. also please go through [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — anky, Dec 05 '18 at 09:22

yatu · Accepted Answer · 2018-12-05T10:21:35.607

0

You could use GroupBy.head for that:

df.groupby(['Flight Number','STD Departure']).head(10)

Edit

In the case you want the last 10, just use GroupBy.tail instead as @jondiedoop suggests:

df.groupby(['Flight Number','STD Departure']).tail(10)

edited Dec 05 '18 at 10:21

answered Dec 05 '18 at 09:45

yatu

86,083
12
84
139

Thanks. I just noticed that I actually wanted to keep the last 10. Should I filter or is there a code for .bottom? – MHanu Dec 05 '18 at 10:00
Edited. As @Jondiedoop says just use `.tail()` instead of `.head()` – yatu Dec 05 '18 at 10:23
Please don't forget to accept the answer if it helped you – yatu Dec 05 '18 at 10:24

Python Dataframe delete rows which have more than 10 occurrences of a specific value

1 Answers1

Edit