0

I have the following data (just a quick picture, numbers continue):

import pandas as pd
df = pd.DataFrame({'Flight Day': ['2018-10-01', '2018-10-01','2018-10-01', '2018-10-01', '2018-10-02','2018-10-02', '2018-10-02', '2018-10-02', '2018-10-03','2018-10-03','2018-10-03','2018-10-03'], 
               'Flight Number': ['CA1336', 'CA1332', 'CA1472', 'CA1473', 'CA1355', 'CA1331', 'CA1666', 'CA1665', 'CA1366', 'CA1337', 'CA1489', 'CA1667'],
               'STD Departure': [10, 15, 8, 11,10, 15, 8, 14,10, 15, 8, 11], 
               'Bircher': [2, 4, 8, 4,3, 2, 3, 1,5, 5, 2, 1],
               'Carac': [2, 4, 8, 4,2, 2, 3, 4,2, 5, 2, 1]})

I only want to keep the last 10 occurrences of the same "Flight Number" and "STD Departure" and delete the rows before the last 10 occurrences. For example, delete the rows for LX1336 (Flight Number) 10 (STD Departure) after before I have the last 10 rows that already qualify, hence have LX1336 (Flight Number) 10 (STD Departure). I have all flight numbers in a CSV document and STD Departure always go from 0 to 23 if that helps. Is there a short and crisp code for this problem? Thank you very much for your help!

MHanu
  • 335
  • 1
  • 9
  • 2
    provide text code, not image – meW Dec 05 '18 at 09:19
  • Flight Day Flight Number STD Departure Bircher Carac 2018-10-01 00:00:00 LX1336 10 2 2018-10-01 00:00:00 LX1337 15 1 2018-10-01 00:00:00 LX1472 8 1 2018-10-01 00:00:00 LX1473 10 2 2018-10-02 00:00:00 LX1336 12 1 2018-10-02 00:00:00 LX1337 16 1 – MHanu Dec 05 '18 at 09:21
  • 2
    please provide a [minimal](https://stackoverflow.com/help/mcve) example. also please go through [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – anky Dec 05 '18 at 09:22
  • 1
    Sorry, I am clearly new here :) – MHanu Dec 05 '18 at 09:30
  • I just added a minimal example – MHanu Dec 05 '18 at 09:31

1 Answers1

0

You could use GroupBy.head for that:

df.groupby(['Flight Number','STD Departure']).head(10)

Edit

In the case you want the last 10, just use GroupBy.tail instead as @jondiedoop suggests:

df.groupby(['Flight Number','STD Departure']).tail(10)
yatu
  • 86,083
  • 12
  • 84
  • 139