Multiple conditions Pandas groupby, keeping other column values

Question

I have a dataframe like this:

Launch  Article Sequence    Machine     Quantity    Date        …
68033   F2500   10          lathe 1     200         01/02/2022  …
68033   F2500   20          lathe 1     190         01/02/2022  …
68033   F2500   30          borer 3     175         02/02/2022  …
68033   F2500   40          milling 1   175         03/03/2022  …
71562   F2500   10          lathe 3     632         12/12/2022  …
71562   F2500   20          lathe 4     593         15/12/2022  …
71562   F2500   30          borer 3     560         16/12/2022  …
71562   F2500   40          milling 2   555         16/12/2022  …
69872   F302    10          lathe 2     5463        04/06/2022  …
69872   F302    30          lathe 3     5102        11/06/2022  …
70444   F302    20          lathe 1     3125        27/07/2022  …
70444   F302    30          lathe 3     2965        31/07/2022  …
…       …       …           …           …           …           …

124,531 rows x 12 columns

What I need is a kind of group-by where, for each article, I select the maximum launch number and then, within that launch, the minimum sequence number together with its corresponding machine.

The end result should look like this:

Article Launch  Sequence    Machine
F2500   71562   10          lathe 3
F302    70444   20          lathe 1
…       …       …           …

I've tried to do it with a pandas groupby and .agg, but it doesn't work. The following code, for example, returns the max launch and the min sequence over all rows of each article, not the min sequence within the max launch, because each aggregation is applied to its column independently. I've also tried other approaches with sort_values and the like, without success.

Last_Lathe_df = Last_Lathe_df.groupby(['Article'], as_index=False).agg({'Launch': 'max', 'Sequence': 'min', 'Machine': 'first'})
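To see why this happens, here is a small reproduction built from the first four columns of the sample data above (the elided `…` columns are left out). `.agg` applies each function to its own column independently, so the maximum `Launch`, the minimum `Sequence` and the `first` `Machine` of a group need not come from the same row:

```python
import pandas as pd

# The question's sample rows, restricted to the columns used here
df = pd.DataFrame({
    "Launch":   [68033] * 4 + [71562] * 4 + [69872] * 2 + [70444] * 2,
    "Article":  ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine":  ["lathe 1", "lathe 1", "borer 3", "milling 1",
                 "lathe 3", "lathe 4", "borer 3", "milling 2",
                 "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

# Each aggregation runs on its column separately: max over all launches,
# min over all sequences, and the machine of the first row in the group
naive = df.groupby("Article", as_index=False).agg(
    {"Launch": "max", "Sequence": "min", "Machine": "first"}
)
print(naive)
```

With this sample, F2500 gets `lathe 1` (the machine of the first 68033 row) instead of `lathe 3`, and F302 gets sequence 10 (from launch 69872) instead of 20.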

score 2 · Accepted Answer · answered Mar 15 '23 at 10:29

I would use:

# get max Launch per Article and filter rows
m = df.groupby('Article')['Launch'].max()
df2 = df.loc[df['Launch'].isin(m)]

# get rows with min sequence
Last_Lathe_df = df2.loc[df2.groupby('Article')['Sequence'].idxmin()]

Output:

    Launch Article  Sequence  Machine  Quantity        Date
4    71562   F2500        10  lathe 3       632  12/12/2022
10   70444    F302        20  lathe 1      3125  27/07/2022
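Note that `df['Launch'].isin(m)` compares each row's `Launch` against the set of all per-article maxima, without checking which article each maximum belongs to. That is fine as long as every launch number belongs to a single article, as in the sample. If that assumption might not hold, a minimal variant compares each row only with the maximum of its own article, using `transform('max')` to broadcast the per-group value back to every row:

```python
import pandas as pd

# The question's sample rows, restricted to the columns used here
df = pd.DataFrame({
    "Launch":   [68033] * 4 + [71562] * 4 + [69872] * 2 + [70444] * 2,
    "Article":  ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine":  ["lathe 1", "lathe 1", "borer 3", "milling 1",
                 "lathe 3", "lathe 4", "borer 3", "milling 2",
                 "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

# True where the row's Launch equals the max Launch of *its own* Article
is_last_launch = df["Launch"].eq(df.groupby("Article")["Launch"].transform("max"))
df2 = df[is_last_launch]

# Among those rows, keep the one with the smallest Sequence per Article;
# idxmin returns labels of the original index, so .loc picks the full rows
result = df2.loc[df2.groupby("Article")["Sequence"].idxmin()]
print(result)
```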

mozway

Just out of curiosity, how did you set up a test dataframe containing the question's input data so fast? :) Did you manually convert it to CSV? – filpa Mar 15 '23 at 10:31

@filpa [here is my usual way](https://stackoverflow.com/a/73814257), in this particular case `read_clipboard` was enough ;) – mozway Mar 15 '23 at 10:32

Thank you for your answer! Simple and easy to understand, worked like a charm :) – Piazza Mar 15 '23 at 13:42
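For reproducing the sample without a clipboard, here is a sketch of the same idea with `io.StringIO` and `read_csv`. It assumes the pasted text separates columns with at least two spaces, so that values such as `lathe 1` stay in one field; the text below is a hypothetical, re-spaced excerpt of the question's table (in the original, `Article` and `Sequence` are separated by a single space). The same `sep` argument can be passed to `pd.read_clipboard`, which forwards its keyword arguments to `read_csv`.

```python
import io

import pandas as pd

# Hypothetical excerpt: columns separated by two or more spaces
text = """\
Launch  Article  Sequence  Machine    Quantity  Date
68033   F2500    10        lathe 1    200       01/02/2022
71562   F2500    10        lathe 3    632       12/12/2022
70444   F302     20        lathe 1    3125      27/07/2022
"""

# A regex separator requires the python parsing engine
df = pd.read_csv(io.StringIO(text), sep=r"\s{2,}", engine="python")
print(df)
```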

score 2 · Answer 2 · answered Mar 15 '23 at 10:44

A more straightforward way:

df.groupby('Article').apply(lambda x: x[x['Launch'].eq(x['Launch'].max())]
                            .sort_values(by=['Sequence']).head(1)).reset_index(drop=True)

  Launch Article  Sequence  Machine  Quantity        Date
0   71562   F2500        10  lathe 3       632  12/12/2022
1   70444    F302        20  lathe 1      3125  27/07/2022
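The question mentions attempts with `sort_values`. A sketch of that route, which avoids a Python-level `groupby.apply`: sort so that the wanted row comes first within each article (highest `Launch`, then lowest `Sequence`), then keep the first row per article with `drop_duplicates`. This is an alternative technique, not one of the answers above, and it assumes the column names of the sample data.

```python
import pandas as pd

# The question's sample rows, restricted to the columns used here
df = pd.DataFrame({
    "Launch":   [68033] * 4 + [71562] * 4 + [69872] * 2 + [70444] * 2,
    "Article":  ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine":  ["lathe 1", "lathe 1", "borer 3", "milling 1",
                 "lathe 3", "lathe 4", "borer 3", "milling 2",
                 "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

# Within each Article: highest Launch first, then lowest Sequence first
ordered = df.sort_values(["Article", "Launch", "Sequence"],
                         ascending=[True, False, True])

# The first row of each Article is now the wanted one
result = ordered.drop_duplicates("Article", keep="first").reset_index(drop=True)
print(result)
```

Sorting touches the whole frame, but every step stays vectorized, which tends to scale better than calling a lambda once per group on a frame of this size.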
