Little logic issue with Panda duplicates on python

Question

If I have a panda df with following entries:

type    location    unit     condition
Car        01        19          2
Car        01        19          3
Car        01        19          1
Car        02        19          1
Car        01        20          1
Machine    05        09          1
Machine    05        09          2
Machine    05        09          3
Machine    15        09          1
Machine    15        10          1
Truck      02        09          2
Truck      02        09          1

For the duplicates (based on type | location | unit) I would like to only get the ones with the best condition. I tried several sort_value duplicates etc. but I think I have a logic issue in my head.

so the perfect result would look like this:

type    location    unit     condition
Car        01        19          3
Car        02        19          1
Car        01        20          1
Machine    05        09          3
Machine    15        09          1
Machine    15        10          1
Truck      02        09          2

Commentary on proposed duplicate

Sorry for the mistake: I totally forgot (type, location, unit and condition) are not the only columns there. There is at least two more currently (Torque and Wheels) which I would need also to have access to.

So I guess it is not possible to solve with the answers to this question.

Current full df

type    location    unit     condition      wheels      torque
Car        01        19          2            4          256
Car        01        19          3            4          320
Car        01        19          1            4          190
Car        02        19          1            4          280
Car        01        20          1            4          400
Machine    05        09          1            4          320
Machine    05        09          2            6          690
Machine    05        09          3            12        1180
Machine    15        09          1            4          290
Machine    15        10          1            6          445
Truck      02        09          2            6          625
Truck      02        09          1            8          804

@jezrael: this post has been edited. I have no view on it, but would you take a look to see if the proposed duplicate does not fit, as per the OP's view? — halfer, Apr 03 '20 at 09:58
@halfer - Good idea. I still guess dupe is correct, but rather ask by comment — jezrael, Apr 03 '20 at 10:00
I think you need `df.loc[df.groupby(['type', 'location', 'unit'])['condition'].idxmax()]`, if not working can you add expected output? — jezrael, Apr 03 '20 at 10:01

score 1 · Answer 1 · answered Apr 02 '20 at 10:31

1

You can use groupby and retain just the maximum condition:

df = df.groupby(['type', 'location', 'unit'])['condition'].max().reset_index()

answered Apr 02 '20 at 10:31

Bruno Mello

4,448
1
9
39

Okay, that seams to filter out what I wanted. But little issue is that I have two more columns which are not in the df I posted above which I lose when I do the groupby. Or am I missing something? – YvesNr1 Apr 02 '20 at 11:42
What are those columns? You have to define how you want to aggregate them – Bruno Mello Apr 02 '20 at 11:42
there are two more colums called "wheels" and "torque" – YvesNr1 Apr 02 '20 at 11:53
They are unique by each ['type', 'location', 'unit']? – Bruno Mello Apr 02 '20 at 11:54
Well they are all over the place no duplicates at all at least for torque. So they should not be grouped at all. Its more like I just should still be able to access them so I can use them in my for loop. Right now I can not access them anymore – YvesNr1 Apr 02 '20 at 11:56
So you can't get the condition max because it returns a smaller dataframe and thus you will lose your information about the torque and well. You could get something like max_torque and max_wheel if that is what you want. – Bruno Mello Apr 02 '20 at 11:59

Little logic issue with Panda duplicates on python

Commentary on proposed duplicate

1 Answers1