1

For example, I have the DataFrame:

a = [{'column_1': 'A', 'column_2': 'B', 'column_3': 20.14}, {'column_1': 'A', 'column_2': 'B', 'column_3': 20.35}]
df = pd.DataFrame(a)

I need to drop the duplicate using two columns -> df.drop_duplicate(['column_1', 'column_2']) but use the following conditions.

First I need to compare the value in the df['column_3'] column, and keep the entry that is lower in value, in this case 20.14

enter image description here

There may be more than two duplicates in a real table.

LiAfe
  • 95
  • 6

2 Answers2

2

Sort dataframe first using sort_values, then drop_duplicates, keeping the first (lowest value column_3) record.

Option 1

df.sort_values(['column_3']).drop_duplicates(['column_1', 'column_2'])

Another way, capturing more than one minimum record:

Option 2

df[df['column_3'] == df.groupby(['column_1', 'column_2'])['column_3'].transform('min')]

or just want one record:

Option 3

df.groupby(['column_1', 'column_2'], as_index=False)['column_3'].min()
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • thanks for answer, ok, it's work, but how to do this doesn't use the sort_values, can it be done in some other way? – LiAfe Jan 18 '23 at 18:49
2

You can use groupby on 'column_1', 'column_2' and then find min on column_3.

df.groupby(['column_1', 'column_2'])['column_3'].min().to_frame().reset_index()

Output:

  column_1 column_2  column_3
0        A        B     20.14
I'mahdi
  • 23,382
  • 5
  • 22
  • 30