Python Dataframe - Removing Duplicate Values?

Question

I have a program that runs completely through but unfortunately is returning duplicates because of the way the base data is structured. Results looks like this:

   Date      Amount   Source   Type
  7/16/2019  10        A       B
  7/17/2019  10        A       B
  7/15/2019  10        A       B
  7/15/2019  10        B       B

I'd like to return:
   Date      Amount   Source   Type
  7/17/2019   10        A       B
  7/15/2019   10        B       B

7/17/2019 is chosen because its the last date we received 10 from source A and Type B.

I've tried:

df.drop_duplicates(subset='a','b','date', keep="last")

but its not quite working. Is there a better way to do this?

This worked

df[df.Date.eq(df.groupby(['Source','Type'])['Date'].transform('max'))]

`df[df.Date.eq(df.groupby(['Source','Type'])['Date'].transform('max'))]` — anky, Jul 16 '19 at 14:29
Both throw this error which I haven't dealt with all that much: valueError: cannot reindex from a duplicate axis — user2679225, Jul 16 '19 at 14:33
@user2679225 df[df.Date.eq(df.groupby(['Source','Type'])['Date'].transform('max').values)] — BENY, Jul 16 '19 at 14:34

score 1 · Accepted Answer · answered Jul 16 '19 at 14:38

As stated in This post:

non_duplicate_index = ~df.index.duplicated(keep='first')
result = df.loc[non_duplicate_index]

df.index.duplicated(keep='first') Returns an index containing True/False values. True if the value is duplicated, and False otherwise. Then, ~df.index.duplicated(keep='first') returns True where the value is not duplicated.

Finally df.loc[non_duplicate_index] is an slicing pandas method which returns the df rows where non_duplicate_index is True.

score 1 · Answer 2 · answered Jul 16 '19 at 14:39

1

drop_duplicates will work as well

df.sort_values('Date').drop_duplicates(subset=['Source','Type'], keep="last") 
Out[566]: 
        Date  Amount Source Type
3 2019-07-15      10      B    B
1 2019-07-17      10      A    B

answered Jul 16 '19 at 14:39

BENY

317,841
20
164
234

Python Dataframe - Removing Duplicate Values?

2 Answers2