How to create a function in pandas that'll drop a row based on multiple conditions?

Question

I have a DataFrame that has been put together using different source files, and in some instances there are duplicated entries in the files, yet the amounts differ.

The below table is a representation of the data I am dealing with

index	name	recipient	amount	file	year
0	company a	recipient a	1003	File A	2020
1	company a	recipient a	1000	File B	2020
2	company b	recipient b	500	File A	2018
3	company a	recipient a	760	File A	2017

In the above table, rows at index 0 & 1 are duplicates, and I want to drop the row at index 1

I haven't been able to get very far with writing any code, though I have an idea sketched out. It may not be very efficient, as the idea centers on looping through the rows.

At each row, if the following conditions are met:

Row 0 (recipient) = Row 1 (recipient)
Row 0 (year) = Row 1 (year)
Row 0 (file) != Row 1 (file)
(Row 0 (Amount) - Row 1 (Amount)) / Row 0 (amount) < 0.05

Then drop Row 1

The data will be sorted so that the check will always be done for the row below.

What is an effective way to apply the above logic?

**Update

I've tried to replicate the answer from Pandas compare next row based on conditions

recipient = df['recipient']
amount = df['amount']
file = df['file']
year = df['year']

for i in range(len(recipient)-1):
    print(i)
    if(recipient[i] == recipient[i+1] and year[i] == year[i+1] and file[i] != file[i+1] and ((amount[i] - amount[i+1])/amount[i]) < 0.05):
        del recipient[i]
        del amount[i]
        del file[i]
        del year[i]
        

new_df = pd.DataFrame({'recipient': recipient, 'amount': amount, 'file': file, 'year': year})

Except I am getting a KeyError: 10 error message

Please provide enough code so others can better understand or reproduce the problem. — Community, Nov 15 '22 at 12:20

How to create a function in pandas that'll drop a row based on multiple conditions?

0 Answers0