Remove rows from dataframe if A column value is in C column

Question

I have this code:

test = {"number": ['1555','1666','1777', '1888'],
        "order_amount": ['100.00','200.00','-200.00', '300.00'],
        "number_of_refund": ['','','1666', '']
    }

df = pd.DataFrame(test)

Which returns the following dataframe:

  number order_amount number_of_refund
0   1555       100.00                 
1   1666       200.00                 
2   1777      -200.00             1666
3   1888       300.00

What would be the best solution to remove the row if the order number is refunded? I would want to remove the order row and the refund row.

Logic if df['number'].value is in df['number_of_refund'] and the amount of df['number'].value is the opposite of the df['number_of_refund'] rows.

So the result in this case should be:

number order_amount number_of_refund
0   1555       100.00                 
1   1888       300.00

Use `df = df[~df['number'].isin(df['number_of_refund']) & ~df['number_of_refund'].isin(df['number'])]` — jezrael, Sep 08 '22 at 09:06

mozway · Accepted Answer · 2022-09-08T09:31:40.753

You can use boolean indexing with two masks:

# is the row a refund?
m1 = df['number_of_refund'].ne('')
# is the number order matching a refund?
m2 = df['number'].isin(df.loc[m1, 'number_of_refund'])

# keep rows that do not match any of the above
out = df[~(m1|m2)]

output:

  number  order_amount number_of_refund
0   1555         100.0                 
3   1888         300.0

partial refunds:

df['order_amount'] = pd.to_numeric(df['order_amount'], errors='coerce')

# is the row a refund?
m1 = df['number_of_refund'].ne('')
# get mapping of refunds
s = df[m1].set_index('number_of_refund')['order_amount']
# update amounts
df['order_amount'] += df['number'].map(s).fillna(0)
# find null values
m2 = df['order_amount'].eq(0)


# keep rows that do not match any mask
df = df[~(m1|m2)]

output:

  number  order_amount number_of_refund
0   1555         100.0                 
1   1666         100.0                 
3   1888         300.0

input for partial refund:

test = {"number": ['1555','1666','1777', '1888'],
        "order_amount": ['100.00','200.00','-100.00', '300.00'],
        "number_of_refund": ['','','1666', '']
    }

df = pd.DataFrame(test)

Here I used a mask to remove null values. But yes, you get the general logic, then you can do whatever you want ;) — mozway, Sep 08 '22 at 10:00
I tried using the partial code with more dataframe data and I keep getting the following error: df['order_amount'] += df['number'].map(s).fillna(0) InvalidIndexError(self._requires_unique_msg) pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects By the way I am trying to use this in a loop with different dataframes. — Kirilas, Sep 12 '22 at 07:38
I think there should be a df.reset_index(inplace=True, drop=True) step. — Kirilas, Sep 12 '22 at 08:22

Remove rows from dataframe if A column value is in C column

1 Answers1

partial refunds: