
I am trying to clean my pandas DataFrame of duplicate rows. I know how to remove rows where the column values are identical, but I'm not sure how to do it when column A on row 1 equals column B on row 2 and column B on row 1 equals column A on row 2. I hope that is not too confusing. I've added an example of a table below; I would consider rows 2 and 3 to be duplicates. How would I remove them using pandas?

[table example]
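Something along these lines, with made-up values since the actual table is in the image above; the second and third rows are the ones I consider swapped duplicates:

import pandas as pd

# hypothetical values, only to illustrate the question:
# rows 1 and 2 are swapped duplicates of each other
df = pd.DataFrame({"A": [50, 10, 15, 20],
                   "B": [40, 15, 10, 30]})
print(df)
#     A   B
# 0  50  40
# 1  10  15
# 2  15  10
# 3  20  30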

Edit:

Duplicate rows are not necessarily right above or below each other. I need to keep only one of those rows (doesn't matter which one specifically).

1 Answer


Use np.sort to put each row's values in the same order, so that swapped duplicates become identical rows and can be detected with duplicated():

import pandas as pd
import numpy as np

# toy data
df = pd.DataFrame(data=[[10, 15], [15, 10]], columns=["A", "B"])

# find duplicate rows
duplicated = pd.DataFrame(np.sort(df[["A", "B"]], axis=1), index=df.index).duplicated()

# filter them out, keeping the first occurrence
res = df[~duplicated]
print(res)

Output

    A   B
0  10  15
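Since the question says it doesn't matter which of the two rows survives, note that duplicated() also takes a keep argument; keep="last" marks the earlier occurrence instead. Continuing from the snippet above:

# keep="last" flags the first occurrence as the duplicate, so row 1 survives instead
duplicated_last = pd.DataFrame(np.sort(df[["A", "B"]], axis=1), index=df.index).duplicated(keep="last")
print(df[~duplicated_last])
#     A   B
# 1  15  10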

Alternatively, use frozenset to convert each row into a hashable set where order does not matter:

# find duplicate rows
duplicated = df[["A", "B"]].apply(frozenset, axis=1).duplicated()

# filter them out
res = df[~duplicated]
print(res)
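Both approaches build the same boolean mask. As a quick sanity check on a slightly larger, made-up frame where the swapped duplicates are not next to each other (as in the edit to the question), reusing the imports from above:

# made-up frame; rows 0 and 2 are swapped duplicates and not adjacent
df2 = pd.DataFrame(data=[[10, 15], [20, 30], [15, 10]], columns=["A", "B"])
mask = df2[["A", "B"]].apply(frozenset, axis=1).duplicated()
print(df2[~mask])
#     A   B
# 0  10  15
# 1  20  30

On large frames the np.sort version is usually faster, since apply with frozenset loops over the rows in Python.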
Dani Mesejo