Drop duplicates by combinations of two columns

Asked Jul 05 '23 at 17:56

Active Jul 05 '23 at 17:56

Viewed 17 times

I have the following DataFrame:

import pandas as pd
df = pd.DataFrame({"A":[1,5,2,7,8],
                   "B":[5,1,7,2,9]})
#  A  B
#  1  5
#  5  1
#  2  7
#  7  2
#  8  9

The row 1-5 should be treated as a duplicate of 5-1, and 2-7 as a duplicate of 7-2, so that only one row from each such pair is kept.

The way I'm solving it now is to put A and B into a list, sort it, convert it to a string, and then drop duplicates on that string. Is there an easier and more efficient way?

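# Collect each row's A and B values into a list
df["C"]=df[["A","B"]].values.tolist()
# Sort each list and join it into a string key, so 1-5 and 5-1 both give "1,5"
df["C"]=df.apply(lambda x: ','.join([str(y) for y in sorted(x["C"])]), axis=1)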
#    A  B    C
# 0  1  5  1,5
# 1  5  1  1,5
# 2  2  7  2,7
# 3  7  2  2,7
# 4  8  9  8,9

df = df.drop_duplicates(subset="C")
#    A  B    C
# 0  1  5  1,5
# 2  2  7  2,7
# 4  8  9  8,9
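
One possible simplification of the approach above, offered as a sketch rather than a definitive answer: sort the two values within each row with NumPy and use the sorted pairs only to build a boolean mask, so no helper column or string conversion is needed. The names `sorted_pairs`, `is_dup` and `result` are illustrative, and the sketch assumes that A and B hold values that can be compared with each other (for example, both numeric).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 5, 2, 7, 8],
                   "B": [5, 1, 7, 2, 9]})

# Sort the values within each row so that 1-5 and 5-1 both become (1, 5).
sorted_pairs = np.sort(df[["A", "B"]].to_numpy(), axis=1)

# Flag rows whose sorted pair has already appeared in an earlier row.
is_dup = pd.DataFrame(sorted_pairs, index=df.index).duplicated()

# Keep the first occurrence of each unordered pair; df itself is unchanged.
result = df[~is_dup]
```

As in the original approach, the first row of each pair is the one that survives. `DataFrame.duplicated` also accepts `keep="last"` if the later row should be kept instead.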

BERA
