
Is there any way to delete rows of duplicated pairs in pandas without taking the order into account?

DataFrame before deleting (I want to drop the duplicated pair, highlighted in yellow in the screenshot):

[image: DataFrame before deleting]

After deleting the duplicate:

[image: DataFrame after deleting]

example data:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 1, 2, 2],
                   'b': [2, 1, 3, 4, 3, 4]})
mozway
puhuk

1 Answer


You can convert each row to a frozenset, which gives an order-independent key to group by, then take the first row of each group:

df.groupby(df.apply(frozenset, axis=1), as_index=False).first()

Alternatively, call duplicated on the frozenset Series and keep only the rows that are not flagged:

df[~df.apply(frozenset, axis=1).duplicated()]

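Output (shown for the groupby version; the duplicated version returns the same rows but keeps the original index labels 0, 2, 3, 4, 5):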

   a  b
0  1  2
1  1  3
2  1  4
3  2  3
4  2  4
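
For anyone who wants to check this end to end, here is a minimal, self-contained sketch on the example data from the question. The first part is the duplicated approach from above. The second part is an alternative that is not from the snippets above: it sorts the values within each row with NumPy and then calls duplicated on the sorted copy. The names key, deduped, sorted_pairs and deduped_np are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 1, 2, 2],
                   'b': [2, 1, 3, 4, 3, 4]})

# Order-independent key per row: frozenset({1, 2}) == frozenset({2, 1}).
key = df.apply(frozenset, axis=1)

# Keep the first occurrence of each key; original index labels are preserved.
deduped = df[~key.duplicated()]
print(deduped)

# Alternative sketch (an assumption, not the method above): sort the values
# inside each row so (2, 1) becomes (1, 2), then flag duplicates on that copy.
sorted_pairs = pd.DataFrame(np.sort(df[['a', 'b']].to_numpy(), axis=1),
                            index=df.index)
deduped_np = df[~sorted_pairs.duplicated()]
print(deduped_np)
```

Both variants should drop row 1, the (2, 1) pair. One design note: a frozenset also discards repeated values within a row, which is harmless for two columns but would make (1, 1, 2) and (1, 2, 2) look identical with three, so the row-sorting variant is probably the safer choice when comparing more than two columns.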
mozway