
I have a DataFrame with two columns, and I want to remove all duplicate pairs regardless of the order of the values within a row:

col1     col2
 A        B
 B        A
 C        D
 E        F
 F        E

The output should be

col1       col2
 A           B
 C           D
 E           F

I have tried using the `duplicated` function, but it did not remove anything because the values in the mirrored rows are not in the same order.
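A minimal sketch reproducing the problem with the sample data from the question (the literal values are copied from the table above). `duplicated()` compares rows column by column, so ("A", "B") and ("B", "A") are treated as different rows and nothing is flagged:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col1": ["A", "B", "C", "E", "F"],
    "col2": ["B", "A", "D", "F", "E"],
})

# duplicated() compares values position by position, so mirrored pairs are not matched
print(df.duplicated().tolist())  # [False, False, False, False, False]
print(len(df.drop_duplicates()))  # 5, i.e. no rows removed
```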

One way:

  1. Take the underlying NumPy array and sort the values within each row.
  2. Use the DataFrame constructor to rebuild the DataFrame from the row-sorted array.
  3. Drop the duplicates.
df = pd.DataFrame(np.sort(df.values), columns = df.columns).drop_duplicates()
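The one-liner above can be expanded into a self-contained sketch of the same three steps. It assumes the question's sample data; `to_numpy()` is used in place of `.values`, `axis=1` is spelled out (it is also `np.sort`'s default for a 2-D array), and passing `index=df.index` is an addition that keeps the original row labels explicitly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["A", "B", "C", "E", "F"],
    "col2": ["B", "A", "D", "F", "E"],
})

# 1. Sort the values within each row, so ("B", "A") becomes ("A", "B")
sorted_values = np.sort(df.to_numpy(), axis=1)

# 2. Rebuild a DataFrame with the same column names and row labels
sorted_df = pd.DataFrame(sorted_values, columns=df.columns, index=df.index)

# 3. Mirrored rows are now identical, so drop_duplicates keeps only the first of each
result = sorted_df.drop_duplicates()
print(result)
```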

OUTPUT:

 col1 col2
0    A    B
2    C    D
3    E    F
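One side effect of this approach is that the surviving rows contain the *sorted* values, not the original ones: if the first occurrence of a pair were "B, A", the output would show "A, B". A possible variant (not part of the answer above) uses the sorted array only as a key and filters the original frame with the boolean mask from `duplicated()`. The data below is hypothetical, altered so that the first row is "B, A" to show the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical data: first row reversed compared with the question's example
df = pd.DataFrame({
    "col1": ["B", "A", "C", "E", "F"],
    "col2": ["A", "B", "D", "F", "E"],
})

# Row-sorted copy used only to detect unordered duplicates
key = pd.DataFrame(np.sort(df.to_numpy(), axis=1), index=df.index)

# Keep the first occurrence of each unordered pair, with its original values
result = df[~key.duplicated()]
print(result)
```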
Nk03