import pandas as pd
df1 = pd.DataFrame(index=[1,2,3,4])


df1['A'] = [1,2,5,4]
df1['B'] = [5,6,9,8]
df1['C'] = [9,10,1,12]

>>> df1
   A  B   C
1  1  5   9
2  2  6  10
3  5  9   1
4  4  8  12

I want to compare the rows of df1 so that row 1 (1, 5, 9) is treated as equal to row 3 (5, 9, 1).

In other words, I only care about which items a row contains, not the order in which they appear.

sh kim

2 Answers

I think you need to sort the values within each row with np.sort:

import numpy as np

df2 = pd.DataFrame(np.sort(df1.values, axis=1), index=df1.index, columns=df1.columns)
print (df2)
   A  B   C
1  1  5   9
2  2  6  10
3  1  5   9
4  4  8  12

Then remove the duplicates from df1 using the inverted (~) boolean mask created by duplicated:

df2 = pd.DataFrame(np.sort(df1.values, axis=1), index=df1.index)
print (df2)
   0  1   2
1  1  5   9
2  2  6  10
3  1  5   9
4  4  8  12

df1 = df1[~df2.duplicated()]
print (df1)
   A  B   C
1  1  5   9
2  2  6  10
4  4  8  12
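
The sort-then-mask steps can also be wrapped in a single helper. This is only a minimal sketch: the function name drop_unordered_duplicates is hypothetical (it is not part of pandas), and it assumes all columns hold values that can be sorted against each other, such as numbers.

```python
import numpy as np
import pandas as pd


def drop_unordered_duplicates(df):
    """Keep the first row of each group of rows that hold the same values in any order."""
    # Sort the values within each row so that column order no longer matters.
    sorted_rows = pd.DataFrame(np.sort(df.to_numpy(), axis=1), index=df.index)
    # duplicated() flags every repeat after the first; invert it to keep first occurrences.
    return df[~sorted_rows.duplicated()]


df1 = pd.DataFrame(
    {'A': [1, 2, 5, 4], 'B': [5, 6, 9, 8], 'C': [9, 10, 1, 12]},
    index=[1, 2, 3, 4],
)
result = drop_unordered_duplicates(df1)
print(result)
```

Because the mask is applied to the original frame, the rows that are kept retain their original column order and labels.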
jezrael

If no value appears twice within a row, you could simply convert the rows into sets:

row1 = df1.loc[1]  # select by index label; iloc is position-based and would pick other rows
row3 = df1.loc[3]
set(row1) == set(row3)  # True

This has the advantage that you can then compare the rows with set operations, e.g. to find whether a value is in one row but not in the other.

set(row1) - set(row3)  # the values that are in row1 but not in row3
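
If values can repeat within a row, a set comparison would treat (1, 1, 5) and (1, 5, 5) as equal. One possible workaround is to compare value counts instead, sketched here with collections.Counter from the standard library; the helper name same_items is hypothetical.

```python
from collections import Counter

import pandas as pd


def same_items(a, b):
    """Return True if a and b hold the same values the same number of times, in any order."""
    # Counter maps each value to its number of occurrences, so repeats are respected.
    return Counter(a) == Counter(b)


df1 = pd.DataFrame(
    {'A': [1, 2, 5, 4], 'B': [5, 6, 9, 8], 'C': [9, 10, 1, 12]},
    index=[1, 2, 3, 4],
)

rows_1_and_3 = same_items(df1.loc[1], df1.loc[3])   # (1, 5, 9) vs (5, 9, 1)
rows_1_and_2 = same_items(df1.loc[1], df1.loc[2])   # (1, 5, 9) vs (2, 6, 10)
with_repeats = same_items([1, 1, 5], [1, 5, 5])     # differs only in multiplicity
print(rows_1_and_3, rows_1_and_2, with_repeats)
```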
LoicM