Python: How to remove rows where multiple columns have equal values?

Question

I want to remove rows in which several columns all hold the same value. I read this question about two columns and tried to extend the approach to multiple columns, but I get an error.

Here is some sample data, similar to my dataframe:

import pandas as pd
data = [['table1',10,8,7],['table2',3,3,3],['table3',3,8,11],['table4',12,12,12],['table5',13,15,5]]
df = pd.DataFrame(data,columns=['table_name','Attr1','Attr2','Attr3'])

and my desired result

res = [['table1',10,8,7],['table3',3,8,11],['table5',13,15,5]]
result = pd.DataFrame(res,columns=['table_name','Attr1','Attr2','Attr3'])

I tried

[df[df['Attr1'] != df['Attr2'] | df['Attr1'] != df['Attr3'] | df['Attr2'] != df['Attr3']]]

which raises the error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Any ideas?

score 3 · Answer 1 · answered Mar 15 '19 at 07:52

Use df.query:

df = df.query("Attr1 != Attr2 != Attr3")
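One thing worth checking before relying on this: as far as I understand it, query evaluates a chained comparison pairwise, so "Attr1 != Attr2 != Attr3" means "Attr1 != Attr2 and Attr2 != Attr3". That gives the desired result on the sample data, but it also drops rows where only the first two columns match. A small sketch with made-up rows (the names chained and any_differs are just for illustration):

```python
import pandas as pd

# Made-up rows: "a" is all equal, "b" has only Attr1 == Attr2, "c" is all different
df = pd.DataFrame(
    [["a", 3, 3, 3], ["b", 8, 8, 10], ["c", 10, 8, 7]],
    columns=["table_name", "Attr1", "Attr2", "Attr3"],
)

# Chained form: behaves like (Attr1 != Attr2) and (Attr2 != Attr3)
chained = df.query("Attr1 != Attr2 != Attr3")

# Explicit form: keep a row as soon as any pair of columns differs
any_differs = df.query("Attr1 != Attr2 or Attr1 != Attr3 or Attr2 != Attr3")

print(chained["table_name"].tolist())
print(any_differs["table_name"].tolist())
```

If only rows with all three values equal should be removed, the explicit form is probably the safer choice.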

Loochie

score 2 · Accepted Answer · answered Mar 15 '19 at 07:28

Use DataFrame.ne to compare every column against the Attr1 column, test with DataFrame.any whether at least one value per row differs, and finally filter by boolean indexing:

df = df[df[['Attr1','Attr2','Attr3']].ne(df['Attr1'], axis=0).any(axis=1)]
print (df)
  table_name  Attr1  Attr2  Attr3
0     table1     10      8      7
2     table3      3      8     11
4     table5     13     15      5

Details:

print (df[['Attr1','Attr2','Attr3']].ne(df['Attr1'], axis=0))
   Attr1  Attr2  Attr3
0  False   True   True
1  False  False  False
2  False   True   True
3  False  False  False
4  False   True   True

print (df[['Attr1','Attr2','Attr3']].ne(df['Attr1'], axis=0).any(axis=1))
0     True
1    False
2     True
3    False
4     True
dtype: bool

Another solution is to test the number of unique values per row with DataFrame.nunique:

df = df[df[['Attr1','Attr2','Attr3']].nunique(axis=1).ne(1)]
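For an arbitrary list of columns, the DataFrame.ne approach could be wrapped in a small helper. This is only a sketch; drop_rows_all_equal is a hypothetical name, not a pandas function:

```python
import pandas as pd

def drop_rows_all_equal(frame, cols):
    """Return frame without the rows whose values in cols are all identical."""
    first = frame[cols[0]]
    # True where at least one of the listed columns differs from the first one
    differs = frame[cols].ne(first, axis=0).any(axis=1)
    return frame[differs]

data = [['table1', 10, 8, 7], ['table2', 3, 3, 3], ['table3', 3, 8, 11],
        ['table4', 12, 12, 12], ['table5', 13, 15, 5]]
df = pd.DataFrame(data, columns=['table_name', 'Attr1', 'Attr2', 'Attr3'])

result = drop_rows_all_equal(df, ['Attr1', 'Attr2', 'Attr3'])
print(result)
```

Note that ne treats NaN as different from every value, so rows containing NaN are kept by this version, whereas nunique ignores NaN by default.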

jezrael

This works only if `Attr1` is one of the duplicates. Consider the case where Attr1-3 is `[10, 8, 8]` ? – Chris Adams Mar 15 '19 at 07:37
@ChrisA - Then it is possible to use `df = df[df[['Attr1','Attr2','Attr3']].nunique(axis=1).ne(2)]` – jezrael Mar 15 '19 at 07:38
@ChrisA - So you want to keep rows only if all values are unique? Like `df = df[df[['Attr1','Attr2','Attr3']].nunique(axis=1).eq(3)]` ? – jezrael Mar 15 '19 at 07:42
That should do it yeah – Chris Adams Mar 15 '19 at 07:43
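
The two readings discussed in these comments ("drop rows where all values are equal" versus "keep only rows where all values are distinct") can be compared side by side. A minimal sketch using the `[10, 8, 8]` case from the comment plus two made-up rows; the variable names are illustrative:

```python
import pandas as pd

cols = ["Attr1", "Attr2", "Attr3"]
df = pd.DataFrame(
    [["t1", 10, 8, 7], ["t2", 3, 3, 3], ["t3", 10, 8, 8]],
    columns=["table_name"] + cols,
)

# Number of distinct values per row across the attribute columns
n_unique = df[cols].nunique(axis=1)

# Reading 1: drop a row only if all three values are identical
not_all_equal = df[n_unique.ne(1)]

# Reading 2: keep a row only if all three values are different
all_distinct = df[n_unique.eq(len(cols))]

print(not_all_equal["table_name"].tolist())
print(all_distinct["table_name"].tolist())
```

Row t3 is the one that separates the two readings: it survives the first filter but not the second.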

score 2 · Answer 3 · answered Mar 15 '19 at 07:39

You can create a condition for each pair of columns and then combine them:

>>> c1 = df['Attr1'].ne(df['Attr2'])
>>> c2 = df['Attr1'].ne(df['Attr3'])
>>> c3 = df['Attr2'].ne(df['Attr3'])
>>> df[c1 | c2 | c3]
  table_name  Attr1  Attr2  Attr3
0     table1     10      8      7
2     table3      3      8     11
4     table5     13     15      5

Each condition is a boolean Series indicating, row by row, whether the inequality holds, e.g.

>>> c1
0     True
1    False
2     True
3    False
4     True
dtype: bool

>>> c1 | c2 | c3
0     True
1    False
2     True
3    False
4     True
dtype: bool
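
As for why the attempt in the question fails: in Python, | binds more tightly than !=, so without parentheses the expression becomes a chained comparison over terms like df['Attr2'] | df['Attr1'], and chaining needs the truth value of a whole Series. A short sketch reproducing the error and the parenthesized fix (the names raised, mask and fixed are only for illustration):

```python
import pandas as pd

data = [['table1', 10, 8, 7], ['table2', 3, 3, 3], ['table3', 3, 8, 11],
        ['table4', 12, 12, 12], ['table5', 13, 15, 5]]
df = pd.DataFrame(data, columns=['table_name', 'Attr1', 'Attr2', 'Attr3'])

# Python parses the unparenthesized expression as
#   a != (b | a) != (c | b) != c
# and the implicit "and" between the comparisons calls bool() on a Series.
try:
    df[df['Attr1'] != df['Attr2'] | df['Attr1'] != df['Attr3'] | df['Attr2'] != df['Attr3']]
    raised = False
except ValueError:
    raised = True

# Wrapping each comparison in parentheses gives the intended element-wise OR
mask = ((df['Attr1'] != df['Attr2'])
        | (df['Attr1'] != df['Attr3'])
        | (df['Attr2'] != df['Attr3']))
fixed = df[mask]
print(fixed)
```

Using .ne() as above avoids the problem because method calls need no extra parentheses.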

score 2 · Answer 4 · answered Mar 15 '19 at 07:51

Use boolean indexing with the condition that the number of unique values along axis 1 equals the width of the DataFrame:

df = df[df.nunique(axis=1).eq(df.shape[1])]
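
Because this counts unique values over every column, including table_name, it may behave unexpectedly once the frame has other columns whose values can coincide with the attributes. A hedged sketch, where the extra column is a hypothetical addition that is not in the question's data:

```python
import pandas as pd

data = [['table1', 10, 8, 7], ['table2', 3, 3, 3], ['table3', 3, 8, 11],
        ['table4', 12, 12, 12], ['table5', 13, 15, 5]]
df = pd.DataFrame(data, columns=['table_name', 'Attr1', 'Attr2', 'Attr3'])

# Hypothetical extra column; its value in row 0 happens to equal Attr1
df['extra'] = [10, 0, 0, 0, 0]

# Whole-frame version: row 0 is dropped because 'extra' repeats Attr1
whole = df[df.nunique(axis=1).eq(df.shape[1])]

# Restricting the count to the attribute columns avoids that
attrs = ['Attr1', 'Attr2', 'Attr3']
subset = df[df[attrs].nunique(axis=1).eq(len(attrs))]

print(whole['table_name'].tolist())
print(subset['table_name'].tolist())
```

Selecting the columns explicitly keeps the filter independent of whatever else the DataFrame contains.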

Chris Adams
