Remove rows where two columns have the same value in pandas

Question

Input:

    S   T   W      U
0   A   A   1   Undirected
1   A   B   0   Undirected
2   A   C   1   Undirected
3   B   A   0   Undirected
4   B   B   1   Undirected
5   B   C   1   Undirected
6   C   A   1   Undirected
7   C   B   1   Undirected
8   C   C   1   Undirected

Output:

    S   T   W      U
1   A   B   0   Undirected
2   A   C   1   Undirected
3   B   A   0   Undirected
5   B   C   1   Undirected
6   C   A   1   Undirected
7   C   B   1   Undirected

In rows 0, 4 and 8, columns S and T hold the same value. I want to drop these rows.

What I tried:

I used df.drop_duplicates(['S','T']), but it did not give this result. How can I get the output above?
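
drop_duplicates does not do this because it compares rows with each other, not columns within a row: with subset ['S','T'] it removes a row only when an earlier row has the same (S, T) pair. Every (S, T) pair in the example is unique, so nothing is dropped. A minimal sketch, assuming the input frame is rebuilt from the table above:

```python
import pandas as pd

# Rebuild the example input shown in the question
df = pd.DataFrame({
    "S": list("AAABBBCCC"),
    "T": list("ABCABCABC"),
    "W": [1, 0, 1, 0, 1, 1, 1, 1, 1],
    "U": ["Undirected"] * 9,
})

# drop_duplicates compares rows against other rows on the subset columns.
# All nine (S, T) pairs are distinct, so no row is removed.
deduped = df.drop_duplicates(["S", "T"])
print(len(deduped))  # 9
```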

score 53 · Accepted Answer · answered May 13 '17 at 09:39


Use boolean indexing. Comparing the two columns element-wise gives a boolean mask that is False for the rows to drop:

    print(df['S'] != df['T'])
    0    False
    1     True
    2     True
    3     True
    4    False
    5     True
    6     True
    7     True
    8    False
    dtype: bool

    df = df[df['S'] != df['T']]
    print(df)
       S  T  W           U
    1  A  B  0  Undirected
    2  A  C  1  Undirected
    3  B  A  0  Undirected
    5  B  C  1  Undirected
    6  C  A  1  Undirected
    7  C  B  1  Undirected
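
A self-contained version of the same filter, assuming the input frame is rebuilt from the question's table:

```python
import pandas as pd

# Rebuild the example input shown in the question
df = pd.DataFrame({
    "S": list("AAABBBCCC"),
    "T": list("ABCABCABC"),
    "W": [1, 0, 1, 0, 1, 1, 1, 1, 1],
    "U": ["Undirected"] * 9,
})

# Element-wise comparison of the two columns gives one boolean per row
mask = df["S"] != df["T"]

# Boolean indexing keeps only the rows where the mask is True
result = df[mask]
print(result.index.tolist())  # [1, 2, 3, 5, 6, 7]
```

Note that NaN never compares equal to itself, so a row where both S and T are missing yields True in the mask and is kept; handle such rows separately if they should also be dropped.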

Or use DataFrame.query:

    df = df.query("S != T")
    print(df)
       S  T  W           U
    1  A  B  0  Undirected
    2  A  C  1  Undirected
    3  B  A  0  Undirected
    5  B  C  1  Undirected
    6  C  A  1  Undirected
    7  C  B  1  Undirected
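
query parses its argument as an expression, so plain names like S and T work only when the column names are valid Python identifiers. Since pandas 0.25, names containing spaces can be quoted with backticks. A small sketch with hypothetical column names "source node" and "target node":

```python
import pandas as pd

# Hypothetical column names containing spaces
df = pd.DataFrame({
    "source node": ["A", "A", "B"],
    "target node": ["A", "B", "B"],
})

# Backticks let query refer to columns whose names are not valid identifiers
result = df.query("`source node` != `target node`")
print(result.index.tolist())  # [1]
```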

jezrael

How would you do this if you wanted to pass a list of column names instead of explicitly calling out S & T? – Joe Rivera Mar 14 '20 at 00:05

@JoeRivera - then use `L = ['S','T']; df = df[df[L].ne(df[L[0]], axis=0).any(axis=1)]`. This compares every column in the list with the first one and keeps rows where at least one value differs, using [`DataFrame.any`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.any.html) – jezrael Mar 14 '20 at 04:34
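
The comment's approach can be wrapped in a small function. A sketch, where drop_rows_all_equal and the three-column sample frame are hypothetical names and data for illustration:

```python
import pandas as pd


def drop_rows_all_equal(df, cols):
    """Hypothetical helper: drop rows where every column in `cols` holds the same value."""
    # Compare each listed column against the first one, aligned row by row (axis=0)
    differs = df[cols].ne(df[cols[0]], axis=0)
    # Keep a row if at least one listed column differs from the first column
    return df[differs.any(axis=1)]


df = pd.DataFrame({
    "S": ["A", "A", "B", "C"],
    "T": ["A", "A", "B", "D"],
    "V": ["A", "B", "B", "C"],
})

result = drop_rows_all_equal(df, ["S", "T", "V"])
print(result.index.tolist())  # [1, 3]
```

With cols=['S','T'] this is the same filter as df['S'] != df['T']. With more columns, a row is dropped only when all listed columns are equal; rows where just some of them match are kept.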

score 0 · Answer 2 · answered Dec 22 '21 at 06:16

This can also be done by selecting the index labels of the matching rows and dropping them. This is the method I generally use.

For example:

    import pandas as pd

    # Create a small example DataFrame
    details = {
        'Name': ['Ankit', 'Aishwarya', 'Shaurya', 'Shivangi', 'Priya', 'Swapnil'],
        'Nick_Name': ['Ankit', 'Aish', 'Shaurya', 'Shiv', 'Priya', 'Lucky'],
    }
    df = pd.DataFrame(details, columns=['Name', 'Nick_Name'],
                      index=['a', 'b', 'c', 'd', 'e', 'f'])

    # Index labels of the rows where both columns hold the same value
    index_names = df[df['Name'] == df['Nick_Name']].index

    # Drop those rows in place
    df.drop(index_names, inplace=True)
    print(df)
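
Dropping by index label assumes the labels are unique: drop removes every row carrying a matching label, so with a duplicated label, rows where Name and Nick_Name differ can be removed too. A minimal sketch, using a deliberately duplicated label "a" (an assumption for illustration) and comparing against the boolean mask approach:

```python
import pandas as pd

# Two rows share the index label "a"; only the first has matching names
df = pd.DataFrame(
    {"Name": ["Ankit", "Aishwarya"], "Nick_Name": ["Ankit", "Aish"]},
    index=["a", "a"],
)

# Label-based drop: removes both rows labelled "a"
index_names = df[df["Name"] == df["Nick_Name"]].index
by_label = df.drop(index_names)

# Mask-based filter: removes only the row whose names match
by_mask = df[df["Name"] != df["Nick_Name"]]

print(len(by_label), len(by_mask))  # 0 1
```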
