Is there any way in a Python dataframe to see if two columns are the same but with renamed values?

Question

For example, suppose I have a large dataframe of all the individuals in a zoo, and two of its columns are Animal_Common_Name and Animal_Scientific_Name. I suspect one of them is redundant, since each one is completely determined by the other. Basically, they are the same characteristic under different names.

Is there any function that, given two columns, tells you whether this is the case?

The current answers cover the same values under different column names, but when I first read the question, I thought you meant *different* values in the *same pattern*, like for example `pd.DataFrame({'A': [1, 2, 1], 'B': [5, 6, 5]})`. Could you [edit] to clarify? For specifics, check out [How to make good reproducible pandas examples](/q/20109391/4518341). BTW, welcome to Stack Overflow! Check out the [tour] and [How to ask a good question](/help/how-to-ask) for general tips. — wjandrea, Dec 29 '22 at 21:20

score 1 · Answer 1 · answered Dec 30 '22 at 07:09

Assuming this example:

  Animal_Common_Name  Animal_Scientific_Name
0               Lion            Panthera leo
1            Giraffe  Giraffa camelopardalis
2               Lion            Panthera leo

Use factorize to convert each column to categorical integer codes, then check whether all the codes are equal:

(pd.factorize(df['Animal_Common_Name'])[0] == pd.factorize(df['Animal_Scientific_Name'])[0]).all()

Output: True
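To see why the comparison works, it can help to look at the codes themselves. The following is a minimal sketch that rebuilds the example frame above; the variable names `common_codes`, `scientific_codes` and `same_pattern` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Animal_Common_Name': ['Lion', 'Giraffe', 'Lion'],
    'Animal_Scientific_Name': ['Panthera leo', 'Giraffa camelopardalis', 'Panthera leo'],
})

# factorize returns (codes, uniques); codes number the values
# in order of first appearance, regardless of what the labels are
common_codes, _ = pd.factorize(df['Animal_Common_Name'])
scientific_codes, _ = pd.factorize(df['Animal_Scientific_Name'])

print(common_codes)      # [0 1 0]
print(scientific_codes)  # [0 1 0]

# identical code sequences mean the two columns follow the same pattern
same_pattern = bool((common_codes == scientific_codes).all())
print(same_pattern)      # True
```

Because the codes depend only on the order in which distinct values first appear, two columns that are relabelings of each other produce the same code sequence.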

If you want to identify the rows where one scientific name maps to more than one common name:

df[df.groupby('Animal_Scientific_Name')['Animal_Common_Name'].transform('nunique').ne(1)]

Do the same with the column names swapped to check the other direction.
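Both directions of the groupby check can be wrapped in one function. This is only a sketch: `is_one_to_one` is a hypothetical helper name, the `mixed` frame is made-up data for illustration, and missing values are assumed not to occur (groupby drops NaN keys by default).

```python
import pandas as pd

def is_one_to_one(df, col_a, col_b):
    """Hypothetical helper: True if col_a and col_b determine each other."""
    # every value of col_a must pair with exactly one distinct value of col_b ...
    a_to_b = df.groupby(col_a)[col_b].nunique().eq(1).all()
    # ... and every value of col_b with exactly one distinct value of col_a
    b_to_a = df.groupby(col_b)[col_a].nunique().eq(1).all()
    return bool(a_to_b and b_to_a)

zoo = pd.DataFrame({
    'Animal_Common_Name': ['Lion', 'Giraffe', 'Lion'],
    'Animal_Scientific_Name': ['Panthera leo', 'Giraffa camelopardalis', 'Panthera leo'],
})
zoo_result = is_one_to_one(zoo, 'Animal_Common_Name', 'Animal_Scientific_Name')
print(zoo_result)    # True

# illustrative counter-example: 'Lion' now maps to two scientific names
mixed = zoo.assign(
    Animal_Scientific_Name=['Panthera leo', 'Giraffa camelopardalis', 'Panthera pardus']
)
mixed_result = is_one_to_one(mixed, 'Animal_Common_Name', 'Animal_Scientific_Name')
print(mixed_result)  # False
```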

score 0 · Answer 2 · edited Dec 29 '22 at 21:10

You can use the pandas.Series.equals() method.

For example:

import pandas as pd

data = {
    'Column1': [1, 2, 3, 4],
    'Column2': [1, 2, 3, 4],
    'Column3': [5, 6, 7, 8]
}

df = pd.DataFrame(data)

# True
print(df['Column1'].equals(df['Column2']))

# False
print(df['Column1'].equals(df['Column3']))

Found via GeeksForGeeks

answered Dec 29 '22 at 20:54 by tehCheat · edited Dec 29 '22 at 21:10 by wjandrea

score 0 · Answer 3 · edited Dec 29 '22 at 20:57

df['Animal_Common_Name'].equals(df['Animal_Scientific_Name'])

This returns True if the two columns contain identical values and False otherwise.
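Note that equals compares the values themselves, so it detects duplicated columns and not renamed ones. A short sketch, assuming the three-row zoo frame from Answer 1 (the variable names `renamed` and `identical` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Animal_Common_Name': ['Lion', 'Giraffe', 'Lion'],
    'Animal_Scientific_Name': ['Panthera leo', 'Giraffa camelopardalis', 'Panthera leo'],
})

# same pattern but different labels: equals reports False
renamed = df['Animal_Common_Name'].equals(df['Animal_Scientific_Name'])
print(renamed)    # False

# an exact copy of the column: equals reports True
identical = df['Animal_Common_Name'].equals(df['Animal_Common_Name'].copy())
print(identical)  # True
```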

answered Dec 29 '22 at 20:56 by Niamh · edited Dec 29 '22 at 20:57 by Bhargav - Retarded Skills

Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Dec 30 '22 at 07:45

score 0 · Answer 4 · edited Dec 29 '22 at 21:14

You can use the vectorized operations of pandas to quickly determine your redundancies. Here's an example:

import pandas as pd

# create a sample dataframe from some data
d = {'name1': ['Zebra', 'Lion', 'Seagull', 'Spider'],
     'name2': ['Zebra', 'Lion', 'Bird', 'Insect']}
df = pd.DataFrame(data=d)

# create a new, empty column for the test:
df['is_redundant'] = ''

# set the flag to 1 where the two columns match; .loc avoids chained assignment:
df.loc[df['name1'] == df['name2'], 'is_redundant'] = 1

print(df)


   name1    name2   is_redundant
0  Zebra    Zebra   1
1  Lion     Lion    1
2  Seagull  Bird
3  Spider   Insect

You can then replace the empty strings with 0 or leave them as is, depending on your application.
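If a 0/1 flag is what you need in the end, one possible shortcut is to build it directly from the comparison. This is a sketch that assumes the same name1 and name2 columns as in the example above.

```python
import pandas as pd

df = pd.DataFrame({'name1': ['Zebra', 'Lion', 'Seagull', 'Spider'],
                   'name2': ['Zebra', 'Lion', 'Bird', 'Insect']})

# the element-wise comparison gives booleans; astype(int) turns them into 1/0
df['is_redundant'] = df['name1'].eq(df['name2']).astype(int)

print(df['is_redundant'].tolist())  # [1, 1, 0, 0]
```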
