
For example, suppose I have a large dataframe of all the individuals in a zoo, and two of its columns are Animal_Common_Name and Animal_Scientific_Name. I suspect one of them is redundant, since each characteristic is completely determined by the other and vice versa. Basically, they are the same characteristic under different names.

Is there a function that, given two columns, tells you whether this is the case?

Floralys
  • Any sample data? – Surjit Samra Dec 29 '22 at 20:55
  • The current answers cover the same values under different column names, but when I first read the question, I thought you meant *different* values in the *same pattern*, like for example `pd.DataFrame({'A': [1, 2, 1], 'B': [5, 6, 5]})`. Could you [edit] to clarify? For specifics, check out [How to make good reproducible pandas examples](/q/20109391/4518341). BTW, welcome to Stack Overflow! Check out the [tour] and [How to ask a good question](/help/how-to-ask) for general tips. – wjandrea Dec 29 '22 at 21:20

4 Answers


Assuming this example:

  Animal_Common_Name  Animal_Scientific_Name
0               Lion            Panthera leo
1            Giraffe  Giraffa camelopardalis
2               Lion            Panthera leo

Use factorize to convert each column to integer codes, then check whether all the codes are equal:

(pd.factorize(df['Animal_Common_Name'])[0] == pd.factorize(df['Animal_Scientific_Name'])[0]).all()

Output: True

If you want to identify the rows involved in one-to-many relationships:

df[df.groupby('Animal_Scientific_Name')['Animal_Common_Name'].transform('nunique').ne(1)]

Then repeat with the column names swapped to check the other direction.
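To check both directions at once, and to scan a wide dataframe for every redundant pair, the groupby/nunique idea above can be wrapped in a small helper. This is a sketch: columns_are_one_to_one and redundant_column_pairs are hypothetical names (not pandas functions), the Enclosure column is made-up sample data, and rows with missing values are effectively ignored because groupby drops NaN keys and nunique skips NaN by default.

```python
import itertools

import pandas as pd


def columns_are_one_to_one(df, col_a, col_b):
    """Hypothetical helper: True if each value of col_a maps to at most one
    value of col_b and vice versa (NaN keys/values are ignored)."""
    a_to_b = df.groupby(col_a)[col_b].nunique().le(1).all()
    b_to_a = df.groupby(col_b)[col_a].nunique().le(1).all()
    return bool(a_to_b and b_to_a)


def redundant_column_pairs(df):
    """Hypothetical helper: list every pair of columns that determine each other."""
    return [
        (a, b)
        for a, b in itertools.combinations(df.columns, 2)
        if columns_are_one_to_one(df, a, b)
    ]


# Sample data: the first two columns come from the example above,
# 'Enclosure' is invented to show a column that is NOT redundant
df = pd.DataFrame({
    'Animal_Common_Name': ['Lion', 'Giraffe', 'Lion'],
    'Animal_Scientific_Name': ['Panthera leo', 'Giraffa camelopardalis', 'Panthera leo'],
    'Enclosure': ['A', 'B', 'C'],
})

print(redundant_column_pairs(df))
# [('Animal_Common_Name', 'Animal_Scientific_Name')]
```

Checking every pair is quadratic in the number of columns, so on very wide frames you may want to narrow down the candidate columns first.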

mozway

If the two columns contain identical values, you can use the pandas.Series.equals() method.

For example:

import pandas as pd

data = {
    'Column1': [1, 2, 3, 4],
    'Column2': [1, 2, 3, 4],
    'Column3': [5, 6, 7, 8]
}

df = pd.DataFrame(data)

# True
print(df['Column1'].equals(df['Column2']))

# False
print(df['Column1'].equals(df['Column3']))

Found via GeeksForGeeks
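Two properties of Series.equals() are worth knowing when using it to spot duplicated columns: it treats NaNs in the same position as equal (unlike element-wise ==), and it returns False when the dtypes differ, even if the numbers match. A minimal illustration with made-up values:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, np.nan])
b = pd.Series([1, 2, np.nan])

# equals() treats NaNs in matching positions as equal...
print(a.equals(b))          # True
# ...whereas element-wise == does not, because NaN != NaN
print((a == b).all())       # False

# equals() also requires the same dtype: int64 vs float64 compares unequal
print(pd.Series([1, 2]).equals(pd.Series([1.0, 2.0])))  # False
```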

wjandrea
tehCheat
df['Animal_Common_Name'].equals(df['Animal_Scientific_Name'])

This returns True if the two columns contain exactly the same values and False otherwise.
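Note that equals() compares the values themselves, so on zoo data like the example in the first answer it returns False even though the two columns carry the same information. A minimal sketch contrasting it with comparing factorize codes (the sample rows are assumed, since the question gave no data):

```python
import pandas as pd

# Sample data in the shape described in the question (assumed)
df = pd.DataFrame({
    'Animal_Common_Name': ['Lion', 'Giraffe', 'Lion'],
    'Animal_Scientific_Name': ['Panthera leo', 'Giraffa camelopardalis', 'Panthera leo'],
})

# equals() compares values element by element, so renamed labels never match
same_values = df['Animal_Common_Name'].equals(df['Animal_Scientific_Name'])

# Comparing integer codes checks whether both columns follow the same pattern
codes_common = pd.factorize(df['Animal_Common_Name'])[0]
codes_scientific = pd.factorize(df['Animal_Scientific_Name'])[0]
same_pattern = bool((codes_common == codes_scientific).all())

print(same_values)   # False
print(same_pattern)  # True
```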

Niamh
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Dec 30 '22 at 07:45

You can use pandas' vectorized operations to flag, row by row, where the two columns hold the same value. Here's an example:

import pandas as pd

# create a sample dataframe from some data
d = {'name1': ['Zebra', 'Lion', 'Seagull', 'Spider'],
     'name2': ['Zebra', 'Lion', 'Bird', 'Insect']}
df = pd.DataFrame(data=d)

# create a new column for your test:
df['is_redundant'] = ''

# flag the rows where the two names match (.loc avoids chained assignment):
df.loc[df['name1'] == df['name2'], 'is_redundant'] = 1

print(df)


    name1   name2   is_redundant
0   Zebra   Zebra   1
1   Lion    Lion    1
2   Seagull Bird    
3   Spider  Insect  

You can then replace the empties with 0 or leave as is depending on your application.
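If you want 1/0 from the start instead of blanks, the boolean comparison can be converted directly. This is a sketch reusing the sample data above; astype(int) maps True to 1 and False to 0:

```python
import pandas as pd

d = {'name1': ['Zebra', 'Lion', 'Seagull', 'Spider'],
     'name2': ['Zebra', 'Lion', 'Bird', 'Insect']}
df = pd.DataFrame(data=d)

# The comparison already yields a boolean Series; astype(int) turns it into 1/0
df['is_redundant'] = (df['name1'] == df['name2']).astype(int)

print(df)
```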

wjandrea