Searching for similar column names in multiple DataFrames

Question

I have multiple datasets that share some column names, as in the example below. I want to collect the column names that are repeated across all the datasets into a list, using Python and pandas.

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                    'B': 'one one two three two two one three'.split(),
                    'C': np.arange(8),
                    'D': np.arange(8) * 2})
df2 = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                    'B': 'one one two three two two one three'.split(),
                    'C': np.arange(8)})
df3 = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                    'B': 'one one two three two two one three'.split(),
                    'D': np.arange(8) * 2})

As shown above, the three datasets df1, df2 and df3 share the columns 'A' and 'B', so the expected output is ['A', 'B']. Could someone suggest a solution to this problem? Thanks in advance.

score 0 · Answer 1 · answered Mar 07 '18 at 08:36

The columns of a pandas DataFrame are of type pandas.core.indexes.base.Index, so you can use its intersection method to find the overlapping elements. Here is an example:

import pandas as pd
import numpy as np

a = np.arange(1,4)
b = np.arange(5,8)
c = np.random.randint(0,10,size=3)
d = np.random.randint(0,10,size=3)
df_1 = pd.DataFrame({'a':a,'b':b,'c':c,'d':d})
df_1

out:

    a   b   c   d
0   1   5   5   1
1   2   6   7   5
2   3   7   6   9

a = np.arange(4,7)
b = np.arange(7,10)
e = np.random.randint(0,10,size=3)
f = np.random.randint(0,10,size=3)
df_2 = pd.DataFrame({'a':a,'b':b,'e':e,'f':f})
df_2

out:

    a   b   e   f
0   4   7   9   9
1   5   8   9   3
2   6   9   2   1

df_1.columns.intersection(df_2.columns)

out:

Index(['a', 'b'], dtype='object')

type(df_1.columns)

out:

pandas.core.indexes.base.Index
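
Applied to the three DataFrames from the question, the same method can be chained. The following is a minimal sketch rather than a definitive solution; the shortened data and the names common and common_list are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Shortened stand-ins for the question's df1, df2, df3 (same column names).
df1 = pd.DataFrame({'A': list('abc'), 'B': list('xyz'),
                    'C': np.arange(3), 'D': np.arange(3) * 2})
df2 = pd.DataFrame({'A': list('abc'), 'B': list('xyz'),
                    'C': np.arange(3)})
df3 = pd.DataFrame({'A': list('abc'), 'B': list('xyz'),
                    'D': np.arange(3) * 2})

# Each intersection returns a new Index, so the calls can be chained.
common = df1.columns.intersection(df2.columns).intersection(df3.columns)

# Convert the resulting Index to a plain Python list.
common_list = common.tolist()
print(common_list)
```

Calling tolist() at the end gives the list format the question asks for.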

score 0 · Accepted Answer · answered Mar 07 '18 at 08:37

Pandas can give you the column names of a DataFrame. For example, df1.columns returns an Index holding 'A', 'B', 'C' and 'D', and df1.columns.tolist() turns it into the list ['A', 'B', 'C', 'D']. You can get the list of column names for each DataFrame in the same way.

Then you can just find the intersection of all these lists.
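
One way to intersect the column lists is with Python sets. This is a sketch under assumptions: the empty DataFrames only stand in for the question's data, and the names frames, shared and result are hypothetical.

```python
import pandas as pd

# Empty frames that only carry the column names from the question.
df1 = pd.DataFrame(columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(columns=['A', 'B', 'C'])
df3 = pd.DataFrame(columns=['A', 'B', 'D'])

frames = [df1, df2, df3]

# set.intersection accepts any number of iterables of column names.
shared = set(frames[0].columns).intersection(*(f.columns for f in frames[1:]))

# Sets are unordered, so rebuild the list in the first frame's column order.
result = [col for col in frames[0].columns if col in shared]
print(result)
```

Filtering the first frame's columns against the set keeps the output order predictable, which a bare set does not guarantee.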

Shridhar R Kulkarni

score 0 · Answer 3 · answered Mar 07 '18 at 08:45

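I think the simplest is & for the intersection of all column names (this works on older pandas; from pandas 2.0 onward, & on Index objects is an element-wise logical operation rather than a set intersection, so use Index.intersection there):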

a = df1.columns & df2.columns & df3.columns
print (a)
Index(['A', 'B'], dtype='object')

If you need a list:

a = (df1.columns & df2.columns & df3.columns).tolist()
print (a)
['A', 'B']
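
For an arbitrary number of DataFrames, the same result can be obtained by folding Index.intersection over the column indexes, which does not depend on the & operator. This is a sketch only: common_columns is a hypothetical helper, not part of pandas, and the empty frames stand in for the question's data.

```python
from functools import reduce

import pandas as pd


def common_columns(*frames):
    """Hypothetical helper: column names present in every given DataFrame."""
    # Fold Index.intersection across the column indexes, left to right.
    # reduce raises TypeError if no frames are passed at all.
    shared = reduce(lambda acc, cols: acc.intersection(cols),
                    (frame.columns for frame in frames))
    return shared.tolist()


# Empty frames that only carry the column names from the question.
df1 = pd.DataFrame(columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(columns=['A', 'B', 'C'])
df3 = pd.DataFrame(columns=['A', 'B', 'D'])

a = common_columns(df1, df2, df3)
print(a)
```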

jezrael
