I have a DataFrame like the one below:

import pandas as pd

df1_data = {'sym' :{0:'AAA',1:'BBB',2:'CCC',3:'DDD',4:'DDD',5:'CCC'},
        'id' :{0:'101',1:'102',2:'103',3:'104',4:'105',5:'106'},
        'sal':{0:'1000',1:'1000',2:'1000',3:'1000',4:'1000',5:'1000'},
        'loc':{0:'zzz',1:'zzz',2:'zzz',3:'zzz',4:'zzz',5:'zzz'},
        'name':{0:'abc',1:'abc',2:'abc',3:'pqr',4:'pqr',5:'pqr'}}
df = pd.DataFrame(df1_data)
print(df)

    id  loc name   sal  sym
0  101  zzz  abc  1000  AAA
1  102  zzz  abc  1000  BBB
2  103  zzz  abc  1000  CCC
3  104  zzz  pqr  1000  DDD
4  105  zzz  pqr  1000  DDD
5  106  zzz  pqr  1000  CCC

I want to check which columns of the DataFrame above contain the same value in every row. Based on that, I want those constant columns in one DataFrame and the remaining (unmatched) columns in another DataFrame.

Expected output:

matched_df:

   loc   sal
0  zzz  1000
1  zzz  1000
2  zzz  1000
3  zzz  1000
4  zzz  1000
5  zzz  1000

unmatched_df:

    id name  sym
0  101  abc  AAA
1  102  abc  BBB
2  103  abc  CCC
3  104  pqr  DDD
4  105  pqr  DDD
5  106  pqr  CCC
ketan


You can compare df with its first row using eq, and then use all to check whether each column contains only True values:

print (df.eq(df.iloc[0]))
      id   loc   name   sal    sym
0   True  True   True  True   True
1  False  True   True  True  False
2  False  True   True  True  False
3  False  True  False  True  False
4  False  True  False  True  False
5  False  True  False  True  False

mask = df.eq(df.iloc[0]).all()
print (mask)
id      False
loc      True
name    False
sal      True
sym     False
dtype: bool
print (df.loc[:, mask])
   loc   sal
0  zzz  1000
1  zzz  1000
2  zzz  1000
3  zzz  1000
4  zzz  1000
5  zzz  1000

print (df.loc[:, ~mask])
    id name  sym
0  101  abc  AAA
1  102  abc  BBB
2  103  abc  CCC
3  104  pqr  DDD
4  105  pqr  DDD
5  106  pqr  CCC
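The two steps above can be wrapped in a small helper that returns both frames at once. This is a minimal sketch: the function name `split_constant_columns` is hypothetical, and it assumes the DataFrame has at least one row (`df.iloc[0]` raises `IndexError` on an empty frame).

```python
import pandas as pd


def split_constant_columns(df):
    """Hypothetical helper: split df into (matched_df, unmatched_df).

    A column counts as "matched" when every row equals the first row's
    value in that column. Assumes df has at least one row.
    """
    # Boolean Series indexed by column name: True if the column is constant
    mask = df.eq(df.iloc[0]).all()
    return df.loc[:, mask], df.loc[:, ~mask]


df = pd.DataFrame({
    'sym': ['AAA', 'BBB', 'CCC', 'DDD', 'DDD', 'CCC'],
    'id': ['101', '102', '103', '104', '105', '106'],
    'sal': ['1000'] * 6,
    'loc': ['zzz'] * 6,
    'name': ['abc', 'abc', 'abc', 'pqr', 'pqr', 'pqr'],
})

matched_df, unmatched_df = split_constant_columns(df)
print(matched_df)
print(unmatched_df)
```

Returning a tuple keeps the mask computation in one place, so both frames are guaranteed to be complementary.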

Another way to build the mask is to compare the underlying NumPy arrays:

arr = df.values
mask = (arr == arr[0]).all(axis=0)
print (mask)
[False  True False  True False]

print (df.loc[:, mask])
   loc   sal
0  zzz  1000
1  zzz  1000
2  zzz  1000
3  zzz  1000
4  zzz  1000
5  zzz  1000

print (df.loc[:, ~mask])
    id name  sym
0  101  abc  AAA
1  102  abc  BBB
2  103  abc  CCC
3  104  pqr  DDD
4  105  pqr  DDD
5  106  pqr  CCC
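One caveat worth checking against your data: both masks compare with `==`, and `NaN == NaN` is `False`, so a column that is entirely missing would end up in the unmatched frame. If that matters, a sketch based on `DataFrame.nunique(dropna=False)` counts NaN as a value, so a column is constant when it has exactly one distinct value. The helper name `constant_column_mask` and the sample data are hypothetical, and `DataFrame.nunique` needs a newer pandas than the 0.19.2 mentioned in the comments (it was added in 0.20).

```python
import numpy as np
import pandas as pd


def constant_column_mask(df):
    """Hypothetical helper: True for columns holding a single distinct value.

    dropna=False counts NaN as a value, so an all-NaN column is treated
    as constant, while a column mixing NaN and a real value is not.
    """
    return df.nunique(dropna=False) == 1


df = pd.DataFrame({
    'loc': ['zzz', 'zzz', 'zzz'],
    'empty': [np.nan, np.nan, np.nan],
    'id': ['101', '102', '103'],
})

mask = constant_column_mask(df)
print(mask)
print(df.loc[:, mask])
print(df.loc[:, ~mask])
```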
jezrael
  • Fantastic... I had never done a row comparison on a DataFrame, but it is amazing and very quick... Thanks... – ketan Feb 07 '17 at 09:37
  • @jezrael - I'm getting a FutureWarning: /usr/local/lib/python2.7/dist-packages/pandas/core/ops.py:1247: FutureWarning: numpy equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change. result = op(x, y)... Can I avoid this warning, or is it important? – ketan Feb 07 '17 at 09:40
  • I think it is a NumPy warning, and you see it because of [this change](http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#fine-grained-numpy-errstate). In my opinion it is not a problem. – jezrael Feb 07 '17 at 09:43
  • @jezrael - OK. How can I avoid this warning? – ketan Feb 07 '17 at 09:47
  • @kit - Hard question, I don't know. But I am not sure it is a good idea to suppress the warning, because pandas very often introduces new solutions. – jezrael Feb 07 '17 at 09:56
  • I tried to find a solution - you can catch the warning as in [this answer](http://stackoverflow.com/q/15933741/2901002). But I use Python 3 and the latest version of pandas, 0.19.2, and I get no warning. Is it possible to upgrade your pandas and NumPy? – jezrael Feb 07 '17 at 10:06
  • @jezrael - I'm using pandas 0.19.2 and NumPy 1.12.0, but Python 2.7.6. In that answer the warning is a plain Warning, but mine is a FutureWarning. I tried to handle it, but without success. – ketan Feb 07 '17 at 11:24
  • I found [this](http://stackoverflow.com/a/15778297/2901002). It can help. – jezrael Feb 07 '17 at 11:29
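For the question in the comments about silencing the FutureWarning, a minimal sketch using the standard-library `warnings.catch_warnings` context manager ignores `FutureWarning` only inside the `with` block and restores the previous warning filters afterwards. As jezrael notes, upgrading pandas/NumPy is preferable to hiding the warning; this is only a scoped fallback, and it does not reproduce the Python 2.7 setup where the warning appeared.

```python
import warnings

import pandas as pd

df = pd.DataFrame({
    'id': ['101', '102', '103'],
    'loc': ['zzz', 'zzz', 'zzz'],
})

with warnings.catch_warnings():
    # Ignore FutureWarning only within this block; the earlier
    # filters are restored automatically when the block exits.
    warnings.simplefilter('ignore', FutureWarning)
    mask = df.eq(df.iloc[0]).all()

matched_df = df.loc[:, mask]
unmatched_df = df.loc[:, ~mask]
print(matched_df)
print(unmatched_df)
```

Scoping the filter this way avoids a global `warnings.filterwarnings` call that would also hide unrelated FutureWarnings elsewhere in the program.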