I want to compare the values of two columns and create a new column, bin_crnn, that is 1 where they are equal and 0 where they are not.

# coding: utf-8
import pandas as pd

df = pd.read_csv('file.csv',sep=',')

if df['crnn_pred']==df['manual_raw_value']:
    df['bin_crnn']=1
else:
    df['bin_crnn']=0

I got the following error:

    if df['crnn_pred']==df['manual_raw_value']:
  File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/pandas/core/generic.py", line 917, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
MSeifert
vincent75

5 Answers

You need to cast the boolean mask to int with astype:

df['bin_crnn'] = (df['crnn_pred']==df['manual_raw_value']).astype(int)

Sample:

df = pd.DataFrame({'crnn_pred':[1,2,5], 'manual_raw_value':[1,8,5]})
print (df)
   crnn_pred  manual_raw_value
0          1                 1
1          2                 8
2          5                 5

print (df['crnn_pred']==df['manual_raw_value'])
0     True
1    False
2     True
dtype: bool

df['bin_crnn'] = (df['crnn_pred']==df['manual_raw_value']).astype(int)
print (df)
   crnn_pred  manual_raw_value  bin_crnn
0          1                 1         1
1          2                 8         0
2          5                 5         1

You get the error because comparing two columns does not return a scalar, but a Series (array) of True and False values.

So an if statement would need all or any to reduce it to a scalar True or False.

I think this answer explains it better.
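A minimal sketch of that point, reusing the sample frame from this answer: the comparison yields a boolean Series, and all or any reduce it to a single True or False that an if statement can accept. The names mask, all_equal and any_equal are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'crnn_pred': [1, 2, 5], 'manual_raw_value': [1, 8, 5]})

# Element-wise comparison: one True/False per row, not a single value.
mask = df['crnn_pred'] == df['manual_raw_value']

# Reduce the Series to a scalar before using it in an `if`.
all_equal = bool(mask.all())  # True only if every row matches
any_equal = bool(mask.any())  # True if at least one row matches

if all_equal:
    print('every row matches')
elif any_equal:
    print('some rows match')
else:
    print('no rows match')
```

Note that this answers a different question (do the columns agree as a whole?), which is why the per-row 0/1 column is built with astype instead of an if statement.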

Community
jezrael

One fast approach is to use np.where.

import numpy as np
df['test'] = np.where(df['crnn_pred']==df['manual_raw_value'], 1, 0)
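As a rough illustration on the sample frame from the answer above, np.where(condition, value_if_true, value_if_false) picks a value per row, so the two outputs do not have to be 1 and 0. The match_label column and its string labels are hypothetical, added only to show this.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'crnn_pred': [1, 2, 5], 'manual_raw_value': [1, 8, 5]})
mask = df['crnn_pred'] == df['manual_raw_value']

# np.where works element-wise: 1 where the mask is True, 0 elsewhere.
df['bin_crnn'] = np.where(mask, 1, 0)

# Hypothetical string labels, to show the values need not be 0/1.
df['match_label'] = np.where(mask, 'match', 'mismatch')

print(df)
```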
Allen Qin
  • Nice approach @Allen. Good Q/A here regarding that approach and the benefits of it vs. a list comprehension depending on data set size: http://stackoverflow.com/q/19913659/6163621 – elPastor May 19 '17 at 12:34
  • I thought similar questions must have been asked and answered before here. I will take a look at that thread. @pshep123 – Allen Qin May 19 '17 at 12:36

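There is no need for a loop or an if statement; just set the new column using a boolean mask.

# Set 1 where the columns match; .loc creates bin_crnn with NaN in the other rows.
df.loc[df['crnn_pred']==df['manual_raw_value'], 'bin_crnn'] = 1
# Fill the remaining rows with 0 and convert the float column to int.
df['bin_crnn'] = df['bin_crnn'].fillna(0).astype(int)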
elPastor

Another quick way, using only pandas and not NumPy, is:

df['columns_are_equal'] = df.apply(lambda x: int(x['column_a'] ==x['column_b']), axis=1)
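A small sketch, assuming the question's column names in place of column_a and column_b, that checks the row-wise apply against the vectorized comparison on the sample data. The names row_wise and vectorized are illustrative. Since apply with axis=1 calls the lambda once per row in Python, the vectorized form will usually be faster on large frames.

```python
import pandas as pd

df = pd.DataFrame({'crnn_pred': [1, 2, 5], 'manual_raw_value': [1, 8, 5]})

# Row-wise: the lambda runs once per row.
row_wise = df.apply(
    lambda row: int(row['crnn_pred'] == row['manual_raw_value']), axis=1
)

# Vectorized: one comparison over the whole columns.
vectorized = (df['crnn_pred'] == df['manual_raw_value']).astype(int)

# Both approaches should give the same 0/1 values.
print(row_wise.tolist() == vectorized.tolist())
```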
Michael Discenza

You are comparing two columns; try this:

bin_crnn = []
for index, row in df.iterrows():
    if row['crnn_pred'] == row['manual_raw_value']:
        bin_crnn.append(1)
    else:
        bin_crnn.append(0)
df['bin_crnn'] = bin_crnn
Ika8