
I have the following code. I'm trying to create a new field in a pandas DataFrame with a simple condition:

if df_reader['email1_b']=='NaN':
    df_reader['email1_fin']=df_reader['email1_a']
else:
    df_reader['email1_fin']=df_reader['email1_b']

But I get this strange error:

ValueError                                Traceback (most recent call last)
<ipython-input-92-46d604271768> in <module>()
----> 1 if df_reader['email1_b']=='NaN':
      2     df_reader['email1_fin']=df_reader['email1_a']
      3 else:
      4     df_reader['email1_fin']=df_reader['email1_b']

/home/user/GL-env_py-gcc4.8.5/lib/python2.7/site-packages/pandas/core/generic.pyc in __nonzero__(self)
    953         raise ValueError("The truth value of a {0} is ambiguous. "
    954                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 955                          .format(self.__class__.__name__))
    956 
    957     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Can anybody explain what I need to do about this?

EdChum

2 Answers


df_reader['email1_b']=='NaN' is a vector of Boolean values (one per row), but you need one Boolean value for if to work. Use this instead:

import numpy as np

# take email1_a where email1_b is the string 'NaN', otherwise keep email1_b
df_reader['email1_fin'] = np.where(df_reader['email1_b'] == 'NaN',
                                   df_reader['email1_a'],
                                   df_reader['email1_b'])
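To see the vectorized condition in action, here is a minimal sketch on a tiny hypothetical DataFrame (the email addresses are made up, and it assumes the missing entries really are the literal string 'NaN'):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the column holds the literal string 'NaN'
df_reader = pd.DataFrame({
    'email1_a': ['a1@example.com', 'a2@example.com', 'a3@example.com'],
    'email1_b': ['b1@example.com', 'NaN', 'b3@example.com'],
})

# Element-wise condition: one Boolean per row, not a single truth value
mask = df_reader['email1_b'] == 'NaN'

# np.where picks from email1_a where mask is True, otherwise from email1_b
df_reader['email1_fin'] = np.where(mask, df_reader['email1_a'], df_reader['email1_b'])

print(df_reader['email1_fin'].tolist())
# ['b1@example.com', 'a2@example.com', 'b3@example.com']
```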

As a side note, are you sure about the string 'NaN'? Could the missing values be the float NaN instead? In the latter case, your expression should be:

df_reader['email1_fin'] = np.where(df_reader['email1_b'].isnull(), 
                                   df_reader['email1_a'],
                                   df_reader['email1_b'])
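A minimal sketch of the isnull() version, assuming the gaps in email1_b are real float NaN values (the sample data is hypothetical). It also shows Series.fillna, a different technique not used in the answer, which fills the missing entries of email1_b from email1_a and gives the same values here:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the missing value is a real float NaN, not a string
df_reader = pd.DataFrame({
    'email1_a': ['a1@example.com', 'a2@example.com', 'a3@example.com'],
    'email1_b': ['b1@example.com', np.nan, 'b3@example.com'],
})

# isnull() flags real missing values (NaN/None) row by row
df_reader['email1_fin'] = np.where(df_reader['email1_b'].isnull(),
                                   df_reader['email1_a'],
                                   df_reader['email1_b'])

# Alternative technique: fill the gaps in email1_b with the matching email1_a
alt = df_reader['email1_b'].fillna(df_reader['email1_a'])

print(df_reader['email1_fin'].tolist())
# ['b1@example.com', 'a2@example.com', 'b3@example.com']
print(alt.tolist() == df_reader['email1_fin'].tolist())
# True
```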
DYZ
Yes, you are absolutely right. The second code block fixed my problem. Thanks! –  Aug 22 '17 at 08:02

if expects a single scalar truth value; it doesn't understand the array of booleans that your condition returns. If you think about it, what should it do when some values in this array are True and others are False?
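A small sketch of why the if statement fails, using a hypothetical Series: Python asks the condition for a single truth value, which pandas refuses for a multi-element Series, while reducing it with .any() or .all() (two of the options named in the error message) yields one value:

```python
import pandas as pd

# Hypothetical column similar to email1_b
s = pd.Series(['x@example.com', 'NaN', 'y@example.com'])
cond = s == 'NaN'  # one Boolean per row: [False, True, False]

# Using the whole Series as one truth value is ambiguous, so pandas raises
raised = False
try:
    if cond:  # same situation as `if df_reader['email1_b'] == 'NaN':`
        pass
except ValueError as exc:
    raised = True
    print(exc)

# Reducing to a single scalar explicitly is allowed
print(cond.any())  # True: at least one row matches
print(cond.all())  # False: not every row matches
```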

To do this properly, you can do the following:

import numpy as np
df_reader['email1_fin'] = np.where(df_reader['email1_b'] == 'NaN', df_reader['email1_a'], df_reader['email1_b'])

Also, you seem to be comparing against the string 'NaN' rather than the numerical NaN; is this intended?
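One plausible reason the comparison with 'NaN' finds nothing, assuming (a guess, since the question does not say) that df_reader was loaded with pd.read_csv: by default read_csv converts the text NaN into a real missing value, so the column no longer contains that string. A sketch with made-up CSV data:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV: the second row has the text NaN in the email1_b column
csv_text = "email1_a,email1_b\na1@example.com,b1@example.com\na2@example.com,NaN\n"

df_reader = pd.read_csv(io.StringIO(csv_text))

# read_csv treats the text "NaN" as missing by default, so no row equals the string
print((df_reader['email1_b'] == 'NaN').tolist())  # [False, False]
print(df_reader['email1_b'].isnull().tolist())    # [False, True]

# NaN never compares equal to anything, not even itself, hence isnull()
print(np.nan == np.nan)  # False
```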

EdChum