
I'd like to change the value of an entry in a DataFrame given a condition. For instance:

d = pandas.read_csv('output.az.txt', names = varname)
d['uld'] = (d.trade - d.plg25)*(d.final - d.price25)

if d['uld'] > 0:
   d['uld'] = 1
else:
   d['uld'] = 0

I don't understand why the above doesn't work. Thank you for your help.


Use np.where to set your data based on a simple boolean criterion:

In [3]:

import numpy as np
import pandas as pd
df = pd.DataFrame({'uld':np.random.randn(10)})
df
Out[3]:
        uld
0  0.939662
1 -0.009132
2 -0.209096
3 -0.502926
4  0.587249
5  0.375806
6 -0.140995
7  0.002854
8 -0.875326
9  0.148876
In [4]:

df['uld'] = np.where(df['uld'] > 0, 1, 0)
df
Out[4]:
   uld
0    1
1    0
2    0
3    0
4    1
5    1
6    0
7    1
8    0
9    1
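
Applied to the expression from the question, the same idea looks roughly like the sketch below. The DataFrame is a hypothetical stand-in for output.az.txt: it contains only the four columns the formula uses (trade, plg25, final, price25), and the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's CSV: only the four columns the
# formula uses, with made-up values.
d = pd.DataFrame({
    'trade':   [10.0, 12.0,  9.0, 11.0],
    'plg25':   [ 9.0, 13.0,  9.5, 10.0],
    'final':   [ 5.0,  4.0,  6.0,  3.0],
    'price25': [ 4.0,  5.0,  5.0,  3.5],
})

# Same expression as in the question: a float Series with one value per row.
d['uld'] = (d.trade - d.plg25) * (d.final - d.price25)

# Replace each value with 1 where it is positive and 0 otherwise.
d['uld'] = np.where(d['uld'] > 0, 1, 0)
print(d['uld'].tolist())  # [1, 1, 0, 0]
```

Row 1 ends up as 1 because both differences are negative, so their product is positive.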

As for why your code failed:

In [7]:

if df['uld'] > 0:
   df['uld'] = 1
else:
   df['uld'] = 0
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-ec7d7aaa1c28> in <module>()
----> 1 if df['uld'] > 0:
      2    df['uld'] = 1
      3 else:
      4    df['uld'] = 0

C:\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
    696         raise ValueError("The truth value of a {0} is ambiguous. "
    697                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 698                          .format(self.__class__.__name__))
    699 
    700     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The error occurs because an if statement needs a single True or False, but df['uld'] > 0 is a boolean Series holding one result per row, so asking whether the whole Series is true is ambiguous. The suggested any(), all(), etc. don't help here either: they collapse the Series to a single value, whereas you want to mask your df and set values only in the rows where the condition is met. The pandas docs explain this: http://pandas.pydata.org/pandas-docs/dev/gotchas.html, and there is a related question here: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
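
To make the distinction concrete, here is a small sketch with made-up values. Calling bool() on the comparison Series raises the same ValueError that the if statement hits, any()/all() answer a different question, and a boolean mask used with .loc sets values only in the matching rows. The flag column name is hypothetical, and the .loc approach is shown as an alternative, not as what this answer uses.

```python
import pandas as pd

df = pd.DataFrame({'uld': [0.94, -0.01, 0.0, 0.15]})
mask = df['uld'] > 0  # boolean Series: [True, False, False, True]

# An if statement calls bool() on its condition, which a multi-element
# Series refuses to answer.
try:
    bool(mask)
except ValueError as exc:
    print('ValueError:', exc)

# any()/all() do reduce the Series to one bool, but that answers a different
# question ("is at least one / every value positive?").
print(mask.any(), mask.all())  # True False

# Masking assigns per row instead: only rows where mask is True get a 1.
df['flag'] = 0  # hypothetical new column, so 'uld' is left untouched
df.loc[mask, 'flag'] = 1
print(df['flag'].tolist())  # [1, 0, 0, 1]
```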

np.where takes a boolean condition as its first argument and, element by element, returns the second argument where the condition is True and the third argument where it is False, which is exactly what you want.
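
A minimal sketch of those element-by-element semantics, using small made-up arrays; the list comprehension is only there to show what np.where computes, not how NumPy implements it.

```python
import numpy as np

cond = np.array([True, False, True, False])
print(np.where(cond, 1, 0))  # [1 0 1 0]

# Element by element, that matches this plain-Python comprehension:
print([1 if c else 0 for c in cond])  # [1, 0, 1, 0]

# The second and third arguments can also be arrays of the same length;
# each output element is taken from one or the other.
a = np.array([10, 20, 30, 40])
b = np.array([-1, -2, -3, -4])
print(np.where(cond, a, b))  # [10 -2 30 -4]
```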

UPDATE

Having looked at this again, you can also convert the boolean Series directly to int by casting with astype:

In [23]:
df['uld'] = (df['uld'] > 0).astype(int)
df

Out[23]:
   uld
0    1
1    0
2    0
3    0
4    1
5    1
6    0
7    1
8    0
9    1
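
A quick sketch, with made-up values, checking that the two approaches agree. astype(int) maps True to 1 and False to 0, so it gives the same result as np.where(cond, 1, 0) while staying a pandas Series (keeping the original index) instead of going through a NumPy array.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.94, -0.01, 0.0, 0.15])

via_where = pd.Series(np.where(s > 0, 1, 0))
via_astype = (s > 0).astype(int)  # True -> 1, False -> 0

print(via_astype.tolist())                         # [1, 0, 0, 1]
print(via_where.tolist() == via_astype.tolist())   # True
```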
    Thank you so much. I'm really amazed I can ask a question and receive such a detailed answer just like that. What a great experience. Thanks again! – James Eaves Mar 11 '15 at 00:33