
I have a dataframe that looks something like the one below:

    col1    col2
0   abc      0
1   def     -1
2   ghi      1
3   jkl    -0.5

repro:

import pandas as pd

data = {'col1':  ['abc', 'def','ghi','jkl'],
        'col2': ['0', '-1','1','-0.5']
        }

df = pd.DataFrame(data, columns = ['col1','col2'])

I'd like to add a third column whose contents are based on a conditional evaluation of col2, so that the result is as follows:

    col1    col2    col3
0   abc      0      blue
1   def     -1      red
2   ghi      1      green
3   jkl    -0.5     red

My current code is this:

df['col3'] = np.where((df['col2'] >=1,'green',
                       (df['col2'] ==0, 'blue',
                         (df['col2'] <0, 'red'))))

However, this currently fails with the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-151-a19019bc0d01> in <module>
      1 df['col3'] = np.where((df['col2'] >=1,'green',
      2                        (df['col2'] ==0, 'blue',
----> 3                          (df['col2'] <0, 'red'))))

//anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __nonzero__(self)
   1476         raise ValueError("The truth value of a {0} is ambiguous. "
   1477                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1478                          .format(self.__class__.__name__))
   1479 
   1480     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Could you please explain the error and suggest how I can achieve my end goal?

Thanks

jimiclapton
  • You aren't following the syntax of `where`. – hpaulj Jul 16 '20 at 22:39
  • Does this answer your question? [Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()](https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o) – Trenton McKinney Jul 16 '20 at 22:47
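For reference, `np.where` expects three arguments, `np.where(condition, x, y)`, and returns `x` where `condition` is True and `y` everywhere else. The code in the question passes one nested tuple instead. With a single argument `np.where` behaves like `np.nonzero`, so it probably ends up testing the truth of each tuple element, including the boolean Series, which is what raises the "ambiguous truth value" error. Note also that the repro stores col2 as strings, which would need converting to numbers before any comparison. A minimal sketch of the three-argument form, nested so each test sits in the "else" slot of the previous one; the `'unknown'` fallback for values between 0 and 1 (which the question's rules don't cover) is a placeholder of my own:

```python
import numpy as np
import pandas as pd

data = {'col1': ['abc', 'def', 'ghi', 'jkl'],
        'col2': ['0', '-1', '1', '-0.5']}
df = pd.DataFrame(data)

# The sample data holds col2 as strings; convert so comparisons are numeric.
df['col2'] = df['col2'].astype(float)

# np.where(condition, x, y): x where condition is True, otherwise y.
# Nesting places the next test in the "otherwise" position.
# 'unknown' is a hypothetical label for 0 < col2 < 1.
df['col3'] = np.where(df['col2'] >= 1, 'green',
                      np.where(df['col2'] == 0, 'blue',
                               np.where(df['col2'] < 0, 'red', 'unknown')))
print(df)
```

Nesting gets hard to read beyond two or three conditions, which is where a list of conditions (as with `np.select`) tends to be clearer.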

2 Answers

In [64]: data = {'col1':  ['abc', 'def','ghi','jkl'],
    ...:         'col2': ['0', '-1','1','-0.5']
    ...:         }
    ...:
    ...: df = pd.DataFrame (data, columns = ['col1','col2'])

In [65]: df["col2"] = df["col2"].astype(float)

In [66]: def process(row):
    ...:     col2 = row["col2"]
    ...:     if col2 >=1: return "green"
    ...:     if col2 ==0: return "blue"
    ...:     if col2<0: return "red"
    ...:

In [67]: df["col3"] = df.apply(process,axis=1)

In [68]: df
Out[68]:
  col1  col2   col3
0  abc   0.0   blue
1  def  -1.0    red
2  ghi   1.0  green
3  jkl  -0.5    red
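Since `process` only reads `col2`, the same logic could also be applied to that single column with `Series.map`, which calls the function once per value rather than building a row object for every call as `df.apply(..., axis=1)` does. A sketch under that assumption; the `label` helper name and the explicit `None` return for values strictly between 0 and 1 (a range the question's rules leave undefined) are my own choices:

```python
import pandas as pd

data = {'col1': ['abc', 'def', 'ghi', 'jkl'],
        'col2': ['0', '-1', '1', '-0.5']}
df = pd.DataFrame(data)
df['col2'] = df['col2'].astype(float)

def label(value):
    """Map a single col2 value to a colour (hypothetical helper)."""
    if value >= 1:
        return 'green'
    if value == 0:
        return 'blue'
    if value < 0:
        return 'red'
    return None  # 0 < value < 1 is not covered by the question's rules

# Series.map applies label element-wise to col2.
df['col3'] = df['col2'].map(label)
print(df)
```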
bigbounty

You can use np.select:

import numpy as np
import pandas as pd

data = {'col1': ['abc', 'def','ghi','jkl'],
        'col2': ['0', '-1','1','-0.5']
        }

df = pd.DataFrame (data, columns = ['col1','col2'])
df['col2'] = df['col2'].astype(float)

condlist = [df['col2'] >=1., df['col2'] ==0, df['col2'] <0]
choicelist = ['green', 'blue', 'red']
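# Use a string default for rows that match no condition: newer NumPy
# versions warn about, or reject, promoting the integer default (0) to a
# string dtype. 'unknown' is an arbitrary placeholder label.
df['col3'] = np.select(condlist, choicelist, default='unknown')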

which gives:

>>> df
  col1  col2   col3
0  abc   0.0   blue
1  def  -1.0    red
2  ghi   1.0  green
3  jkl  -0.5    red
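`np.select` checks the conditions in list order and, for each element, takes the choice paired with the first condition that is True; elements matching none of them receive `default`. A small sketch with an extra value, 0.5, that none of the question's rules cover; `'unmatched'` is an arbitrary placeholder label:

```python
import numpy as np
import pandas as pd

values = pd.Series([0.0, -1.0, 1.0, -0.5, 0.5])

condlist = [values >= 1, values == 0, values < 0]
choicelist = ['green', 'blue', 'red']

# 0.5 satisfies no condition, so it falls through to the default.
labels = np.select(condlist, choicelist, default='unmatched')
print(labels.tolist())  # ['blue', 'red', 'green', 'red', 'unmatched']
```

Because the first True condition wins, overlapping conditions should be listed from most to least specific.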
Dharman
Yacola
  • Thanks so much for your answer. I see how it works but I think the function and apply() approach is probably slightly more efficient for my use case. – jimiclapton Jul 17 '20 at 07:54