Pandas, Multiple Case Statement, ValueError: either both or neither of x and y should be given

Question

Trying to incorporate the following case statement in my code.

d1['newcol'] = np.where((d1['business_name'] == 'HELP')&(d1['level'] == 'Component'), d1['component'], np.where((d1['business_name'] == 'HELP')&(d1['level'] == 'Package'),d1['package'], np.where(d1['business_name'] == 'PHELPS',d1['reporting'])))

Or the case pseudocode would look like:

   newcol=case WHEN ((business_name = 'HELP' AND level = 'Component')) THEN component WHEN ((business_name = 'HELP' AND level = 'Package')) THEN package WHEN (business_name = 'PHELS') THEN reporting END

the error Im getting is:

ValueError: either both or neither of x and y should be given

any help would be appreciated.

You may want to consider using [`numpy.select`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.select.html) as an alternative to chaining `where` statements. — pault, Nov 25 '19 at 20:54
To your specific error- ```np.where``` takes either 1 condition, then it returns Boolean matrix, or 3 (of the same shape). So to fix it- just add `````` part to your ```numpy.where``` (make it ```zeros()``` for instance). — Grzegorz Skibinski, Nov 25 '19 at 21:27
This is 100% `np.select`: see the [accepted answer](https://stackoverflow.com/a/19913845/9081267), the second part where `np.select` is used — Erfan, Nov 26 '19 at 12:15

Brandon · Accepted Answer · 2019-11-25T22:02:15.767

2

You might want to apply a lambda function across all rows of the dataframe.

df['newcol'] = df.apply(lambda row: 
                        row['component'] if row['business_name'] == 'help' and row['level'] == 'Component' # 1st condition
                           else (row['package'] if row['business_name'] == 'HELP' and row['level'] == 'Package' # 2nd condition
                                 else (row['reporting'] if row['business_name'] == 'PHELPS' # 3rd condition
                                       else np.nan) # NAN if no conditions matched
                           )
                       , axis=1)

edited Nov 25 '19 at 22:02

answered Nov 25 '19 at 21:17

Brandon

918
6
14

getting this error `ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')`, going to keep testing. – excelguy Nov 25 '19 at 21:48
Edited. Value error should be fixed now – Brandon Nov 25 '19 at 22:02
This might be sufficient for you but using `apply` will be slower than using `np.select`, significantly so if your DataFrame is large. – pault Nov 26 '19 at 16:43

score 1 · Answer 2 · answered Nov 25 '19 at 21:42

Here is an approach that can achieve the result. Using other sample data.

import pandas as pd
import numpy as np
from io import StringIO

# Sample data
df = pd.read_csv(StringIO(
"""a,b
1,1
1,2
2,1
2,2"""))

# Create new column using conditional indexing
df["c"] = ""
df.loc[df["a"] == 1, "c"] = "one something"
df.loc[(df["a"] == 2) & (df["b"] == 1), "c"] = "two one"
df.loc[(df["a"] == 2) & (df["b"] != 1), "c"] = "two something"
df

# Out[29]: 
#    a  b              c
# 0  1  1  one something
# 1  1  2  one something
# 2  2  1        two one
# 3  2  2  two something

Pandas, Multiple Case Statement, ValueError: either both or neither of x and y should be given

2 Answers2