ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). while calling a function

Question

I am using a CSV file as input to generate a JSON file that I feed into a Kafka topic.

df = pd.read_csv(csv_file, delimiter=",",
                 dtype={'E': 'S10', 'C': 'S10', 'Date': 'S10', 'TimeCode': 'S10', 
                         'Workrule': 'S10'})

common.time_calc(df) #time_calc is the function from a
df = df.drop(['Workrule'], axis=1)

In the function I have

def time_calc(df_entry):
    if (df_entry['TimeCode'] == 'R') and (df_entry['Workrule'] == 'C'):
        df_entry['TimeCode'] = 'A'
    if df_entry['TimeCode'] in ['O', 'L']:
        df_entry['TimeCode'] = 'O'

and I am getting

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I have tried modifying the code as

if (df_entry['TimeCode'] == 'R') & (df_entry['Workrule'] == 'C'):
    df_entry['TimeCode'] = 'A'

but still get the same error.

I added the following and am now able to post. Thanks!

json_data = df.to_json(orient='records')
json_input = '{"value":' + json_data + '}'
decodedJson = json.loads(json_input)
for entry in decodedJson['value']:
    common.time_calc(entry)
    del entry['Workrule']

yes, i checked the existing question. That is why I tried the boolean operator first before posting — Serotonin, Jan 24 '19 at 05:58
OK, just checking, it was not mentioned anywhere in your question. — cs95, Jan 24 '19 at 05:59

crazyGamer · Answer 1 · 2019-01-24T05:15:50.700

Your function time_calc takes a DataFrame as its argument. In the expression df_entry['TimeCode'] == 'R', you compare the entire column to a scalar value, so the result is a boolean Series rather than a single True or False.

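When you apply the logical and to this Series, Python tries to reduce it to a single boolean, which is ambiguous and raises the exception. Switching to & inside the if does not help: & combines the two Series element-wise, but the if statement still needs one truth value. What you actually intend is to either use vectorized operations or loop over the rows.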
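To make the cause concrete, here is a minimal sketch that reproduces the error. The sample rows are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({"TimeCode": ["R", "O", "L"], "Workrule": ["C", "X", "C"]})

# Comparing a column to a scalar yields a boolean Series, one value per row.
mask = (df["TimeCode"] == "R") & (df["Workrule"] == "C")
print(mask.tolist())

# Using that Series where a single True/False is required raises ValueError,
# regardless of whether the conditions were combined with `and` or `&`.
error_message = None
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The mask itself is perfectly valid; the problem is only that if cannot decide whether a whole Series is true or false.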

An example of the fixed code (not tested):

def time_calc(df):
    df.loc[(df['TimeCode'] == 'R') & (df['Workrule'] == 'C'), 'TimeCode'] = 'A'
    df.loc[df['TimeCode'].isin(['O', 'L']), 'TimeCode'] = 'O'
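As a quick check of the vectorized approach, the sketch below runs it on a few made-up rows. Each comparison is wrapped in parentheses because & binds more tightly than ==. The sample DataFrame is an assumption; the real data comes from the CSV in the question.

```python
import pandas as pd


def time_calc(df):
    # Parentheses are required: & has higher precedence than ==.
    df.loc[(df["TimeCode"] == "R") & (df["Workrule"] == "C"), "TimeCode"] = "A"
    # isin() tests membership row by row.
    df.loc[df["TimeCode"].isin(["O", "L"]), "TimeCode"] = "O"


# Hypothetical rows for illustration only.
sample = pd.DataFrame({
    "TimeCode": ["R", "R", "L", "O"],
    "Workrule": ["C", "X", "C", "C"],
})
time_calc(sample)
result = sample["TimeCode"].tolist()
print(result)
```

Only the first row satisfies both conditions and becomes 'A'; the 'L' row is folded into 'O', and the 'R' row with a different work rule is left unchanged.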

score 0 · Answer 2 · answered Jan 24 '19 at 05:10

You are comparing an entire column with a single value in df_entry['TimeCode'] == 'R'. You need to either iterate row by row and compare each value or, better, use np.where:

import numpy as np

def time_calc(df_entry):
    # Combine conditions element-wise with & (not and); use .isin() instead of in.
    df_entry['TimeCode'] = np.where((df_entry['TimeCode'] == 'R') & (df_entry['Workrule'] == 'C'), 'A', df_entry['TimeCode'])
    df_entry['TimeCode'] = np.where(df_entry['TimeCode'].isin(['O', 'L']), 'O', df_entry['TimeCode'])
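For comparison, the row-by-row option mentioned above could look roughly like this. It is a sketch using DataFrame.apply with axis=1; the helper name recode and the sample rows are hypothetical. It is usually slower than the vectorized versions, but the original if logic carries over almost unchanged because each row holds scalar values.

```python
import pandas as pd


def recode(row):
    # `row` is a single record, so == and `in` return plain booleans here.
    if row["TimeCode"] == "R" and row["Workrule"] == "C":
        return "A"
    if row["TimeCode"] in ("O", "L"):
        return "O"
    return row["TimeCode"]


# Hypothetical rows for illustration only.
sample = pd.DataFrame({
    "TimeCode": ["R", "R", "L", "O"],
    "Workrule": ["C", "X", "C", "C"],
})
sample["TimeCode"] = sample.apply(recode, axis=1)
row_result = sample["TimeCode"].tolist()
print(row_result)
```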
