
I am using a CSV file as input and generating a JSON file to feed into a Kafka topic.

df = pd.read_csv(csv_file, delimiter=",",
                 dtype={'E': 'S10', 'C': 'S10', 'Date': 'S10', 'TimeCode': 'S10', 
                         'Workrule': 'S10'})

common.time_calc(df) #time_calc is the function from a
df = df.drop(['Workrule'], axis=1)

In the function I have:

def time_calc(df_entry):
    if (df_entry['TimeCode'] == 'R') and (df_entry['Workrule'] == 'C'):
        df_entry['TimeCode'] = 'A'
    if df_entry['TimeCode'] in ['O', 'L']:
        df_entry['TimeCode'] = 'O'

and I am getting

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I have tried modifying the code as

if (df_entry['TimeCode'] == 'R') & (df_entry['Workrule'] == 'C'):
    df_entry['TimeCode'] = 'A'

but still get the same error.

Update: I added the following, which converts the DataFrame to a list of records and calls time_calc on each record, and I am now able to post. Thanks!

json_data = df.to_json(orient='records')
json_input = '{"value":' + json_data + '}'
decodedJson = json.loads(json_input)
for entry in decodedJson['value']:
    common.time_calc(entry)
    del entry['Workrule']
Serotonin

2 Answers


Your function time_calc takes a DataFrame as argument. In the part df_entry['TimeCode'] == 'R', you actually compute a series as you compare the entire column to a scalar value.

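When you apply the logical and operator to this, Python tries to compute the boolean equivalent of a Series, which raises the exception. The same happens with & inside an if statement: & combines the two comparisons element-wise, but the result is still a Series, and if needs a single True or False. What you actually intend to do is either use vector operations or loop over the rows.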
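To make the cause concrete, here is a minimal sketch, assuming a small made-up DataFrame (the sample values are not from the question). It shows that the element-wise comparison yields a boolean Series, that testing it with if raises the ValueError, and that the mask is instead meant for selecting rows.

```python
import pandas as pd

# Made-up sample data, used only for illustration.
df = pd.DataFrame({"TimeCode": ["R", "O", "L"], "Workrule": ["C", "X", "C"]})

# Element-wise AND of two comparisons: the result is a boolean Series, one value per row.
mask = (df["TimeCode"] == "R") & (df["Workrule"] == "C")

try:
    if mask:  # `if` needs one True/False, which a multi-row Series cannot provide
        pass
    raised = False
except ValueError:
    raised = True

# A boolean mask is meant for selecting (or assigning to) rows, not for `if`.
matching_rows = df[mask]
```

This is also why replacing and with & alone does not help: the expression is still evaluated once for the whole column rather than once per row.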

An example of the fixed code can be (not tested):

def time_calc(df):
    # Parentheses are required: & binds more tightly than ==
    df.loc[(df['TimeCode'] == 'R') & (df['Workrule'] == 'C'), 'TimeCode'] = 'A'
    df.loc[df['TimeCode'].isin(['O', 'L']), 'TimeCode'] = 'O'
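As a quick check of the vectorized approach, the following sketch runs it on a small made-up DataFrame and then produces the record list that would be serialized to JSON. The sample rows, the return value of time_calc, and the names sample, result, and records are assumptions for illustration; the column names follow the question.

```python
import pandas as pd

def time_calc(df):
    # Rows with TimeCode 'R' and Workrule 'C' become 'A'.
    df.loc[(df["TimeCode"] == "R") & (df["Workrule"] == "C"), "TimeCode"] = "A"
    # 'O' and 'L' are both mapped to 'O'.
    df.loc[df["TimeCode"].isin(["O", "L"]), "TimeCode"] = "O"
    return df

# Hypothetical sample rows covering each branch.
sample = pd.DataFrame({
    "TimeCode": ["R", "R", "L", "O", "X"],
    "Workrule": ["C", "D", "C", "C", "C"],
})

# Apply the rules, then drop the helper column as in the question.
result = time_calc(sample).drop(columns=["Workrule"])
records = result.to_dict(orient="records")
```

Working on whole columns avoids the per-row Python loop, which usually matters once the CSV grows beyond a few thousand rows.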
crazyGamer

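You are comparing an entire column with a single value in df_entry['TimeCode'] == 'R'. You need to either iterate row by row and compare each value individually or, better, use np.where. Note that inside np.where the conditions must also be element-wise: use & instead of and, and .isin() instead of in.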

import numpy as np

def time_calc(df_entry):
    # Conditions must be element-wise: & instead of and, .isin() instead of in
    df_entry['TimeCode'] = np.where((df_entry['TimeCode'] == 'R') & (df_entry['Workrule'] == 'C'), 'A', df_entry['TimeCode'])
    df_entry['TimeCode'] = np.where(df_entry['TimeCode'].isin(['O', 'L']), 'O', df_entry['TimeCode'])
Sociopath