I would like to recategorize the values of the AGE variable into groups (AGE_RECAT). For example, AGE_RECAT should be "One" when the AGE value is between 1 and 5 (inclusive).
I wrote this function:
def numeric_recat(df, var, condition_dict):
    # return a pandas column
    # condition_dict includes return value (key) and return condition (start numeric value, end numeric value)
    # i.e., {'Group 1': [1, 2], 'Group 2': [2, 3]}
    for key, value in condition_dict.items():
        if (df[var] >= value[0]) & (df[var] <= value[1]):
            return (str(key))
        else:
            return np.nan
I then try to call it like this:
df_pc['AGE_RECAT'] = vc.numeric_recat(df_pc, 'AGE', condition_dict={
    'One': [1, 5],
    'Two': [6, 10],
    'Three': [11, 64],
    'Four': [65, 300]
})
But I get this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
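For reference, the same error seems to reproduce with a much smaller snippet. This is only a sketch with made-up data (`ages` and `mask` are hypothetical names, not part of my real code): the comparison yields a boolean Series with one value per row, and `if` then asks that whole Series for a single True/False.

```python
import pandas as pd

# Hypothetical sample data standing in for df_pc['AGE'].
ages = pd.Series([3, 7, 40])

# Element-wise comparison: the result is a boolean Series, one value per row.
mask = (ages >= 1) & (ages <= 5)

try:
    if mask:  # `if` needs one True/False, but mask holds several values
        pass
except ValueError as exc:
    message = str(exc)
    print(message)
```

So the condition itself is evaluated per row, but the `if`/`return` around it treats it as a single scalar test.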
Edit to address the duplicate suggestion:
While the overall goal is the same, the answer to the similar question does not fit the condition format I need. I want to pass the conditions as a single dictionary, whereas that answer requires two separate lists (bins and names).
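To make the intended behaviour concrete, here is a minimal sketch of what I am aiming for with the single-dictionary format. It assumes inclusive bounds on both ends, that the first matching range wins, and that unmatched rows should stay NaN; the sample frame is made up, and this is one possible approach rather than necessarily the best one.

```python
import numpy as np
import pandas as pd


def numeric_recat(df, var, condition_dict):
    """Return a Series labelling df[var] by the inclusive ranges in condition_dict."""
    # Start from an all-NaN object column so unmatched rows stay missing.
    result = pd.Series(np.nan, index=df.index, dtype="object")
    for key, (low, high) in condition_dict.items():
        # Series.between is inclusive on both ends by default.
        mask = df[var].between(low, high)
        # Only label rows that have no label yet (first matching range wins).
        result.loc[mask & result.isna()] = str(key)
    return result


# Made-up sample data for illustration only.
df_pc = pd.DataFrame({"AGE": [3, 7, 40, 70, 0]})
df_pc["AGE_RECAT"] = numeric_recat(
    df_pc,
    "AGE",
    condition_dict={
        "One": [1, 5],
        "Two": [6, 10],
        "Three": [11, 64],
        "Four": [65, 300],
    },
)
print(df_pc)
```

Here each range is applied to the whole column through a boolean mask instead of an `if` on the Series, and an AGE of 0 falls outside every range, so it is left as NaN.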