
I would like to recategorize the values of the AGE variable into groups (AGE_RECAT). For example, AGE_RECAT should be "One" when AGE is between 1 and 5.

I wrote the following function:

import numpy as np

def numeric_recat(df, var, condition_dict):
    # return a pandas column
    # condition_dict maps each return value (key) to a condition [start value, end value]
    # e.g., {'Group 1': [1, 2], 'Group 2': [2, 3]}
    for key, value in condition_dict.items():
        if (df[var] >= value[0]) & (df[var] <= value[1]):
            return str(key)
        else:
            return np.nan

And I call it like this:

df_pc['AGE_RECAT'] = vc.numeric_recat(df_pc, 'AGE', condition_dict={
    'One': [1, 5],
    'Two': [6, 10],
    'Three': [11, 64],
    'Four': [65, 300],
})

But I get this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
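A small sketch of where the error comes from, assuming only pandas: comparing a Series with a scalar is element-wise, so `(df[var] >= value[0]) & (df[var] <= value[1])` is a whole boolean Series, not a single True/False, and `if` cannot decide which one to use. The `ages` data below is made up for illustration.

```python
import pandas as pd

ages = pd.Series([3, 8, 40])  # illustrative data, not from the question

# Element-wise comparisons combined with & produce one boolean per row.
mask = (ages >= 1) & (ages <= 5)
print(mask.tolist())

# `if` needs a single truth value; pandas refuses to collapse several
# booleans into one implicitly, so it raises ValueError.
caught = None
try:
    if mask:
        pass
except ValueError as err:
    caught = err
print(caught)
```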

Edit to address the duplicate suggestion:

While the overall goal is the same, the answer to the similar question does not fit the condition format I need. My preferred format is a single dictionary, while that answer requires two lists (bins and names).
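One way to keep the single-dictionary format is to build a boolean mask per range and fill the matching rows, instead of using `if` on a Series. This is a minimal sketch, not a definitive fix: `numeric_recat_masked` is a hypothetical name, and it assumes inclusive `[low, high]` ranges. Note that the original loop also returns during its first iteration, so even without the error only the first range would ever be checked; this version visits every range.

```python
import numpy as np
import pandas as pd


def numeric_recat_masked(df, var, condition_dict):
    """Hypothetical vectorized variant: label each row by the first matching range."""
    # Start with an all-NaN object column so unmatched rows stay missing.
    result = pd.Series(np.nan, index=df.index, dtype=object)
    for label, (low, high) in condition_dict.items():
        # between() is inclusive on both ends by default, like >= low & <= high.
        mask = df[var].between(low, high)
        # Only fill rows not labeled yet, so earlier ranges win on overlap.
        result.loc[mask & result.isna()] = str(label)
    return result


df_pc = pd.DataFrame({'AGE': [1, 5, 6, 30, 65, 400]})  # illustrative data
df_pc['AGE_RECAT'] = numeric_recat_masked(df_pc, 'AGE', condition_dict={
    'One': [1, 5],
    'Two': [6, 10],
    'Three': [11, 64],
    'Four': [65, 300],
})
print(df_pc)
```

Because the ranges are inclusive integer bounds, a non-integer value such as 5.5 falls between `[1, 5]` and `[6, 10]` and stays NaN; overlapping ranges like `{'Group 1': [1, 2], 'Group 2': [2, 3]}` resolve in dictionary order.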

KubiK888
  • You can use pandas cut, something like pd.cut(df['AGE'], bins = [1,5,10,64,300],labels=['One', 'Two', 'Three','Four']) – Vaishali Aug 28 '18 at 22:50
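The `pd.cut` suggestion can also be driven from the same dictionary by deriving `bins` and `labels` from it. A sketch under stated assumptions: `numeric_recat_cut` is a hypothetical helper, and it assumes the ranges are sorted and contiguous for integer ages (`[1, 5]`, `[6, 10]`, ...). Since `pd.cut` intervals are right-closed `(a, b]`, `include_lowest=True` is needed so that AGE == 1 lands in "One" rather than becoming NaN.

```python
import pandas as pd


def numeric_recat_cut(series, condition_dict):
    """Hypothetical helper: turn {label: [low, high]} into pd.cut bins/labels."""
    labels = list(condition_dict.keys())
    ranges = list(condition_dict.values())
    # Assumes sorted, contiguous integer ranges, so each upper bound is a bin edge.
    bins = [ranges[0][0]] + [high for _, high in ranges]
    # Right-closed bins; include_lowest keeps the very first lower bound.
    return pd.cut(series, bins=bins, labels=labels, include_lowest=True)


ages = pd.Series([1, 5, 6, 30, 65, 300, 400])  # illustrative data
age_recat = numeric_recat_cut(ages, {
    'One': [1, 5],
    'Two': [6, 10],
    'Three': [11, 64],
    'Four': [65, 300],
})
print(age_recat.tolist())
```

Unlike the mask-based approach, a value such as 5.5 falls into "Two" here, because the bins cover the gaps between the integer ranges.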
