Fastest way to create column in pandas based on multiple conditions

Question

I'm currently using this function:

def age_groupf(row):
    if row['Age'] <= 19:
        val = '15-19'
    elif row['Age'] <= 24:
        val = '20-24'
    elif row['Age'] <= 29:
        val = '25-29'
    elif row['Age'] <= 34:
        val = '30-34'
    elif row['Age'] <= 39:
        val = '35-39'
    elif row['Age'] <= 44:
        val = '40-44'
    elif row['Age'] <= 49:
        val = '45-49'
    elif row['Age'] <= 54:
        val = '50-54'
    elif row['Age'] <= 59:
        val = '55-59'
    else:
        val = '60 and more'
    return val

to generate AGE-GROUP fields by calling:

DF['AGE-GROUP'] = DF.apply(age_groupf, axis=1)

It seems to work, but it's slow. I have multiple 100 MB TXT files and I need this to be faster.

check `pd.cut` or `np.select` – rafaelc, Jul 30 '19 at 12:08

score 2 · Accepted Answer · answered Jul 30 '19 at 12:17

Use pandas.cut with defined bins and labels.

For example:

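import numpy as np
import pandas as pd

bins = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60, np.inf]
labels = [f'{x}-{y-1}' if y != np.inf else f'{x} and more' for x, y in zip(bins, bins[1:])]

# right=False makes each bin [x, y), so age 20 gets '20-24'; ages below 15 become NaN
pd.cut(df['Age'], bins=bins, labels=labels, right=False)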
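The comment on the question also mentions `np.select`. A minimal sketch of that alternative follows, assuming `Age` is a numeric column; the sample `DF` and the `upper_bounds` list are made up for illustration. It mirrors the original if/elif chain, so ages below 15 still land in '15-19' and anything that fails every comparison (including NaN) falls through to the default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real DF would be loaded from the TXT files.
DF = pd.DataFrame({'Age': [15, 19, 20, 34, 59, 60, 75]})

# Inclusive upper bound of each group, in the same order as the if/elif chain.
upper_bounds = [19, 24, 29, 34, 39, 44, 49, 54, 59]
labels = [f'{u - 4}-{u}' for u in upper_bounds]

# np.select takes the label of the first condition that is True,
# so the order matters just as it does in the if/elif chain.
conditions = [DF['Age'] <= u for u in upper_bounds]
DF['AGE-GROUP'] = np.select(conditions, labels, default='60 and more')
```

Each condition is evaluated once over the whole column, which avoids the per-row Python call that makes `DF.apply(..., axis=1)` slow.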
