Instead of keeping Age as numbers, I need to group the values into age groups and substitute those groups into the data frame.

import pandas as pd
# Initialise the data as a dict of lists.
data = {'Name': ['Tom', 'nick', 'krish', 'jack', 'Ann', 'James'],
        'Age': [20, 21, 45, 58, 34, 60]}

# Create DataFrame 
df = pd.DataFrame(data)

This is what I tried:

if df['Age'] < 20:
    df['Age']= df['Age'].replace([<20],'<20')

if df['Age'] >= 20 & >40:
    df['Age']= df['Age'].replace([>=20&<40],'>=20&<40')

if df['Age'] >=40:
    df['Age']= df['Age'].replace([>=40],'>=40')
  • 2
    `df['Age2'] = pd.cut(df['Age'], bins=[-np.inf, 20, 40, np.inf], labels=['<20', '20-40', '>=40'], right=False) ` will do it in a single line. – cs95 Jul 07 '20 at 21:23
  • 1
    Thank you, kind stranger, that's it! – Tiago Emanuel Pratas Jul 07 '20 at 21:24
  • 1
    Oops, forgot to add `right=False` param to my previous comment. But that will do it. Please consider upvoting the answer in the duplicate post if it helped. – cs95 Jul 07 '20 at 21:26

1 Answer

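Use np.select(conditions, choices). It takes a list of boolean conditions and a matching list of choices, and for each row returns the choice belonging to the first condition that is true.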

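import numpy as np

# One boolean mask per age group; the groups should not overlap.
c1 = df['Age'] < 20
# between(20, 40) is inclusive on both ends by default and would also match 40.
c2 = (df['Age'] >= 20) & (df['Age'] < 40)
c3 = df['Age'] >= 40
cond = [c1, c2, c3]
choice = ['<20', '>=20&<40', '>=40']
# A string default keeps the result dtype consistent with the string choices.
df['agerange'] = np.select(cond, choice, default='unknown')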

     Name  Age  agerange
0    Tom   20  >=20&<40
1   nick   21  >=20&<40
2  krish   45      >=40
3   jack   58      >=40
4    Ann   34  >=20&<40
5  James   60      >=40
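
The same grouping can probably also be done with pd.cut, as suggested in the comments on the question. The following is only a minimal sketch: it rebuilds the question's data frame, reuses the agerange column name and the labels from this answer, and assumes half-open bins (closed on the left) are what is wanted, so that 20 and 40 each fall into the upper group.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'Name': ['Tom', 'nick', 'krish', 'jack', 'Ann', 'James'],
    'Age': [20, 21, 45, 58, 34, 60],
})

# Bin edges describe [-inf, 20), [20, 40), [40, inf).
# right=False makes every bin closed on the left and open on the right.
bins = [-np.inf, 20, 40, np.inf]
labels = ['<20', '>=20&<40', '>=40']

# pd.cut returns a categorical column with one label per row.
df['agerange'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
print(df)
```

Note that pd.cut yields a categorical column rather than plain strings; if plain strings are needed, df['agerange'].astype(str) should convert it.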