I have a problem turning these intervals into a new column with the categories I want.

I have tried many different variations of the greater-than comparison. I did get it to work by letting the middle bracket become NaN and renaming it afterwards. The code works fine for lines 1 and 3; it is only when I try to create the middle interval that it fails. I would be forever grateful for any help.

df["AgeGroup"] = df.loc[df["Age"] < 25, "AgeGroup"] = "kid"
df["AgeGroup"] = df.loc[df["Age"] >= 25 & df.loc["Age"] < 50, "AgeGroup"] = "young"
df["AgeGroup"] = df.loc[df["Age"] >= 50, "AgeGroup"] = "old"

I also tried the following, and similar variations in between:

df["AgeGroup"] = df.loc[df["Age"] < 25, "AgeGroup"] = "kid"
df["AgeGroup"] = df.loc[df["Age"] >= 25 < 50, "AgeGroup"] = "young"
df["AgeGroup"] = df.loc[df["Age"] >= 50, "AgeGroup"] = "old"

The result varies between syntax errors and ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

rpanai
You have to use parentheses, `(df.Age >= 25) & (df.Age < 50)`, or use `ge`, `lt` etc., like `df.Age.ge(25) & df.Age.lt(50)` – rafaelc Oct 10 '19 at 14:20

4 Answers

You might use pd.cut as in the following example:

import pandas as pd

df = pd.DataFrame({"Age": [25, 40, 51, 4, 90]})
# Left-closed bins match the conditions in the question: [0, 25), [25, 50), [50, 200)
bins = [0, 25, 50, 200]
labels = ["kid", "young", "old"]
df["AgeGroup"] = pd.cut(df["Age"], bins=bins, labels=labels, right=False)
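
A minimal sketch of how pd.cut treats values that sit exactly on a bin edge, since that decides whether 25 counts as "kid" or "young". The sample ages are hypothetical and were picked only to hit the boundaries.

```python
import pandas as pd

# Hypothetical sample data chosen to hit the boundaries 0, 24, 25, 49 and 50.
ages = pd.Series([0, 24, 25, 49, 50, 90], name="Age")
labels = ["kid", "young", "old"]

# Default right=True: bins are (0, 24], (24, 49], (49, 200].
# The lowest edge is excluded, so an age of exactly 0 becomes NaN.
right_closed = pd.cut(ages, bins=[0, 24, 49, 200], labels=labels)

# right=False: bins are [0, 25), [25, 50), [50, 200), which mirrors
# the conditions "< 25", ">= 25 and < 50", ">= 50" from the question.
left_closed = pd.cut(ages, bins=[0, 25, 50, 200], labels=labels, right=False)

print(right_closed.isna().tolist())
print(left_closed.tolist())
```

With integer ages above 0 both variants agree; they differ for an age of 0 and for fractional ages such as 24.5.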
rpanai

Use np.select

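import numpy as np

cond = [df["Age"] < 25, (df["Age"] >= 25) & (df["Age"] < 50), df["Age"] >= 50]
val = ["kid", "young", "old"]
# A string default (used when no condition matches, e.g. NaN ages) avoids mixing with the integer default 0
df["AgeGroup"] = np.select(cond, val, default="unknown")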

Also you can use:

df.loc[df["Age"] < 25, "AgeGroup"] = "kid"
df.loc[(df["Age"] >= 25) & (df["Age"] < 50), "AgeGroup"] = "young"
df.loc[df["Age"] >= 50, "AgeGroup"] = "old"
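
Note that these three lines assign through df.loc only, without the leading `df["AgeGroup"] =` from the question. In Python, `a = b = value` assigns the value to both targets, so the question's form first overwrites the whole column with the label. A small sketch of the difference, using hypothetical sample data and hypothetical column names "Chained" and "Masked":

```python
import pandas as pd

df = pd.DataFrame({"Age": [4, 25, 40, 50, 90]})  # hypothetical sample data

# Chained assignment: "kid" is assigned to BOTH targets, so the first
# target fills the entire column before the masked assignment runs.
df["Chained"] = df.loc[df["Age"] < 25, "Chained"] = "kid"

# Masked assignment only: rows outside the mask stay NaN until
# a later df.loc line fills them.
df.loc[df["Age"] < 25, "Masked"] = "kid"

print(df)
```

This would also explain why the last line of the question's attempt appears to work: it labels every row "old".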

It is important to wrap each comparison in parentheses, because & binds more tightly than >= and <:

(df["Age"] >= 25) & (df["Age"] < 50)
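
A short sketch of why the unparenthesized version fails and of equivalent spellings of the mask. The sample data is hypothetical, and the `inclusive="left"` argument of Series.between assumes pandas 1.3 or newer.

```python
import pandas as pd

df = pd.DataFrame({"Age": [4, 25, 40, 50, 90]})  # hypothetical sample data

# Without parentheses, & is evaluated first, so Python sees the chained
# comparison df["Age"] >= (25 & df["Age"]) < 50, which calls bool() on a
# Series and raises "The truth value of a Series is ambiguous".
try:
    df["Age"] >= 25 & df["Age"] < 50
    raised = False
except ValueError:
    raised = True

# Three equivalent ways to write the intended mask.
mask_parens = (df["Age"] >= 25) & (df["Age"] < 50)
mask_methods = df["Age"].ge(25) & df["Age"].lt(50)
mask_between = df["Age"].between(25, 50, inclusive="left")  # assumes pandas >= 1.3

print(raised, mask_parens.tolist())
```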
ansev

Another way is to define a function and then apply it to the column with Series.map():

def age_group(age):
    """Return the age-group label for a single age value."""
    if age < 25:
        return "kid"
    elif age < 50:  # age >= 25 is already implied here
        return "young"
    return "old"

df["AgeGroup"] = df["Age"].map(age_group)
Niels Henkens

I've answered a very similar question before here. The best way to do this is using numpy.digitize().

import numpy as np

classes = np.array(["kid", "young", "old"])
df["AgeGroup"] = classes[np.digitize(df["Age"], [25, 50])]

I just saw the answer using pd.cut(), this is basically the numpy equivalent.
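
To make the indexing step explicit, here is a small sketch of what np.digitize returns for the edges [25, 50] before the labels are looked up. The sample ages are hypothetical.

```python
import numpy as np
import pandas as pd

ages = pd.Series([4, 24, 25, 49, 50, 90])  # hypothetical sample data
classes = np.array(["kid", "young", "old"])

# With the default right=False, np.digitize returns
# 0 for age < 25, 1 for 25 <= age < 50, and 2 for age >= 50.
idx = np.digitize(ages.to_numpy(), [25, 50])

# Integer-array indexing turns each bin index into its label.
groups = classes[idx]

print(idx.tolist())
print(groups.tolist())
```

Because the lower edge of each bin is inclusive by default, 25 lands in "young" and 50 in "old", matching the conditions in the question.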

Rob