I am having a problem with a nested ifelse() statement in R. I have a data frame with a column Age, and I have to encode the data according to the following conditions:

  1. If Age <=18, then Age=child
  2. If Age >18 and <=60, then Age=adult
  3. If Age >60, then Age=senior.

I used the following code to solve the problem:

ifelse((Titanic$Age <= 18),Titanic$Age <-'child',ifelse((Titanic$Age>18 & Titanic$Age<=60),Titanic$Age <- 'adult',Titanic$Age <- 'senior'))

The problem I am facing is that it turns every row in the Age column into 'senior', even though most values are in the range 20-40.

bummi
Rahul Sar
  • `ifelse` returns the value to be assigned; don't use `<-` inside `ifelse`. `Titanic$Binned_age <- ifelse(Titanic$Age <= 18, 'child', ifelse(Titanic$Age <= 60, 'adult', 'senior'))`. Or better yet, use `cut` as in the duplicate. – Gregor Thomas Oct 08 '19 at 13:44

4 Answers

It is better to keep the original data and add a category column next to the ages. One direct answer that needs no package is as follows:

Titanic$category <- with(Titanic, ifelse(Age<=18,yes = "Child",no = ifelse(Age<=60,yes = "Adult",no = "Senior")))
Rlearner

You can use case_when from dplyr. It lets you vectorize multiple if_else statements:

library(dplyr)

set.seed(111)
df <- data.frame(Age = runif(100, 0, 90))

df <- df %>% mutate(Age = case_when(Age <= 18 ~ "child",
                                    Age > 18 & Age <= 60 ~ "adult",
                                    TRUE ~ "senior"))

If you need Age to be a factor variable, convert it:

df <- df %>% mutate(Age = as.factor(Age))
slava-kohut

Since you have numeric values you can use cut and then rename the levels. This uses only base functions.

# some dummy data
dummy <- data.frame(age = runif(100, 0,100))

# actual code: 
# cut the data at the thresholds. By default each interval is closed on the right, e.g. (0,18]; see ?cut for the right and include.lowest arguments.
dummy$agebracket <- cut(dummy$age, breaks = c(0,18,60,9999))
# now we just rename the levels to our liking
levels(dummy$agebracket) <- c("child", "adult", "senior")
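
As a possible shortcut, cut can also attach the labels in one step through its labels argument. The sketch below is only an illustration: the ages vector is made up (it is not the Titanic data), and -Inf/Inf are used as the outer breaks so that no age falls outside the bins.

```r
# hypothetical example ages, chosen to sit on and around the thresholds
ages <- c(0, 5, 18, 19, 40, 60, 61, 85)

# right = TRUE is the default, so the bins are (-Inf,18], (18,60], (60,Inf]
agebracket <- cut(ages,
                  breaks = c(-Inf, 18, 60, Inf),
                  labels = c("child", "adult", "senior"))

print(data.frame(ages, agebracket))
```

Because the intervals are closed on the right, an age of exactly 18 lands in "child" and exactly 60 in "adult", which matches the conditions in the question.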

As a comment to your code: The bug is that you overwrite the whole vector with lines like this

Titanic$Age <- 'senior'

You'd want to do something closer to this

Titanic$agebracket <- 
ifelse((Titanic$Age <= 18), 'child',
  ifelse((Titanic$Age>18 & Titanic$Age<=60),'adult', 'senior'))

But I would try to stay away from these nested ifs if you can. They are hard to read and might not work in more complex situations.

user3631934

To explain why your code doesn't work: when you do

ifelse(
    Titanic$Age <= 18, 
    Titanic$Age <-'child', 
    ifelse(...) 
)

ifelse evaluates its yes argument as soon as any element passes the test, and because that argument is the statement Titanic$Age <-'child', it assigns 'child' to all rows. In your example, evaluation ends in the last ifelse, whose final assignment sets all rows to 'senior'.
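
To see this on a small scale, here is a minimal sketch that repeats the pattern from the question on a made-up data frame. The name toy and its values are assumptions for illustration only.

```r
# hypothetical stand-in for the Titanic data frame
toy <- data.frame(Age = c(5, 25, 40, 70))

# same pattern as in the question: assignments used as the yes/no arguments
result <- ifelse(toy$Age <= 18,
                 toy$Age <- 'child',
                 ifelse(toy$Age > 18 & toy$Age <= 60,
                        toy$Age <- 'adult',
                        toy$Age <- 'senior'))

# the column was overwritten as a side effect of evaluating the arguments
print(toy$Age)
# the value that ifelse() itself returned was never stored in the column
print(result)
```

Each assignment replaces the entire column, so whichever assignment runs last decides what every row contains.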

What you want instead is

ifelse(
    Titanic$Age <= 18, 
    'child', 
    ifelse(...)
)
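
Written out in full, that could look like the sketch below. The toy data frame and the agebracket column name are assumptions for illustration; the NA row is included only to show that ifelse leaves missing ages as NA.

```r
# hypothetical data, including the threshold values and one missing age
toy <- data.frame(Age = c(5, 18, 25, 60, 70, NA))

# ifelse() only returns values; the single assignment happens outside it
toy$agebracket <- ifelse(toy$Age <= 18,
                         'child',
                         ifelse(toy$Age <= 60, 'adult', 'senior'))

print(toy)
```

Writing to a new column keeps the original numeric Age available for later checks.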

But after a few nested ifelse statements this becomes very hard to read so I recommend case_when from dplyr like @slava-kohut showed in his answer.

konvas