
I have created a data frame with the following data:

name <- c("A","B","C","D","E","F","G","H","I","J")
age <- c(22,43,12,17,29,5,51,56,9,44)
sex <- c("M","F","M","M","M","F","F","M","F","F")
rock <- data.frame(name,age,sex,stringsAsFactors = TRUE)
rock

Now I want to find out:

If the name is E to J and sex is not "F", the status should be "1F"; if the name is A to D and age is greater than 15, the status should be "Young". Everything else should be "Others".

To do that, I am using the following code:

rock$status <- ifelse(rock$name == c("E","F","G","H","I","J") & rock$sex != "F", "1F",
               ifelse(rock$name == c("E","F","G","H","I","J") & rock$sex == "F", "Female",
               ifelse(rock$name == c("A","B","C","D") & rock$age > 15, "Young", "Others")))
rock

But I am getting this output:

  name  age    sex    status
1     A   22     M   Young   
2     B   43     F   Young   
3     C   12     M  Others  
4     D   17     M  Young   
5     E   29     M  Others  
6     F    5     F  Others  
7     G   51     F  Others  
8     H   56     M  Others 
9     I    9     F  Others  
10    J   44     F  Others  

The status should be "1F" for E and H, but it shows "Others" instead.

What have I done wrong in my code?

Please correct me, and any other suggestions are welcome.

Sayam Nandy
  • Related post: https://stackoverflow.com/questions/42637099/difference-between-the-and-in-operators-in-r – zx8754 Oct 04 '17 at 09:27

6 Answers

10

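We need to use %in% instead of ==. The == operator compares the two vectors element by element and recycles the shorter one, whereas %in% tests whether each name occurs anywhere in the set: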

rock$status <- ifelse(rock$name %in% c("E", "F", "G", "H", "I", "J") & 
                        rock$sex != "F", "1F",            
                      ifelse(rock$name %in% c("E", "F", "G", "H", "I", "J") & 
                               rock$sex == "F", "Female",
                             ifelse(rock$name %in% c("A", "B", "C", "D") &
                                      rock$age > 15, "Young", "Others")))
rock

#    name age sex  status
# 1     A  22   M   Young
# 2     B  43   F   Young
# 3     C  12   M  Others
# 4     D  17   M   Young
# 5     E  29   M      1F
# 6     F   5   F  Female
# 7     G  51   F  Female
# 8     H  56   M      1F
# 9     I   9   F  Female
# 10    J  44   F  Female
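
To make the difference concrete, here is a minimal sketch using plain character vectors. The names `x`, `set`, `eq` and `mem` are made up for this illustration and do not appear in the question.

```r
# Illustrative vectors; `x` and `set` are not part of the original question
x   <- c("A","B","C","D","E","F","G","H","I","J")
set <- c("E","F","G","H","I","J")

# `==` pairs x[1] with set[1], x[2] with set[2], ... and recycles `set`,
# so "E" in position 5 is compared with "I", not with "E".
# R also warns here because 10 is not a multiple of 6.
eq <- suppressWarnings(x == set)

# `%in%` asks, for each element of x, whether it occurs anywhere in `set`
mem <- x %in% set

which(eq)   # positions where the element-wise comparison is TRUE
which(mem)  # positions whose value is a member of `set`
```

With these vectors the element-wise comparison never lines up equal letters, which is why the question's code never reached "1F".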
zx8754
5

In cases like this, I often prefer to compute the logical indexes first and then use a weighted sum of them to index a vector of labels. It is faster and more readable than nested ifelse calls (imo). Note that this relies on the conditions being mutually exclusive. An example:

i1 <- rock$name %in% c("E", "F", "G", "H", "I", "J") & rock$sex != "F"
i2 <- rock$name %in% c("E", "F", "G", "H", "I", "J") & rock$sex == "F"
i3 <- rock$name %in% c("A", "B", "C", "D") & rock$age > 15

rock$status <- c("Other", "1F", "Female", "Young")[1 + i1 + 2*i2 + 3*i3]

which gives the desired result:

> rock
   name age sex status
1     A  22   M  Young
2     B  43   F  Young
3     C  12   M  Other
4     D  17   M  Young
5     E  29   M     1F
6     F   5   F Female
7     G  51   F Female
8     H  56   M     1F
9     I   9   F Female
10    J  44   F Female
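
The lookup works because at most one of `i1`, `i2`, `i3` is TRUE for any row, so the sum `1 + i1 + 2*i2 + 3*i3` is always 1, 2, 3 or 4. A small sketch of that check, rebuilding the question's data as plain vectors; the names `labels` and `idx` are introduced here for illustration only.

```r
name <- c("A","B","C","D","E","F","G","H","I","J")
age  <- c(22,43,12,17,29,5,51,56,9,44)
sex  <- c("M","F","M","M","M","F","F","M","F","F")

i1 <- name %in% c("E","F","G","H","I","J") & sex != "F"
i2 <- name %in% c("E","F","G","H","I","J") & sex == "F"
i3 <- name %in% c("A","B","C","D") & age > 15

# No row may satisfy more than one condition; otherwise the sum would
# point at the wrong label or past the end of the label vector
stopifnot(all(i1 + i2 + i3 <= 1))

labels <- c("Other", "1F", "Female", "Young")
idx    <- 1 + i1 + 2 * i2 + 3 * i3   # 1 = no condition matched
status <- labels[idx]
status
```

If the conditions could overlap, nested `ifelse` calls or an ordered set of rules would be the safer choice.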
Jaap
3

For the sake of completeness, here is also a solution using joins and non-equi joins to update the status column:

library(data.table)
setDT(rock)[.(name = LETTERS[1:4], age = 15), on = .(name, age > age), status := "Young"][
  .(name = LETTERS[5:10], sex = "F"), on = .(name, sex), status := "Female"][
    .(name = LETTERS[5:10], status = NA_character_), on = .(name, status), status := "1F"][
      .(status = NA_character_), on = .(status), status := "Other"][]
    name age sex status
 1:    A  22   M  Young
 2:    B  43   F  Young
 3:    C  12   M  Other
 4:    D  17   M  Young
 5:    E  29   M     1F
 6:    F   5   F Female
 7:    G  51   F Female
 8:    H  56   M     1F
 9:    I   9   F Female
10:    J  44   F Female

Unfortunately, non-equi joins do not yet support the not-equal operator !=. So,

setDT(rock)[.(name = LETTERS[1:4], age = 15), on = .(name, age > age), status := "Young"][
  .(name = LETTERS[5:10], sex = "F"), on = .(name, sex != sex), status := "1F"][]

gives an error message. Instead, I had to join on name and sex first to set status to Female, and then check for NA values in status to get the complementary set.

However, there is another workaround using two non-equi joins instead:

setDT(rock)[.(name = LETTERS[1:4], age = 15), on = .(name, age > age), status := "Young"][
  .(name = LETTERS[5:10], sex = "F"), on = .(name, sex < sex), status := "1F"][
    .(name = LETTERS[5:10], sex = "F"), on = .(name, sex > sex), status := "1F"][]
Uwe
2

With data.table you can do:

library(data.table)
rock <- data.table(rock)
rock[name %in% LETTERS[5:10] & sex != "F", status := "1F"]
rock[name %in% LETTERS[1:4] & age > 15, status := "Young"]
rock[is.na(status), status := "Other"]
rock
#     name age sex status
#  1:    A  22   M  Young
#  2:    B  43   F  Young
#  3:    C  12   M  Other
#  4:    D  17   M  Young
#  5:    E  29   M     1F
#  6:    F   5   F  Other
#  7:    G  51   F  Other
#  8:    H  56   M     1F
#  9:    I   9   F  Other
# 10:    J  44   F  Other
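
The same "set a default, then overwrite matching rows" pattern can also be sketched in base R with logical subsetting, in case data.table is not available. This is only an illustration of the idea, not part of the answer above; it rebuilds the question's data frame so it can run on its own.

```r
name <- c("A","B","C","D","E","F","G","H","I","J")
age  <- c(22,43,12,17,29,5,51,56,9,44)
sex  <- c("M","F","M","M","M","F","F","M","F","F")
rock <- data.frame(name, age, sex, stringsAsFactors = TRUE)

# Start with the fallback label, then overwrite the rows that match a rule
rock$status <- "Other"
rock$status[rock$name %in% LETTERS[5:10] & rock$sex != "F"] <- "1F"
rock$status[rock$name %in% LETTERS[1:4] & rock$age > 15]    <- "Young"
rock
```

As in the data.table version, later assignments would overwrite earlier ones if the rules overlapped, so the order of the lines matters in general.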
guscht
2

A solution using dplyr's case_when() function:

library(dplyr)

name <- c("A","B","C","D","E","F","G","H","I","J")
age <- c(22,43,12,17,29,5,51,56,9,44)
sex <- c("M","F","M","M","M","F","F","M","F","F")
rock <- data.frame(name,age,sex,stringsAsFactors = TRUE)

name_condition_1 <- c("E","F","G","H","I","J")
name_condition_2 <- c("A","B","C","D")

rock %>% mutate(
  status = case_when(
    name %in% name_condition_1 & sex != "F" ~ "1F",
    name %in% name_condition_1 & sex == "F" ~ "Female",
    name %in% name_condition_2 & age >  15  ~ "Young",
    TRUE ~ "Others"
  )
)

producing:

   name age sex status
1     A  22   M  Young
2     B  43   F  Young
3     C  12   M Others
4     D  17   M  Young
5     E  29   M     1F
6     F   5   F Female
7     G  51   F Female
8     H  56   M     1F
9     I   9   F Female
10    J  44   F Female
Davis Vaughan
-1
# Test the "Young" rule first, then the "1F" rule; everything else is "Others"
rock$status <- ifelse(rock$name %in% c("A", "B", "C", "D") & rock$age > 15, "Young",
               ifelse(rock$sex != "F" & rock$name %in% c("E", "F", "G", "H", "I", "J"), "1F",
                      "Others"))
rock
  • While this code may solve the question, including an explanation (https://meta.stackexchange.com/q/114762) of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply. – Suraj Rao Jun 18 '19 at 14:49