0

I have this data frame containing three variables:

# Generate df
set.seed(101)
df <- data.frame("phd" = sample(c("yes", "no"), 100, replace = TRUE),
                 "age" = sample(c(23:45), 100, replace = TRUE),
                 "gre" = sample(c(130:170), 100, replace = TRUE))

I need to implement the following algorithm: [image: flow diagram of the decision tree]

I wrote this (incomplete) code:

# IF cycle 
if (df$phd == "no") {

      df$phd.status = ifelse(df$gre <151, "No PhD low score", ifelse(df$gre >151, "No PhD high score")

} else (df$phd == "yes") {

  df$phd.status = ifelse(df$age <30, ifelse(df$gre <151, "PhD 30yr low score"))

}

I'm having trouble writing this code. I have been referring to several posts

Borexino
  • So what's the trouble? – user2974951 Dec 19 '18 at 12:10
  • Why not use just one big `ifelse` rather than a combination of `if{ } else {}` and `ifelse()`? Also, putting it inside a `with()` could improve readability. – John Coleman Dec 19 '18 at 12:14
  • You can set it up like this: `df$state = ifelse(df$phd == "no", "check GRE", "check 30")`. If you print `df` after this statement, you will see that it has an extra column whose content depends on the value of `phd`. – Robbert Raats Dec 19 '18 at 12:18

5 Answers

3

I think you are overcomplicating (or over-conceptualising) this. The logic doesn't require a branching tree; it's just three choices pasted together into one answer. Here is a tidyverse version that does it in one step.

library(dplyr)
df2<- df %>% mutate(phd.status = paste(if_else(phd =="yes", "PhD", "No_PhD"), 
                                       if_else(age < 30, "30yr", ""), 
                                       if_else(gre < 151 , "low score", "high score") ))

head(df2)
  phd age gre          phd.status
1 yes  39 132      PhD  low score
2 yes  34 166     PhD  high score
3 yes  32 153     PhD  high score
4 yes  33 132      PhD  low score
5 yes  43 132      PhD  low score
6 yes  27 169 PhD 30yr high score
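
The double space in rows such as `PhD  low score` comes from pasting an empty age label with the default separator. A minimal base R sketch of one way to avoid it, assuming it is acceptable to drop empty pieces before joining (the helper `join_labels` and the data frame `toy` are hypothetical names made up for illustration):

```r
# Hypothetical helper: join label pieces row-wise, skipping empty strings
join_labels <- function(...) {
  pieces <- cbind(...)  # character matrix, one column per label component
  apply(pieces, 1, function(x) paste(x[nzchar(x)], collapse = " "))
}

# Small made-up data frame, not the question's df
toy <- data.frame(phd = c("yes", "yes", "no"),
                  age = c(27, 39, 25),
                  gre = c(169, 132, 140))

toy$phd.status <- with(toy, join_labels(
  ifelse(phd == "yes", "PhD", "No_PhD"),
  ifelse(age < 30, "30yr", ""),
  ifelse(gre < 151, "low score", "high score")
))

print(toy$phd.status)
```

The second row has no age label, so it comes out as `PhD low score` with a single space.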
Stephen Henderson
  • Good observation about simplifying the underlying logic (+1) – John Coleman Dec 19 '18 at 12:33
  • The example is meant as a generalization of the nested `ifelse()` problem. You correctly reached a solution using `paste()`; however, the proposed flow diagram was not followed. – Borexino Dec 19 '18 at 12:53
1

Something like this should work:

df$phd.status <- with(df, ifelse(phd == "yes",
    ifelse(age < 30,
        ifelse(gre < 151, "PhD 30yr low score", "PhD 30yr high score"),
        ifelse(gre < 151, "PhD +30yr low score", "PhD +30yr high score")),
    ifelse(gre < 151, "No PhD low score", "No PhD high score")))
John Coleman
1

When strictly following your diagram:

df$phd.state = ifelse(df$phd == "no", # Did you get Ph.d?
                   # No
                   ifelse(df$gre < 151, # GRE < 151?
                        #Yes
                        "No PhD low score",
                        #No
                        "No PhD high score"
                   ),
                   # Yes
                   ifelse(df$age < 30, # < 30 Yr?
                        #Yes
                        ifelse(df$gre < 151, # GRE < 151?
                            #Yes
                            "PhD 30yr low score",
                            #No
                            "PhD 30yr high score"
                        ),
                        #No
                        ifelse(df$gre < 151, # GRE < 151?
                            #Yes
                            "PhD +30yr low score",
                            #No
                            "PhD +30yr high score"
                        )
                    )
                )
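
The same tree can also be written flat with `dplyr::case_when()`, which checks its conditions from top to bottom and uses the first one that matches. A minimal sketch on a small made-up data frame (`toy` is illustrative, not the question's `df`), assuming the labels above are the ones wanted:

```r
library(dplyr)

# One row per leaf of the decision tree
toy <- data.frame(phd = c("no", "no", "yes", "yes", "yes", "yes"),
                  age = c(25, 40, 25, 25, 40, 40),
                  gre = c(140, 160, 140, 160, 140, 160))

toy <- toy %>%
  mutate(phd.status = case_when(
    phd == "no" & gre < 151 ~ "No PhD low score",
    phd == "no"             ~ "No PhD high score",   # remaining "no" rows
    age < 30 & gre < 151    ~ "PhD 30yr low score",  # from here on, phd is "yes"
    age < 30                ~ "PhD 30yr high score",
    gre < 151               ~ "PhD +30yr low score",
    TRUE                    ~ "PhD +30yr high score"
  ))

print(toy)
```

Because every test uses `gre < 151`, a GRE of exactly 151 lands in the high-score branch, both here and in the nested `ifelse()` version.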
Robbert Raats
1

First, an ifelse() call can recode each variable into the labels shown in your picture. According to the algorithm, the order does not matter, so a plain mutate(variable = ifelse()) for each variable is reasonable.

If you want the output labels from the algorithm you gave, you can use tidyr::unite() after mutate(). With sep = " ", the three values are joined with a single space between them.

library(tidyverse)
df %>% # your data
  mutate( # each ifelse
    phd = ifelse(phd == "yes", "PhD", "No PhD"),
    age = ifelse(age < 30, "30yr", "+30yr"),
    gre = ifelse(gre < 151, "low score", "high score")
  ) %>% 
  unite(col = status, sep = " ") # unite all three columns into a new status column
#>                      status
#> 1       PhD 30yr high score
#> 2       PhD 30yr high score
#> 3    No PhD +30yr low score
#> 4    No PhD +30yr low score
#> 5      PhD +30yr high score
#> 6       PhD +30yr low score
#> 7    No PhD 30yr high score
#> 8        PhD 30yr low score
#> 9   No PhD +30yr high score
#> 10   No PhD +30yr low score
#> 11    No PhD 30yr low score
#> 12   No PhD 30yr high score
#> 13  No PhD +30yr high score
#> 14   No PhD +30yr low score
#> 15      PhD +30yr low score
#> 16  No PhD +30yr high score
#> 17   No PhD +30yr low score
#> 18     PhD +30yr high score
#> 19     PhD +30yr high score
#> 20      PhD +30yr low score
#> 21   No PhD +30yr low score
#> 22    No PhD 30yr low score
#> 23       PhD 30yr low score
#> 24   No PhD +30yr low score
#> 25    No PhD 30yr low score
#> 26   No PhD +30yr low score
#> 27     PhD +30yr high score
#> 28      PhD +30yr low score
#> 29     PhD +30yr high score
#> 30   No PhD 30yr high score
#> 31      PhD +30yr low score
#> 32      PhD +30yr low score
#> 33       PhD 30yr low score
#> 34      PhD 30yr high score
#> 35   No PhD +30yr low score
#> 36  No PhD +30yr high score
#> 37      PhD +30yr low score
#> 38   No PhD +30yr low score
#> 39     PhD +30yr high score
#> 40   No PhD +30yr low score
#> 41      PhD +30yr low score
#> 42      PhD 30yr high score
#> 43   No PhD +30yr low score
#> 44      PhD +30yr low score
#> 45      PhD +30yr low score
#> 46       PhD 30yr low score
#> 47      PhD +30yr low score
#> 48   No PhD +30yr low score
#> 49   No PhD 30yr high score
#> 50      PhD +30yr low score
#> 51       PhD 30yr low score
#> 52      PhD +30yr low score
#> 53   No PhD +30yr low score
#> 54  No PhD +30yr high score
#> 55       PhD 30yr low score
#> 56   No PhD 30yr high score
#> 57  No PhD +30yr high score
#> 58  No PhD +30yr high score
#> 59   No PhD +30yr low score
#> 60   No PhD +30yr low score
#> 61   No PhD +30yr low score
#> 62  No PhD +30yr high score
#> 63  No PhD +30yr high score
#> 64   No PhD +30yr low score
#> 65   No PhD +30yr low score
#> 66  No PhD +30yr high score
#> 67       PhD 30yr low score
#> 68     PhD +30yr high score
#> 69   No PhD 30yr high score
#> 70  No PhD +30yr high score
#> 71      PhD +30yr low score
#> 72  No PhD +30yr high score
#> 73  No PhD +30yr high score
#> 74   No PhD +30yr low score
#> 75  No PhD +30yr high score
#> 76      PhD +30yr low score
#> 77     PhD +30yr high score
#> 78     PhD +30yr high score
#> 79   No PhD +30yr low score
#> 80   No PhD +30yr low score
#> 81   No PhD +30yr low score
#> 82   No PhD +30yr low score
#> 83   No PhD +30yr low score
#> 84    No PhD 30yr low score
#> 85      PhD +30yr low score
#> 86    No PhD 30yr low score
#> 87      PhD 30yr high score
#> 88      PhD 30yr high score
#> 89     PhD +30yr high score
#> 90       PhD 30yr low score
#> 91    No PhD 30yr low score
#> 92   No PhD 30yr high score
#> 93  No PhD +30yr high score
#> 94      PhD 30yr high score
#> 95      PhD +30yr low score
#> 96   No PhD +30yr low score
#> 97      PhD +30yr low score
#> 98     PhD +30yr high score
#> 99       PhD 30yr low score
#> 100     PhD +30yr low score

These values match the labels in the diagram.

younggeun
0

An easy way to classify the end states of a binary tree is to do something like this:

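library(dplyr)

# Encode each yes/no decision as one decimal digit, then sum the digits
df2 <- df %>%
      mutate(phd = ifelse(phd == 'yes', 100, 0),  # hundreds digit: has a PhD
             age = ifelse(age < 30, 10, 0),       # tens digit: younger than 30
             gre = ifelse(gre < 151, 1, 0),       # units digit: GRE below 151
             bucket = phd + age + gre
      )  %>%
      arrange(bucket)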

Each distinct bucket value (0, 1, 10, 11, 100, 101, 110 or 111) identifies one possible combination of the three conditions.
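
To turn the bucket codes into readable labels, one option is a named lookup vector. A minimal base R sketch, assuming the labels used in the other answers and assuming that age is ignored for people without a PhD (`toy` and `bucket_labels` are illustrative names):

```r
# Small made-up data frame, not the question's df
toy <- data.frame(phd = c("no", "no", "yes", "yes"),
                  age = c(25, 40, 25, 40),
                  gre = c(140, 160, 160, 140))

# Same encoding: hundreds = has a PhD, tens = under 30, units = GRE below 151
bucket <- with(toy, 100 * (phd == "yes") + 10 * (age < 30) + (gre < 151))

# Lookup from bucket code to label (the label choice is an assumption)
bucket_labels <- c(
  "0"   = "No PhD high score",
  "1"   = "No PhD low score",
  "10"  = "No PhD high score",
  "11"  = "No PhD low score",
  "100" = "PhD +30yr high score",
  "101" = "PhD +30yr low score",
  "110" = "PhD 30yr high score",
  "111" = "PhD 30yr low score"
)

toy$phd.status <- unname(bucket_labels[as.character(bucket)])
print(toy)
```

Codes 0 and 10 (and likewise 1 and 11) share a label here because this sketch does not split the no-PhD branch by age.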

SteveM