
First of all, I know that there are many questions on SO about if/else statements in R, but none of them has been helpful for my specific situation and I've been struggling with this for a while.

I have a dataframe that looks like this:

metricx <- c(5, 4.8, 4.4, 3.6, 3.2, 2.1, 1.9, .5, .3, .1)
df <- as.data.frame(metricx)

I need to create two new variables, risk and answer, based on the value of metricx.

I know this works...

df$risk <- ifelse(df$metricx >= 4.5, 'VERY HIGH', 'HIGH')
df$risk <- ifelse(df$metricx < 3.5, 'MEDIUM', df$risk)
df$risk <- ifelse(df$metricx < 2, 'LOW', df$risk)

But this is obviously not an elegant or efficient way to do it, since I would have to repeat it many times (my dataset is very large and I have more groups than this). My understanding is that R has to run through every record each time ifelse is called, so a chained option would be better.

I have tried this...

ifelse(df$metricx >= 4.5,
       (df$risk <- 'VERY HIGH' &
        df$answer <- 'Y')
        , 
ifelse(df$metricx >= 3.5,
       (df$risk = 'HIGH' &
        df$answer = 'Y')
        ,
ifelse(df$metricx >= 2,
        (df$risk = 'MEDIUM' &
        df$answer = 'Y')
        ,
ifelse(df$metricx >= .40,
       (df$risk = 'LOW' &
        df$answer = 'Y')
        ,
(df$risk = 'LOW' &
 df$answer = 'N')
)    
) 
)  
)      

And I have tried this...

if (df$metricx >= 4.5){
  df$risk = 'VERY HIGH'
  df$answer = 'Y'
} else if (df$metricx >= 3.5){
  df$risk = 'HIGH'
  df$answer = 'Y'
} else if (df$metricx >= 2){
  df$risk = 'MEDIUM'
  df$answer = 'Y'
} else if (df$metricx >= .40){
  df$risk = 'LOW'
  df$answer = 'Y'
} else {
  df$risk = 'LOW'
  df$answer = 'N'
}

They both give different errors, neither of which I understand. I have looked at several sites that attempt to explain this, but I still cannot figure out how to do it.

My questions:

1. Why are my solutions not working? They appear to follow the syntax I have seen on the R site.
2. What is the correct way to achieve my desired output?

risk <- c('VERY HIGH', 'VERY HIGH', 'HIGH', 'HIGH', 'MEDIUM', 'MEDIUM', 'LOW', 'LOW', 'LOW', 'LOW') 
answer <- c('Y','Y','Y','Y','Y','Y','Y','Y','Y', 'N')

want <- data.frame(metricx, risk, answer)
pyll
  • You should probably use `cut` here instead. – lmo May 09 '17 at 13:49
  • The set of `ifelse` statements don't really have proper syntax or usage. Second set won't work because you are using a vectorized condition that can't be used in `if`. – Gopala May 09 '17 at 14:09
  • If you find something to be very complicated, but it is actually a common operation in statistics, there is an extremely high probability that a simple R function exists just for this purpose. You only have to search for it (consider what a statistician would name the operation to find suitable search terms). – Roland May 09 '17 at 14:11
  • lmo...thank you. It looks like cut is a better option. – pyll May 09 '17 at 14:22
  • Gopala...not sure what you are trying to say...but this answer seems to contradict what you are saying regarding the syntax of my ifelse attempt...http://stackoverflow.com/questions/18012222/nested-ifelse-statement-in-r – pyll May 09 '17 at 14:24
  • Roland--Not sure if trying to be condescending or not...but not really helpful in my opinion. – pyll May 09 '17 at 14:26
  • I was actually trying to be helpful. I was reinventing the wheel (inefficiently) all the time until I realized what I explained in my comment. If you find free advice condescending there is nothing I can do other than not giving free advice to you. – Roland May 09 '17 at 17:31
  • Your comment had nothing to do with this specific question. You basically just came here to say "do a better job searching" when I had demonstrated three different solutions I had already found that were not working. I searched so many combinations of "if else" that my autocomplete will be screwed for weeks. For what it's worth, I don't think "cut" is an intuitive search word for what I needed. I think showing up to a question and basically saying, "You should try searching for a solution" is condescending and unhelpful, and I'm sorry you disagree. Have a great day. – pyll May 09 '17 at 17:44
  • Apparently you didn't understand my comment. I didn't say "you should search" but how you could get better search results. A statistician would use a search like "r discretize continuous variable" and find the cut function. Good luck for your future searches. – Roland May 10 '17 at 05:36
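A small sketch of the point in Gopala's comment, using the question's metricx values: a comparison such as `metricx >= 4.5` yields one TRUE/FALSE per row, while `if` expects exactly one. Since R 4.2.0 a longer condition is an error (older versions only warned and used the first element), so the exact behavior depends on your R version. The `vapply()` call at the end, which runs the test once per value, is an illustrative addition and not from the thread.

```r
metricx <- c(5, 4.8, 4.4, 3.6, 3.2, 2.1, 1.9, .5, .3, .1)

cond <- metricx >= 4.5
length(cond)  # 10: one logical value per row, not a single TRUE/FALSE

# if() needs a single TRUE/FALSE. With a longer condition, R >= 4.2.0
# signals an error; older versions warned and used only the first element.
msg <- tryCatch(
  {
    if (cond) "VERY HIGH" else "HIGH"
    "no condition signalled"
  },
  warning = function(w) conditionMessage(w),
  error   = function(e) conditionMessage(e)
)
print(msg)

# Running if/else once per element does work, e.g. with vapply():
risk <- vapply(metricx, function(x) if (x >= 4.5) "VERY HIGH" else "HIGH",
               character(1))
print(risk)
```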

2 Answers


I think using dplyr this is what you want, right?

library(dplyr)
df <- df %>% mutate(risk = cut(metricx, c(0, 2, 3.5, 4.5, 6),
                    labels = c("LOW", "MEDIUM", "HIGH", "VERY HIGH"))) %>% 
  mutate(answer = ifelse(metricx < .4, "N", "Y"))
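A base-R sketch of the same `cut` idea, not taken from the answer itself. By default `cut` uses intervals closed on the right, (a, b], so a value exactly equal to 2, 3.5 or 4.5 lands one class lower than the question's `>=` tests would put it; `right = FALSE` switches to [a, b). The `-Inf`/`Inf` outer breaks and the `risk_breaks`/`risk_labels` names are assumptions, chosen so values outside 0 to 6 do not become NA. The sample data contain none of the boundary values, which is why the answer's output still matches.

```r
metricx <- c(5, 4.8, 4.4, 3.6, 3.2, 2.1, 1.9, .5, .3, .1)
df <- data.frame(metricx)

risk_breaks <- c(-Inf, 2, 3.5, 4.5, Inf)
risk_labels <- c("LOW", "MEDIUM", "HIGH", "VERY HIGH")

# right = FALSE: intervals are [a, b), so a value equal to a cut-point
# moves up a class, as with the question's >= tests.
df$risk <- cut(df$metricx, breaks = risk_breaks, labels = risk_labels,
               right = FALSE)

# Values sitting exactly on the cut-points show the difference.
edges <- c(2, 3.5, 4.5)
default_right <- cut(edges, risk_breaks, labels = risk_labels)                 # LOW, MEDIUM, HIGH
left_closed   <- cut(edges, risk_breaks, labels = risk_labels, right = FALSE)  # MEDIUM, HIGH, VERY HIGH
```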
pyll
Edwin
  • This works perfectly. Still confused and frustrated by the nested if else syntax that works for every other language, but this is an efficient and elegant solution. I had never heard of the cut function. Thank you. – pyll May 09 '17 at 14:29
  • You did not get the nesting right; it does work that way in R. At the third position you start the new `ifelse` instead of closing it. Alternatively, check out `case_when` from `dplyr`. – Edwin May 09 '17 at 14:36
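On the nesting question raised in the comments, a sketch of how the question's first attempt could be written as working nested `ifelse` calls. `ifelse()` returns a value for each element; it does not run statements in each branch, so assignments such as `df$risk <- ...` do not belong inside it, and `&` is a logical AND rather than a way to run two statements. Each new column therefore gets its own call. Note that under the question's `>= .40` rule, 0.3 also gets "N", so this `answer` column differs from the `want` vector in the ninth row.

```r
metricx <- c(5, 4.8, 4.4, 3.6, 3.2, 2.1, 1.9, .5, .3, .1)
df <- data.frame(metricx)

# Each "no" argument is the next ifelse(); the innermost one ends with a plain value.
df$risk <- ifelse(df$metricx >= 4.5, "VERY HIGH",
           ifelse(df$metricx >= 3.5, "HIGH",
           ifelse(df$metricx >= 2,   "MEDIUM",
                                     "LOW")))

# answer depends on a single threshold, so it is a separate, flat call.
df$answer <- ifelse(df$metricx >= 0.4, "Y", "N")
df
```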

By definition you'll always have an answer, which is why I left df$answer out. Try:

metricx <- c(5, 4.8, 4.4, 3.6, 3.2, 2.1, 1.9, .5, .3, .1)
df <- as.data.frame(metricx)

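myif <- function(x) {
  # x is a single value; returns its risk class
  if (x < 2) y = "LOW" else
    if (x < 3.5) y = "MEDIUM" else
      if (x < 4.5) y = "HIGH" else y = "VERY HIGH"
  return(y)
}
sapply(df$metricx, myif)

# or, vectorized (df$metricx instead of df[1], so the result is a plain
# vector rather than a one-column matrix):

ifelse(df$metricx < 2, "LOW",
       ifelse(df$metricx < 3.5, "MEDIUM",
              ifelse(df$metricx < 4.5, "HIGH", "VERY HIGH")))

# or (modified later to return the answer as well):

myif <- function(x) {
  if (x < 2) y = "LOW" else
    if (x < 3.5) y = "MEDIUM" else
      if (x < 4.5) y = "HIGH" else y = "VERY HIGH"
  # a new statement, not part of the else chain above
  yv <- c(y, if (x < 0.4) "N" else "Y")
  return(yv)
}
sapply(df$metricx, myif)  # 2 x 10 matrix: row 1 risk, row 2 answer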
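The second `myif` returns two values per input, so `sapply` gives a 2 x 10 character matrix rather than two columns. A sketch of one way to move that result into `df`, assuming the function is changed to return a named vector (the names `risk` and `answer` on the returned vector, and `res`, are illustrative additions) so the rows can be selected by name.

```r
metricx <- c(5, 4.8, 4.4, 3.6, 3.2, 2.1, 1.9, .5, .3, .1)
df <- data.frame(metricx)

# Same thresholds as the second myif() above; the names on the returned
# vector are an addition so sapply() labels the rows of its result.
myif <- function(x) {
  risk <- if (x < 2) "LOW" else if (x < 3.5) "MEDIUM" else
    if (x < 4.5) "HIGH" else "VERY HIGH"
  c(risk = risk, answer = if (x < 0.4) "N" else "Y")
}

res <- sapply(df$metricx, myif)  # 2 x 10 character matrix, rows "risk" and "answer"
df$risk   <- res["risk", ]
df$answer <- res["answer", ]
```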
r.user.05apr
  • Actually, there are conditions under which answer = 'N', so this doesn't quite answer the question. Can I incorporate multiple "do" actions, or would I have to do a separate call? – pyll May 09 '17 at 14:28
  • I really would calculate one vector at a time. And as I think about it, the cut-answer from above is probably the most R-like way to do it. – r.user.05apr May 09 '17 at 14:51
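In the spirit of computing one vector at a time, a sketch using base R's `findInterval()`, a technique not mentioned in the thread. It counts how many cut-points are less than or equal to each value, which gives the same left-closed `>=` behavior as the question's tests, in one vectorized pass. The `cutpoints` and `risk_labels` names are hypothetical.

```r
metricx <- c(5, 4.8, 4.4, 3.6, 3.2, 2.1, 1.9, .5, .3, .1)
df <- data.frame(metricx)

cutpoints   <- c(2, 3.5, 4.5)                           # must be sorted ascending
risk_labels <- c("LOW", "MEDIUM", "HIGH", "VERY HIGH")  # one more label than cut-points

# findInterval() gives, per value, the number of cut-points <= that value
# (0 to 3 here); adding 1 turns it into an index into risk_labels.
df$risk <- risk_labels[findInterval(df$metricx, cutpoints) + 1]
```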