I'm having difficulty writing a proper nested if statement in a user-defined function.

My sample data looks like this:

test <- data.frame(x=rev(0:10),y=10:20)

if_state <- function(x,y) {
  if (x==min(x) && y==max(y)) {
    "good"
  } else if (max(x)/2==y[which(y==15)]/3) {  # to find when x=5 and y=5 condition if it is true set class to "y==5"
    "y==5"
  }
    NA
}

   > test
    x  y
1  10 10
2   9 11
3   8 12
4   7 13
5   6 14
6   5 15
7   4 16
8   3 17
9   2 18
10  1 19
11  0 20

library(dplyr)
test %>%
  mutate(class = if_state(x,y))

    x  y class
1  10 10    NA
2   9 11    NA
3   8 12    NA
4   7 13    NA
5   6 14    NA
6   5 15    NA
7   4 16    NA
8   3 17    NA
9   2 18    NA
10  1 19    NA
11  0 20    NA

I don't know why the if statement is not working correctly. The question is: what is the base R function that works the same as dplyr's case_when? Please see the comments below.

So the expected output is:

    x  y class
1  10 10    NA
2   9 11    NA
3   8 12    NA
4   7 13    NA
5   6 14    NA
6   5 15    y==5
7   4 16    NA
8   3 17    NA
9   2 18    NA
10  1 19    NA
11  0 20    good
Alexander

1 Answer

R functions return the value of the last expression evaluated during their invocation, even without an explicit call to return (see this answer for more detail). In your if_state function, NA sits outside the if-else if control flow, so it is always the last expression evaluated and the function always returns NA, even when the if or else if condition is true. For your function to work as you expect, you need to move NA into an else branch:

if_state <- function(x,y) {
  if (x == min(x) && y == max(y)) {
    "good"
  } else if (max(x)/2 == y[which(y == 15)]/3) {
    "y==5"
  } else {
    NA 
  }
}
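
To see the return-value rule in isolation, here is a minimal sketch with scalar input. The function names `last_value_wins` and `else_branch` are hypothetical and used only for illustration:

```r
# Hypothetical helper names, for illustration only.
last_value_wins <- function(flag) {
  if (flag) {
    "inside if"   # evaluated, but its value is discarded
  }
  NA              # last expression, so this is what the function returns
}

else_branch <- function(flag) {
  if (flag) {
    "inside if"   # now the last expression when flag is TRUE
  } else {
    NA
  }
}

last_value_wins(TRUE)  # NA
else_branch(TRUE)      # "inside if"
```

Note that this only illustrates the return value. A plain `if` tests a single condition, so it does not classify a data frame column row by row.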

Note that when using dplyr, testing for multiple conditions to determine a return value is often more succinctly accomplished with case_when:

test %>% mutate(class = case_when(
  x == min(x) && y == max(y) ~ "good",
  max(x)/2 == y[which(y == 15)]/3 ~ "y == 5",
  TRUE ~ NA_character_
))

Edit: based on OP's clarification and eipi10's help, here is the final function:

if_state = function(x, y) {
  # Use the vectorized & (not &&) so that each row is tested individually
  case_when(x == min(x) & y == max(y) ~ "good", 
            x == max(x)/2 & y/3 == 5 ~ "y==5", 
            TRUE ~ NA_character_)
}
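
As for a base R counterpart to case_when, nested `ifelse` calls or logical subsetting are the usual options. The following is a rough sketch, assuming the intended conditions are the element-wise ones used above; the names `if_state_base` and `if_state_subset` are hypothetical:

```r
test <- data.frame(x = rev(0:10), y = 10:20)

# Nested ifelse: conditions are checked in order, like case_when.
if_state_base <- function(x, y) {
  ifelse(x == min(x) & y == max(y), "good",
         ifelse(x == max(x) / 2 & y / 3 == 5, "y==5", NA_character_))
}

# Logical subsetting: start from NA and overwrite matching rows.
# Later assignments win, so the highest-priority condition goes last.
if_state_subset <- function(x, y) {
  out <- rep(NA_character_, length(x))
  out[x == max(x) / 2 & y / 3 == 5] <- "y==5"
  out[x == min(x) & y == max(y)] <- "good"
  out
}

test$class <- if_state_base(test$x, test$y)
test
```

With many conditions, the subsetting form tends to stay more readable than deeply nested `ifelse` calls, at the cost of having to list the conditions in reverse priority order.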
cmaher
  • It looks like `max(x)/2 == y[which(y == 15)]/3` is always TRUE, so the result will be `"y==5"` for any rows that don't satisfy the first condition. Maybe the OP actually wanted something like `x==max(x)/2 & y/3==5 ~ "y==5"`? – eipi10 Apr 30 '18 at 18:51
  • @cmaher Thank you for the explicit answer. When I run your new `if_state`, all class values come out as `y==5`? – Alexander Apr 30 '18 at 18:51
  • See my comment above. – eipi10 Apr 30 '18 at 18:52
  • @cmaher, how could `max(x)/2 == y[which(y == 15)]/3` always be true? There is only one row that satisfies this, right? – Alexander Apr 30 '18 at 18:54
  • @Alexander I had copied the conditions from the question (the `if-else` statement also always returns `"y == 5"` for the test data). Would you mind explaining your expected output? – cmaher Apr 30 '18 at 18:56
  • @Alexander the condition in eipi10's comment appears to give your intended result? (e.g. if you change the `case_when` to `mutate(class=case_when(x == min(x) && y == max(y) ~ "good", x == max(x)/2 & y/3 == 5 ~ "y==5", TRUE ~ NA_character_))`) – cmaher Apr 30 '18 at 19:00
  • `max(x)/2` returns 5 for every row in the data frame. `y[which(y == 15)]/3` returns 5 for every row in the data frame. So the condition being evaluated is `5==5` which is always TRUE. – eipi10 Apr 30 '18 at 19:00
  • @cmaher Sure. The function should catch multiple conditions at once. So when `max(x)/2` and the `y` value divided by `3` are both equal to `5`, set class to `y==5`. That is it! – Alexander Apr 30 '18 at 19:00
  • @cmaher The problem is that I can't use case_when directly, because I have too many conditions and they should be defined in a `function`. – Alexander Apr 30 '18 at 19:02
  • @cmaher I thought your modified `if_state` should be solving the issue without `case_when` ? – Alexander Apr 30 '18 at 19:04
  • You can put the `case_when` statement inside the `if_state` function, just as you would for the `if` statements. Then you would do `mutate(class = if_state(x,y))`, just as in your original code. – eipi10 Apr 30 '18 at 19:05
  • @eipi10 Actually, my original question then would be: what is the base R function that replicates `case_when`? Maybe `with`? – Alexander Apr 30 '18 at 19:10
  • You could do nested `ifelse` statements or you could use subsetting, but why not just use `case_when`? – eipi10 Apr 30 '18 at 19:11
  • @eipi10 I can use that, but I am still not familiar with it. Could you post a solution where `case_when` is used inside a function? – Alexander Apr 30 '18 at 19:13
  • Wrap @cmaher's answer in a function call (and I've changed the second condition as well): `if_state = function(x,y) {case_when( x == min(x) && y == max(y) ~ "good", x == max(x)/2 & y/3 == 5 ~ "y==5", TRUE ~ NA_character_ )}` – eipi10 Apr 30 '18 at 19:14