Consider the following example:

library(dplyr)

# sample data 
set.seed(1)
mydf <- data.frame(value = as.logical(sample(0:1, 15, replace = TRUE)), group = rep(letters[1:3],each = 5), index = 1:5)

# finds either index of first "TRUE" value by group, or the last value. 
# works with base::ifelse
mydf %>% group_by(group) %>% mutate(max_value = ifelse(all(!value), max(index), index[min(which(value))]))
#> # A tibble: 15 x 4
#> # Groups:   group [3]
#>    value group index   max_value
#>    <lgl> <fct> <int>      <int>
#>  1 FALSE a         1          2
#>  2 TRUE  a         2          2
#>  3 FALSE a         3          2
#>  4 FALSE a         4          2
#>  5 TRUE  a         5          2
#>  6 FALSE b         1          4
#>  7 FALSE b         2          4
#>  8 FALSE b         3          4
#>  9 TRUE  b         4          4
#> 10 TRUE  b         5          4
#> 11 FALSE c         1          5
#> 12 FALSE c         2          5
#> 13 FALSE c         3          5
#> 14 FALSE c         4          5
#> 15 FALSE c         5          5

# the same gives a warning with dplyr::if_else
mydf %>% group_by(group) %>% mutate(max_value = if_else(all(!value), max(index), index[min(which(value))]))

#> Warning in min(which(value)): no non-missing arguments to min; returning Inf

#> # A tibble: 15 x 4
#> # Groups:   group [3]
#>    value group index  max_value
#>    <lgl> <fct> <int>      <int>
#>  1 FALSE a         1          2
#>  2 TRUE  a         2          2
#>  3 FALSE a         3          2
#>  4 FALSE a         4          2
#>  5 TRUE  a         5          2
#>  6 FALSE b         1          4
#>  7 FALSE b         2          4
#>  8 FALSE b         3          4
#>  9 TRUE  b         4          4
#> 10 TRUE  b         5          4
#> 11 FALSE c         1          5
#> 12 FALSE c         2          5
#> 13 FALSE c         3          5
#> 14 FALSE c         4          5
#> 15 FALSE c         5          5

As noted in the code comments, dplyr::if_else produces this warning:

Warning in min(which(value)): no non-missing arguments to min; returning Inf

Once group c is no longer all FALSE, the warning disappears:

mydf_allTRUE <- mydf
mydf_allTRUE[14, 'value'] <- TRUE

mydf_allTRUE %>% group_by(group) %>% mutate(max_value = if_else(all(!value), max(index), index[min(which(value))]))
#> # A tibble: 15 x 4
#> # Groups:   group [3]
#>    value group index max_value
#>    <lgl> <fct> <int>     <int>
#>  1 FALSE a         1         2
#>  2 TRUE  a         2         2
#>  3 FALSE a         3         2
#>  4 FALSE a         4         2
#>  5 TRUE  a         5         2
#>  6 FALSE b         1         4
#>  7 FALSE b         2         4
#>  8 FALSE b         3         4
#>  9 TRUE  b         4         4
#> 10 TRUE  b         5         4
#> 11 FALSE c         1         4
#> 12 FALSE c         2         4
#> 13 FALSE c         3         4
#> 14 TRUE  c         4         4
#> 15 FALSE c         5         4

Created on 2019-12-22 by the reprex package (v0.3.0)

What confuses me is that I believe I constructed the condition so that the FALSE part (index[min(which(value))]) is only needed when the group contains at least one TRUE value. Why does this still give a warning? It is problematic because my data has several thousand groups, most of which are all FALSE, and the warnings make the computation extremely slow.

I am happy to use base::ifelse, but I wonder why dplyr::if_else evaluates both the TRUE and FALSE sides. Does this somehow happen at the same time?

tjebo

1 Answer

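The warning occurs because `if_else` evaluates both branches, and for groups without any TRUE value `which(value)` returns an empty vector. `min()` of an empty input (shown here with `NULL`) warns and returns `Inf`: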

min(NULL)
#[1] Inf

Warning message: In min(NULL) : no non-missing arguments to min; returning Inf
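
To make the difference between the two functions concrete, here is a minimal sketch for a single all-FALSE group. The helper `warns()` and the toy vectors `value` and `index` are made up for illustration. It relies on `base::ifelse` not evaluating `no` when the length-one test is TRUE, while `dplyr::if_else` evaluates both `true` and `false`.

```r
library(dplyr)

# Hypothetical helper: TRUE if evaluating `expr` raises a warning
warns <- function(expr) {
  tryCatch({ force(expr); FALSE }, warning = function(w) TRUE)
}

# One all-FALSE group, like group c in the question
value <- c(FALSE, FALSE, FALSE)
index <- 1:3

# base::ifelse returns `yes` without ever touching `no` here,
# so min(which(value)) is not run
ifelse_warns <- warns(
  ifelse(all(!value), max(index), index[min(which(value))])
)

# dplyr::if_else evaluates both branches, so min(integer(0)) runs and warns
if_else_warns <- warns(
  if_else(all(!value), max(index), index[min(which(value))])
)

ifelse_warns   # FALSE
if_else_warns  # TRUE
```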


One option is to subset the `which()` output with `[1]`, which returns `NA` when there is no TRUE value:

mydf %>%
   group_by(group) %>%
   mutate(max_value = if_else(all(!value), max(index), index[which(value)[1]]))
# A tibble: 15 x 4
# Groups:   group [3]
#   value group index max_value
#   <lgl> <fct> <int>     <int>
# 1 FALSE a         1         2
# 2 TRUE  a         2         2
# 3 FALSE a         3         2
# 4 FALSE a         4         2
# 5 TRUE  a         5         2
# 6 FALSE b         1         4
# 7 FALSE b         2         4
# 8 FALSE b         3         4
# 9 TRUE  b         4         4
#10 TRUE  b         5         4
#11 FALSE c         1         5
#12 FALSE c         2         5
#13 FALSE c         3         5
#14 FALSE c         4         5
#15 FALSE c         5         5
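
As a small base R sketch of why the `[1]` trick avoids the warning (the vectors here are toy values, not the question's data): indexing an empty integer vector with `[1]` yields `NA` rather than an error, and indexing with `NA` yields `NA` again, so `min()` is never called on an empty input.

```r
index <- 1:3

# No TRUE value: which() is empty, so [1] gives NA and the lookup gives NA
none_true <- c(FALSE, FALSE, FALSE)
first_none <- which(none_true)[1]
picked_none <- index[first_none]

# At least one TRUE value: [1] gives the position of the first TRUE
some_true <- c(FALSE, TRUE, TRUE)
first_some <- which(some_true)[1]
picked_some <- index[first_some]

first_none   # NA
picked_none  # NA
first_some   # 2
picked_some  # 2
```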

Also, since we are returning a single element per group here, plain if/else is more appropriate:

mydf %>%
    group_by(group) %>%
    mutate(max_value = if(all(!value)) max(index) else index[which(value)[1]])
# A tibble: 15 x 4
# Groups:   group [3]
#   value group index max_value
#   <lgl> <fct> <int>     <int>
# 1 FALSE a         1         2
# 2 TRUE  a         2         2
# 3 FALSE a         3         2
# 4 FALSE a         4         2
# 5 TRUE  a         5         2
# 6 FALSE b         1         4
# 7 FALSE b         2         4
# 8 FALSE b         3         4
# 9 TRUE  b         4         4
#10 TRUE  b         5         4
#11 FALSE c         1         5
#12 FALSE c         2         5
#13 FALSE c         3         5
#14 FALSE c         4         5
#15 FALSE c         5         5
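
Because plain `if`/`else` only evaluates the branch that is taken, the original `min(which(value))` expression should also work without a warning once it sits in the untaken branch. The following is a sketch on a small hand-made data frame (`toy` is a made-up name, and the data are not the question's), where group a has its first TRUE at index 2 and group b is all FALSE.

```r
library(dplyr)

# Toy data: group a has a TRUE at index 2, group b has none
toy <- data.frame(
  value = c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
  group = rep(c("a", "b"), each = 3),
  index = rep(1:3, times = 2)
)

res <- toy %>%
  group_by(group) %>%
  # min(which(value)) is only evaluated for groups with at least one TRUE
  mutate(max_value = if (any(value)) index[min(which(value))] else max(index)) %>%
  ungroup()

res$max_value
# 2 2 2 3 3 3
```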
akrun
  • so, you are saying that `dplyr::if_else` indeed seems to test both TRUE and FALSE at the same time? – tjebo Dec 22 '19 at 16:19
  • Thanks for the indexing idea, that's very clever – tjebo Dec 22 '19 at 16:19
  • @Tjebo `if_else` comes with additional checks like types etc. In your case, there is no need for `ifelse/if_else` as it is returning a single output – akrun Dec 22 '19 at 16:31
  • This is indeed very true. It is actually very revealing to me, as until now I had not quite grasped this essential difference between ifelse/if_else and if/else. Makes a lot of sense. Thanks – tjebo Dec 22 '19 at 16:33