Consider the following example:
library(dplyr)
# sample data
set.seed(1)
mydf <- data.frame(value = as.logical(sample(0:1, 15, replace = TRUE)), group = rep(letters[1:3],each = 5), index = 1:5)
# per group: index of the first TRUE value, or the last index if the group has no TRUE
# works with base::ifelse
mydf %>% group_by(group) %>% mutate(max_value = ifelse(all(!value), max(index), index[min(which(value))]))
#> # A tibble: 15 x 4
#> # Groups: group [3]
#> value group index max_value
#> <lgl> <fct> <int> <int>
#> 1 FALSE a 1 2
#> 2 TRUE a 2 2
#> 3 FALSE a 3 2
#> 4 FALSE a 4 2
#> 5 TRUE a 5 2
#> 6 FALSE b 1 4
#> 7 FALSE b 2 4
#> 8 FALSE b 3 4
#> 9 TRUE b 4 4
#> 10 TRUE b 5 4
#> 11 FALSE c 1 5
#> 12 FALSE c 2 5
#> 13 FALSE c 3 5
#> 14 FALSE c 4 5
#> 15 FALSE c 5 5
# the same gives a warning with dplyr::if_else
mydf %>% group_by(group) %>% mutate(max_value = if_else(all(!value), max(index), index[min(which(value))]))
#> Warning in min(which(value)): no non-missing arguments to min; returning Inf
#> # A tibble: 15 x 4
#> # Groups: group [3]
#> value group index max_value
#> <lgl> <fct> <int> <int>
#> 1 FALSE a 1 2
#> 2 TRUE a 2 2
#> 3 FALSE a 3 2
#> 4 FALSE a 4 2
#> 5 TRUE a 5 2
#> 6 FALSE b 1 4
#> 7 FALSE b 2 4
#> 8 FALSE b 3 4
#> 9 TRUE b 4 4
#> 10 TRUE b 5 4
#> 11 FALSE c 1 5
#> 12 FALSE c 2 5
#> 13 FALSE c 3 5
#> 14 FALSE c 4 5
#> 15 FALSE c 5 5
As noted in the code comments, dplyr::if_else produces this warning:
Warning in min(which(value)): no non-missing arguments to min; returning Inf
After setting one value in group c to TRUE, so that no group is all FALSE, the warning disappears:
mydf_allTRUE <- mydf
mydf_allTRUE[14, 'value'] <- TRUE
mydf_allTRUE %>% group_by(group) %>% mutate(max_value = if_else(all(!value), max(index), index[min(which(value))]))
#> # A tibble: 15 x 4
#> # Groups: group [3]
#> value group index max_value
#> <lgl> <fct> <int> <int>
#> 1 FALSE a 1 2
#> 2 TRUE a 2 2
#> 3 FALSE a 3 2
#> 4 FALSE a 4 2
#> 5 TRUE a 5 2
#> 6 FALSE b 1 4
#> 7 FALSE b 2 4
#> 8 FALSE b 3 4
#> 9 TRUE b 4 4
#> 10 TRUE b 5 4
#> 11 FALSE c 1 4
#> 12 FALSE c 2 4
#> 13 FALSE c 3 4
#> 14 TRUE c 4 4
#> 15 FALSE c 5 4
Created on 2019-12-22 by the reprex package (v0.3.0)
What confuses me is that (I believe) I wrote the condition so that, whenever the FALSE part (index[min(which(value))]) is actually selected, it must contain a value. Why does this still give a warning?
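The expression from the FALSE part can be checked on its own for an all-FALSE group. This is a minimal base R sketch; `value` and `index` here are made-up stand-ins for a single group such as group c, not the columns of `mydf`:

```r
# Made-up stand-ins for a single all-FALSE group (like group c above)
value <- c(FALSE, FALSE, FALSE, FALSE, FALSE)
index <- 1:5

# which() finds no TRUE positions, so min() receives an empty vector
# and returns Inf (normally together with the warning quoted above)
first_true <- suppressWarnings(min(which(value)))

# Subscripting with an out-of-range position yields NA
fallback <- index[first_true]

# Capture the warning message instead of printing it
warning_text <- tryCatch(
  min(which(value)),
  warning = function(w) conditionMessage(w)
)
```

So for an all-FALSE group the expression does not fail, it just evaluates to NA, and it is the call to min() that raises the warning.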
This is a real problem, because my data has several thousand groups, most of them with no TRUE value, and the warnings make the computation extremely slow.
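One thing that might sidestep the warnings, assuming the condition is always a single TRUE or FALSE per group, is a plain `if`/`else` inside mutate, since it only evaluates the branch it takes. A sketch on a small made-up data frame (`small_df` is hypothetical, not my real data):

```r
library(dplyr)

# Hypothetical data: group a has a TRUE at index 2, group b is all FALSE
small_df <- data.frame(
  value = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE),
  group = rep(c("a", "b"), each = 3),
  index = rep(1:3, times = 2)
)

result <- small_df %>%
  group_by(group) %>%
  # any(value) is a single logical per group, so if/else is enough here;
  # which.max() on a logical vector returns the position of the first TRUE
  mutate(max_value = if (any(value)) index[which.max(value)] else max(index)) %>%
  ungroup()

print(result)
```

Here group a should get 2 (its first TRUE) and group b should get 3 (its last index), without min() ever being called on an empty vector.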
I am happy to use base::ifelse, but I wonder how dplyr::if_else evaluates the TRUE and FALSE sides. Does it somehow evaluate both at the same time?
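To make the question concrete, here is a small check of which arguments each function actually evaluates. It is only a sketch: `note()` and `evaluated` are helper names made up for this test. ?ifelse states that `yes` is evaluated only if some element of `test` is TRUE (and analogously for `no`); that dplyr::if_else always evaluates both of its arguments is my assumption, which this is meant to probe.

```r
library(dplyr)

# Made-up helper: records its label when evaluated, then returns x
evaluated <- character(0)
note <- function(label, x) {
  evaluated <<- c(evaluated, label)
  x
}

# base::ifelse with a single TRUE test
evaluated <- character(0)
base_result <- ifelse(TRUE, note("yes", 1L), note("no", 2L))
base_evaluated <- evaluated

# dplyr::if_else with the same test
evaluated <- character(0)
dplyr_result <- if_else(TRUE, note("true", 1L), note("false", 2L))
dplyr_evaluated <- evaluated

print(base_evaluated)
print(dplyr_evaluated)
```

If the second vector lists both labels while the first lists only one, that would explain why min(which(value)) runs for all-FALSE groups under if_else even though its result is never used.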