I have a nested if_else statement inside mutate. Here is my example data frame:
tmp_df2 <- data.frame(a = c(1,1,2), b = c(T,F,T), c = c(1,2,3))
  a     b c
1 1  TRUE 1
2 1 FALSE 2
3 2  TRUE 3
I want to group by a and then perform operations based on whether a group has one or two rows. I would have thought this nested if_else would suffice:
library(dplyr)

tmp_df2 %>%
  group_by(a) %>%
  mutate(tmp_check = n() == 1) %>%
  mutate(d = if_else(tmp_check, # check for number of entries in group
                     0,
                     if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)])
                     )
         )
But this throws the error:
Error in eval(substitute(expr), envir, enclos) :
`false` is length 2 not 1 or 1.
The way the example is set up, when the condition of the outer if_else (n() == 1, stored in tmp_check) evaluates to TRUE, a single element is returned, but when it evaluates to FALSE, a vector with two elements is returned. I assume this is what causes the error, yet the statement seems logically sound to me.
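One way to test that assumption is to hand each function a branch that signals an error if it is ever evaluated. This is a minimal sketch assuming a recent dplyr; the variable names strict and lazy are only for illustration. dplyr::if_else() evaluates both true and false up front so it can check their types and lengths, whereas base ifelse() only evaluates no when at least one element of test is FALSE. That would mean the inner if_else is computed even for the one-row group a = 2, where tmp_check is TRUE.

```r
suppressPackageStartupMessages(library(dplyr))

# dplyr::if_else() forces `false` even though the condition is TRUE,
# so the stop() inside it is triggered and caught here.
strict <- tryCatch(
  if_else(TRUE, 0, stop("`false` was evaluated")),
  error = function(e) conditionMessage(e)
)
print(strict)

# Base ifelse() never touches `no` when no element of `test` is FALSE,
# so the same construction simply returns 0.
lazy <- ifelse(TRUE, 0, stop("`no` was evaluated"))
print(lazy)
```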
The following two statements produce the desired results:
> tmp_df2 %>%
+ group_by(a) %>%
+ mutate(d = ifelse(rep(n() == 1, n()), # avoid undesired recycling
+ 0,
+ if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)])
+ )
+ )
Source: local data frame [3 x 4]
Groups: a [2]
      a     b     c     d
  <dbl> <lgl> <dbl> <dbl>
1     1  TRUE     1   3.0
2     1 FALSE     2   1.5
3     2  TRUE     3   0.0
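The rep(n() == 1, n()) wrapper in the ifelse version matters because ifelse() returns a result with the same length as test. Inside a group, n() == 1 is a single value, so without rep() only the first element of the no branch would survive, and mutate() would recycle that one value across the whole group. The sketch below uses base R only and plugs in the values from group a = 1 (3 and 1.5); short_test and full_test are illustrative names.

```r
# Length-1 test: only the first element of `no` is kept.
short_test <- ifelse(FALSE, 0, c(3, 1.5))
print(short_test)

# Test repeated once per row of the two-row group: one result per row.
full_test <- ifelse(rep(FALSE, 2), 0, c(3, 1.5))
print(full_test)
```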
or just filtering so that only groups containing two rows are left:
> tmp_df2 %>%
+ group_by(a) %>%
+ filter(n() == 2) %>%
+ mutate(d = if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)]))
Source: local data frame [2 x 4]
Groups: a [1]
      a     b     c     d
  <dbl> <lgl> <dbl> <dbl>
1     1  TRUE     1   3.0
2     1 FALSE     2   1.5
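The filtering idea can be stretched to cover every group without ifelse by handling the two kinds of groups separately and stacking them again with bind_rows(). This is a split-and-recombine technique swapped in for illustration, not something from the question; the names two_rows, one_row and result are hypothetical. Because the original if_else only runs on two-row groups, its false branch always has a valid length. bind_rows() stacks the pieces in the order given, so you may need arrange() afterwards if the original row order matters for other data.

```r
suppressPackageStartupMessages(library(dplyr))

tmp_df2 <- data.frame(a = c(1, 1, 2), b = c(TRUE, FALSE, TRUE), c = c(1, 2, 3))

# Two-row groups: the original if_else() only sees groups where
# both of its branches have a valid length.
two_rows <- tmp_df2 %>%
  group_by(a) %>%
  filter(n() == 2) %>%
  mutate(d = if_else(b, sum(c) / c[b == TRUE], sum(c) / c[which(b != TRUE)])) %>%
  ungroup()

# Single-row groups: assign the constant directly.
one_row <- tmp_df2 %>%
  group_by(a) %>%
  filter(n() == 1) %>%
  mutate(d = 0) %>%
  ungroup()

# Stack the pieces; rows appear in the order of the inputs.
result <- bind_rows(two_rows, one_row)
print(result)
```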
I have two questions.
How does dplyr know that the second branch, which should not be evaluated given the logical condition, is invalid?
How do I get the desired behaviour in dplyr (without using ifelse)?
EDIT: As noted in an answer, either drop the temporary tmp_check column and use the if ... else construct, or use the following code, which works but produces warnings:
library(dplyr)
tmp_df2 %>%
  group_by(a) %>%
  mutate(tmp_check = n() == 1) %>%
  mutate(d = if (tmp_check) # check for number of entries in group
               0 else
               if_else(b, sum(c)/c[b == T], sum(c)/c[which(b != T)])
         )
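The first option from the answer, dropping tmp_check and testing n() == 1 directly, can be sketched as follows (assuming a recent dplyr). Because n() == 1 is a single TRUE or FALSE per group, a scalar if ... else is safe, and only the branch that is taken is evaluated. With the tmp_check column, the if condition in group a = 1 is a length-2 vector: R versions before 4.2.0 warn and use only its first element, while R 4.2.0 and later raise an error instead.

```r
suppressPackageStartupMessages(library(dplyr))

tmp_df2 <- data.frame(a = c(1, 1, 2), b = c(TRUE, FALSE, TRUE), c = c(1, 2, 3))

result <- tmp_df2 %>%
  group_by(a) %>%
  # n() == 1 is one value per group, so `if` gets a length-1 condition
  # and the if_else() branch is skipped entirely for one-row groups.
  mutate(d = if (n() == 1) 0 else
    if_else(b, sum(c) / c[b == TRUE], sum(c) / c[which(b != TRUE)])) %>%
  ungroup()
print(result)
```

If you prefer to keep tmp_check, testing a single value such as dplyr's first(tmp_check) also gives if a length-1 condition.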