R dplyr::mutate with ifelse conditioned on a global variable recycles result from first row

Question

I am curious why an ifelse() statement within a call to dplyr::mutate() only seems to apply to the first row of my data frame. This returns a single value, which is recycled down the entire column. Since the expressions evaluated in either case of the ifelse() are only valid in the context of my data frame, I would expect the condition check and resulting expression evaluations to be performed on the columns as a whole, not just their first elements.

Here's an example: I have a variable defined outside the data frame called checkVar. Depending on the value of checkVar, I want to add different values to my data frame in a new column, z, computed as a function of existing columns.

If I do

library(dplyr)  # provides %>% and mutate()

checkVar <- 1
df <- data.frame(x = 11:15, y = 1:5) %>%
  dplyr::mutate(z = ifelse(checkVar == 1, x/y, x-y))
df

it returns

   x y  z
1 11 1 11
2 12 2 11
3 13 3 11
4 14 4 11
5 15 5 11

Instead of z being the quotient of x and y for each row, all rows are populated with the quotient of x and y from the first row of the data frame.

However, if I specify rowwise(), I get the result I want:

df <- df %>%
  dplyr::rowwise() %>%
  dplyr::mutate( z=ifelse(checkVar == 1, x/y, x-y) ) %>%
  dplyr::ungroup()
df

returns

# A tibble: 5 x 3
      x     y         z
  <int> <int>     <dbl>
1    11     1 11.000000
2    12     2  6.000000
3    13     3  4.333333
4    14     4  3.500000
5    15     5  3.000000

Why do I have to explicitly specify rowwise() when x and y are only defined as columns of my data frame?

`checkVar` is of `length` 1. This, I believe, leads to only the first row of `x` and `y` being used. If you set `checkVar <- rep(1,5)`, you get your desired output. If you used `dplyr`'s `if_else`, it would tell you what the issue is. Also, using `rowwise` makes it so that everything inside the `ifelse` is of length 1. — Abdou, Oct 06 '17 at 21:29
If you use dplyr's version of `ifelse`, which is `if_else`, you get the error `"true is length 5 not 1 or 1."`. — Marek, Oct 06 '17 at 21:40
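
The two suggestions in the comments above can be checked with a minimal sketch. It assumes dplyr is installed; the names `dat`, `res`, and `fixed` are illustrative and not from the original post. The wording of the `if_else` error differs between dplyr versions, so the sketch only checks that an error is raised.

```r
library(dplyr)

checkVar <- 1
dat <- data.frame(x = 11:15, y = 1:5)

# if_else() does not silently truncate: `true` has 5 elements but the
# condition has only 1, so it raises an error instead of returning (x/y)[1].
res <- tryCatch(
  mutate(dat, z = if_else(checkVar == 1, x / y, as.numeric(x - y))),
  error = function(e) e
)
inherits(res, "error")

# Repeating the condition once per row gives ifelse() a test of the right shape.
fixed <- mutate(dat, z = ifelse(rep(checkVar == 1, n()), x / y, x - y))
fixed$z
```

Repeating the condition works, but it only hides the fact that the test is really a scalar; the accepted answer below addresses that directly.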

Psidom · Accepted Answer · 2017-10-06T21:34:52.013

7

This is not really about dplyr::mutate but about how ifelse works. From the docs (?ifelse):

ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE.

Usage

ifelse(test, yes, no)

An example:

ifelse(TRUE, c(1,2,3), c(2,3,4))
# [1] 1

In your first case, mutate evaluates the expression once on whole columns: ifelse receives the vectors x/y and x-y as its yes and no arguments. Because checkVar == 1 is a single TRUE (a scalar), the result has length 1, so ifelse returns (x/y)[1], the first element of x/y. That value, 11, is then recycled to fill the new column z.

In your second case, mutate and ifelse are executed once per row, so the expression is evaluated five times, each time returning the value of x/y for that row.

If your condition is a scalar, you don't need the vectorized ifelse; a plain if/else is more suitable:
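
The shape rule can be seen without dplyr at all. The following is a minimal base R sketch; `scalar_result` and `vector_result` are illustrative names, not part of the original post.

```r
x <- 11:15
y <- 1:5

# Scalar test: the result has the shape of `test`, so only one element comes back.
scalar_result <- ifelse(TRUE, x / y, x - y)
length(scalar_result)   # 1
scalar_result           # 11, i.e. (x / y)[1]

# Test with one element per row: the result has five elements.
vector_result <- ifelse(rep(TRUE, length(x)), x / y, x - y)
length(vector_result)   # 5
```

Inside mutate, the length-1 result is what gets recycled down the column.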

checkVar <- 1
mutate(df, z = if(checkVar == 1) x/y else x-y)

#   x y         z
#1 11 1 11.000000
#2 12 2  6.000000
#3 13 3  4.333333
#4 14 4  3.500000
#5 15 5  3.000000

edited Oct 06 '17 at 21:34

answered Oct 06 '17 at 21:29

Psidom

Your last example is very interesting; I didn't even know you could use the standard `if/else` construct on the right-hand side of an assignment within a call to `mutate`. Thanks for posting; this is very good to know! – bmosov01 Oct 06 '17 at 21:42
Alternative: `mutate(z = case_when(checkVar==1 ~ x/y, TRUE ~ as.numeric(x-y)))` – Marek Oct 06 '17 at 21:43
@bmosov01 Glad it helps! – Psidom Oct 06 '17 at 21:44
@Marek `case_when` will still error out here. The longest LHS in the formulas should have the same length as the longest RHS for `case_when` to work correctly. – Psidom Oct 06 '17 at 21:50
@Psidom Strange, works for me dplyr-0.7.3 on R-3.4.2 – Marek Oct 06 '17 at 22:03
@Marek You're right. It works after upgrading `dplyr`. – Psidom Oct 06 '17 at 22:08
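
For reference, the `case_when` alternative from the comments can be written out as a small sketch. Per the thread it works on dplyr 0.7.3 but failed on an older version; that it behaves the same on other versions is an assumption here. The names `dat` and `out` are illustrative.

```r
library(dplyr)

checkVar <- 1
dat <- data.frame(x = 11:15, y = 1:5)

# The scalar condition on the left-hand side is recycled against the
# length-5 right-hand sides. as.numeric() keeps both branches the same type.
out <- mutate(dat, z = case_when(
  checkVar == 1 ~ x / y,
  TRUE          ~ as.numeric(x - y)
))
out$z
```

For a condition that is always a single value, the plain `if`/`else` from the accepted answer states the intent more directly.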
