26

I can't use switch inside of mutate because switch receives the whole column vector instead of a single row's value. As a hack, I'm using:

pick <- function(x, v1, v2, v3, v4) {
    ifelse(x == 1, v1,
           ifelse(x == 2, v2,
                  ifelse(x == 3, v3,
                         ifelse(x == 4, v4, NA))))
}

This works inside of mutate, and is fine for now because I'm typically choosing among 4 things, but that may change. Can you recommend an alternative?

For example:

library(dplyr)
df.faithful <- tbl_df(faithful)
df.faithful$x  <- sample(1:4, 272, rep=TRUE)
df.faithful$y1 <- rnorm(n=272, mean=7, sd=2)
df.faithful$y2 <- rnorm(n=272, mean=5, sd=2)
df.faithful$y3 <- rnorm(n=272, mean=7, sd=1)
df.faithful$y4 <- rnorm(n=272, mean=5, sd=1)

Using pick:

mutate(df.faithful, y = pick(x, y1, y2, y3, y4))
Source: local data frame [272 x 8]

   eruptions waiting x        y1        y2       y3       y4        y
1      3.600      79 1  8.439092 5.7753006 8.319372 5.078558 8.439092
2      1.800      54 2 13.515956 6.1971512 6.343157 4.962349 6.197151
3      3.333      74 4  7.693941 6.8973365 5.406684 5.425404 5.425404
4      2.283      62 4 12.595852 6.9953995 7.864423 3.730967 3.730967
5      4.533      85 3 11.952922 5.1512987 9.177687 5.511899 9.177687
6      2.883      55 3  7.881350 1.0289711 6.304004 3.554056 6.304004
7      4.700      88 4  8.636709 6.3046198 6.788619 5.748269 5.748269
8      3.600      85 1  8.027371 6.3535056 7.152698 7.034976 8.027371
9      1.950      51 1  5.863370 0.1707758 5.750440 5.058107 5.863370
10     4.350      85 1  7.761653 6.2176610 8.348378 1.861112 7.761653
..       ...     ... .       ...       ...      ...      ...      ...

Here y takes its value from y1 where x == 1, from y2 where x == 2, and so on. This is what I want, but I'd like it to work whether I have 4 columns or 400.

Trying to use switch:

mutate(df.faithful, y = switch(x, y1, y2, y3, y4))

Error in switch(c(1L, 2L, 4L, 4L, 3L, 3L, 4L, 1L, 1L, 1L, 4L, 3L, 1L,  : 
EXPR must be a length 1 vector

Trying to use list:

mutate(df.faithful, y = list(y1, y2, y3, y4)[[x]])
Error in list(c(8.43909205142925, 13.5159559591257, 7.69394050059568,  : 
recursive indexing failed at level 2

Trying to use c:

mutate(df.faithful, y = c(y1, y2, y3, y4)[x])
Source: local data frame [272 x 8]

   eruptions waiting x        y1        y2       y3       y4         y
1      3.600      79 1  8.439092 5.7753006 8.319372 5.078558  8.439092
2      1.800      54 2 13.515956 6.1971512 6.343157 4.962349 13.515956
3      3.333      74 4  7.693941 6.8973365 5.406684 5.425404 12.595852
4      2.283      62 4 12.595852 6.9953995 7.864423 3.730967 12.595852
5      4.533      85 3 11.952922 5.1512987 9.177687 5.511899  7.693941
6      2.883      55 3  7.881350 1.0289711 6.304004 3.554056  7.693941
7      4.700      88 4  8.636709 6.3046198 6.788619 5.748269 12.595852
8      3.600      85 1  8.027371 6.3535056 7.152698 7.034976  8.439092
9      1.950      51 1  5.863370 0.1707758 5.750440 5.058107  8.439092
10     4.350      85 1  7.761653 6.2176610 8.348378 1.861112  8.439092
..       ...     ... .       ...       ...      ...      ...       ...

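No errors are produced, but the result is wrong: c() concatenates the four columns into one long vector, so [x] only ever picks from the first four elements of y1.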
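
For readers who want the per-row selection that the c() attempt was aiming for, one base R option is matrix indexing. This is a sketch that is not part of the original question; the small vectors below are made-up stand-ins for the question's columns.

```r
# Made-up stand-in data (hypothetical values, same layout as the question).
x  <- c(1L, 2L, 4L, 3L)
y1 <- c(10, 11, 12, 13)
y2 <- c(20, 21, 22, 23)
y3 <- c(30, 31, 32, 33)
y4 <- c(40, 41, 42, 43)

# Bind the candidate columns into a matrix, then index it with a two-column
# matrix of (row, column) pairs: row i picks column x[i].
m <- cbind(y1, y2, y3, y4)
y <- m[cbind(seq_along(x), x)]
y
```

The same expression should also work inside mutate, e.g. mutate(df.faithful, y = cbind(y1, y2, y3, y4)[cbind(seq_along(x), x)]), and it does not grow more complicated as the number of columns increases.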

wdkrnls

9 Answers

37

Eons too late for the OP, but in case this shows up in a search ...

dplyr v0.5 has recode(), a vectorized version of switch(), so

library(dplyr)

# tibble() replaces the deprecated data_frame()
tibble(
  x = sample(1:4, 10, replace = TRUE),
  y1 = rnorm(n = 10, mean = 7, sd = 2),
  y2 = rnorm(n = 10, mean = 5, sd = 2),
  y3 = rnorm(n = 10, mean = 7, sd = 1),
  y4 = rnorm(n = 10, mean = 5, sd = 1)
) %>%
  mutate(y = recode(x, y1, y2, y3, y4))

produces, as anticipated:

# A tibble: 10 x 6
       x        y1       y2       y3       y4        y
   <int>     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1      2  6.950106 6.986780 7.826778 6.317968 6.986780
2      1  5.776381 7.706869 7.982543 5.048649 5.776381
3      2  7.315477 2.213855 6.079149 6.070598 2.213855
4      3  7.461220 5.100436 7.085912 4.440829 7.085912
5      3  5.780493 4.562824 8.311047 5.612913 8.311047
6      3  5.373197 7.657016 7.049352 4.470906 7.049352
7      2  6.604175 9.905151 8.359549 6.430572 9.905151
8      3 11.363914 4.721148 7.670825 5.317243 7.670825
9      3 10.123626 7.140874 6.718351 5.508875 6.718351
10     4  5.407502 4.650987 5.845482 4.797659 4.797659

(Also works with named args, including character and factor x's.)
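
As a small illustration of the named-argument form mentioned above (a sketch with a made-up key vector, not taken from the original answer), the argument names are the values to match and the argument values are their replacements:

```r
library(dplyr)

# Hypothetical character key; every value has a matching named argument.
key <- c("a", "b", "a")

fruit_chr <- recode(key, a = "apple", b = "banana")          # character in, character out
fruit_fct <- recode(factor(key), a = "apple", b = "banana")  # factor in, factor out (levels renamed)

fruit_chr
fruit_fct
```

Note that newer dplyr releases mark recode() as superseded in favour of case_match(), so it may be worth checking the documentation of the installed version.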

user6702291
6

Do the operation for each value of x. This is the data.table version; I assume something similar can be done in dplyr:

library(data.table)

dt = data.table(x = c(1,1,2,2), a = 1:4, b = 4:7)

dt[, newcol := switch(as.character(x), '1' = a, '2' = b, NA), by = x]
dt
#   x a b newcol
#1: 1 1 4      1
#2: 1 2 5      2
#3: 2 3 6      6
#4: 2 4 7      7
eddi
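
A possible dplyr analogue of the grouped data.table call above, offered as a sketch rather than as part of the original answer: group by x so that switch sees a single key per group. The data frame name df is hypothetical; the data is the same toy data as in the answer.

```r
library(dplyr)

# Same toy data as the data.table example.
df <- tibble(x = c(1, 1, 2, 2), a = 1:4, b = 4:7)

out <- df %>%
  group_by(x) %>%
  # x is constant within a group, so first(x) is a valid length-1 switch key;
  # a and b refer to the rows of the current group only.
  mutate(newcol = switch(as.character(first(x)), "1" = a, "2" = b, NA)) %>%
  ungroup()

out
```

This evaluates switch once per distinct value of x rather than once per row, which is usually much cheaper than a row-by-row approach when x has few distinct values.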
6

You can now use dplyr's function case_when with mutate().

To follow your example in generating the data:

library(dplyr)

df.faithful <- tbl_df(faithful)
df.faithful$x  <- sample(1:4, 272, rep=TRUE)
df.faithful$y1 <- rnorm(n=272, mean=7, sd=2)
df.faithful$y2 <- rnorm(n=272, mean=5, sd=2)
df.faithful$y3 <- rnorm(n=272, mean=7, sd=1)
df.faithful$y4 <- rnorm(n=272, mean=5, sd=1)

Now we define a new pick() function using case_when:

pick2 <- function(x, v1, v2, v3, v4) {
  out = case_when(
    x == 1 ~ v1,
    x == 2 ~ v2,
    x == 3 ~ v3,
    x == 4 ~ v4
  )
  return(out)
}

The function can then be used directly within mutate():

df.faithful %>% 
  mutate(y = pick2(x, y1, y2, y3, y4))

And the output is:

# A tibble: 272 x 8
   eruptions waiting     x    y1    y2    y3    y4     y
       <dbl>   <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
 1      3.6       79     3  8.73  7.23  8.89  4.04  8.89
 2      1.8       54     3  9.97  4.31  7.06  5.05  7.06
 3      3.33      74     1  6.65  7.23  4.46  6.49  6.65
 4      2.28      62     1  6.40  4.39  5.41  3.49  6.40
 5      4.53      85     4  3.96  8.85  7.43  6.51  6.51
 6      2.88      55     4  6.36  8.08  5.82  5.06  5.06
 7      4.7       88     1  5.91  6.47  6.43  5.88  5.91
 8      3.6       85     1  7.77  4.55  6.56  5.05  7.77
 9      1.95      51     4  5.74  6.46  6.95  4.26  4.26
10      4.35      85     1  7.04  1.73  5.71  2.53  7.04
# ... with 262 more rows
Mario Becerra
  • 1
    Just in case it's not clear to others, one can also use `case_when` directly within mutate. The `pick2` function is not needed but of course it might be desired (if used repeatedly for example). – Robert McDonald Nov 25 '19 at 23:41
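
To illustrate the comment above, here is a minimal sketch of case_when used directly inside mutate, without a wrapper function. The data frame df and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the question's data.
df <- tibble(
  x  = c(1L, 2L, 3L, 4L),
  y1 = c(10, 11, 12, 13),
  y2 = c(20, 21, 22, 23),
  y3 = c(30, 31, 32, 33),
  y4 = c(40, 41, 42, 43)
)

out <- df %>%
  mutate(y = case_when(
    x == 1 ~ y1,
    x == 2 ~ y2,
    x == 3 ~ y3,
    x == 4 ~ y4
  ))

out
```

Rows where no condition matches get NA, which mirrors the final NA in the question's nested ifelse.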
4

If you want to use switch in mutate, you must call rowwise() first:

iris %>%
  rowwise() %>%
  mutate(
    x = switch(
      as.character(Species),
      'setosa' = 'ss',
      'versicolor' = 'vc',
      'virginica' = 'vg'
    )
  ) %>%
  ungroup()
Ali
  • 1
    Thanks, it actually worked. But... why? (OK, the documentation can hint a clue: https://dplyr.tidyverse.org/articles/rowwise.html). It seems that `mutate` will operate normally by columns, but `switch` expects single values, so rows need to be fed one by one. – Óscar Gómez Alcañiz Aug 12 '21 at 08:14
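
Applied to the numeric selector from the question, the rowwise approach might look like the following sketch. The data is made up, and the code assumes every x lies in 1..4, because switch with an out-of-range numeric selector returns NULL.

```r
library(dplyr)

# Hypothetical stand-in for the question's data.
df <- tibble(
  x  = c(2L, 1L, 4L, 3L),
  y1 = c(10, 11, 12, 13),
  y2 = c(20, 21, 22, 23),
  y3 = c(30, 31, 32, 33),
  y4 = c(40, 41, 42, 43)
)

out <- df %>%
  rowwise() %>%                              # each row becomes its own group
  mutate(y = switch(x, y1, y2, y3, y4)) %>%  # x, y1, ..., y4 are length 1 here
  ungroup()

out
```

Because the expression is evaluated once per row, this tends to be noticeably slower than vectorised alternatives on large data frames.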
2

You can modify your function along these lines:

map <- data.frame(i=1:2,v=10:11)
#   i  v
# 1 1 10
# 2 2 11

set.seed(1)
x <- sample(1:3, 10, replace = TRUE)
#  [1] 1 2 2 3 1 3 3 2 2 1

i <- match(x,map$i)
ifelse(is.na(i),x,map$v[i])
# [1] 10 11 11  3 10  3  3 11 11 10

The idea is to keep the values you're looking for and the replacement values in a separate data frame map, and then use match to match x and map.

[Update]

You can wrap this solution into a function that can be used within mutate:

multipleReplace <- function(x, what, by) {
  stopifnot(length(what)==length(by))               
  ind <- match(x, what)
  ifelse(is.na(ind),x,by[ind])
}

# Create a sample data set
d <- structure(list(x = c(1L, 2L, 2L, 3L, 1L, 3L, 3L, 2L, 2L, 1L), y = c(1L, 2L, 2L, 3L, 3L, 1L, 3L, 2L, 2L, 1L)), .Names = c("x", "y"), row.names = c(NA, -10L), class = "data.frame")

d %>% 
  mutate(z = multipleReplace(x, what=c(1,3), by=c(101,103)))
#    x y   z
# 1  1 1 101
# 2  2 2   2
# 3  2 2   2
# 4  3 3 103
# 5  1 3 101
# 6  3 1 103
# 7  3 3 103
# 8  2 2   2
# 9  2 2   2
# 10 1 1 101
Marat Talipov
2

Here's another way using data.table. The idea is to basically create a key data.table with the combinations and then perform a join, as follows:

I'll use the data.table from @eddi's answer.

require(data.table)
key = data.table(x = 1:2, col = c("a", "b"))

setkey(dt, x)
dt[key, new_col := get(i.col), by=.EACHI]
#    x a b new_col
# 1: 1 1 4       1
# 2: 1 2 5       2
# 3: 2 3 6       6
# 4: 2 4 7       7

The join is performed on the column x. For each row of key, the corresponding matching rows in dt are found. For example, x = 1 from key matches rows 1 and 2 of dt. On those rows, we access the column whose name is stored in key's col, which is "a". get("a") returns the values of column a for those matching rows, which are 1 and 2. Hope this helps.

by=.EACHI ensures that the expression new_col := get(i.col) is evaluated for each row in key. You can learn more about it here.

Arun
  • 1
    The join method seems best to me (+1) - but it's surprising to see this presented as a `data.table` specific answer. Could be done with `dplyr::left_join` or simply `match` or `merge`. – Gregor Thomas Aug 10 '16 at 21:30
  • I really don't understand the point of your comment under my answer. Do you mean that I must add all possible solutions to my answer? Besides the solutions you propose would create a completely new data.frame while this updates the original data.table by reference :-O. – Arun Aug 10 '16 at 23:56
  • The point of the comment is to hope that anyone reading this answer realizes that, in addition to working (as you nicely demonstrate) in `data.table`, the same general method can work in base and `dplyr`. I think a join is the most natural way of solving OP's problem, but as OP tagged `dplyr` and requested a `dplyr` solution, I thought it strange that the only answer that used the join method *didn't* use `dplyr`. I'd also rather leave a comment than add a new answer; this question has too many already, and using the same method with a different package's syntax doesn't seem different enough. – Gregor Thomas Aug 11 '16 at 00:03
  • I certainly don't mean that you must add all possible solutions to your answer, nor do I want a debate about the relative merits of `dplyr` and `data.table`. I wouldn't have said anything at all if the question itself wasn't so focused on `dplyr`. If you're amenable, I would make a friendly edit to add a demonstration of the same technique with `dplyr` syntax. – Gregor Thomas Aug 11 '16 at 00:12
  • I wasn't *debating*. I've already provided an answer using *join+update* because I find it best suits the Q and consider it fundamentally different from what you suggest (join everything first on to a new object and update). Feel free to add your own to the list. – Arun Aug 11 '16 at 04:28
1

An alternate (more involved) route involves using tidyr:

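library(dplyr)
library(tidyr)    # gather()
library(stringr)  # str_extract()

# df is the data frame from the question (df.faithful)
df %>%
  mutate(row = row_number()) %>%                       # remember the original row order
  gather(n, y, y1:y4) %>%                              # long format: one row per (row, column)
  mutate(n = as.integer(str_extract(n, "[0-9]+"))) %>% # "y3" -> 3
  filter(x == n) %>%                                   # keep only the column selected by x
  arrange(row) %>%
  select(-c(row, n))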
wdkrnls
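
With more recent tidyr versions, the same reshape-and-filter idea could be written with pivot_longer instead of gather. This is a sketch on made-up data, not part of the original answer; as in the original, the y1 to y4 columns are dropped from the result.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the question's data.
df <- tibble(
  x  = c(2L, 1L, 4L, 3L),
  y1 = c(10, 11, 12, 13),
  y2 = c(20, 21, 22, 23),
  y3 = c(30, 31, 32, 33),
  y4 = c(40, 41, 42, 43)
)

out <- df %>%
  mutate(row = row_number()) %>%
  # names_prefix strips the leading "y", leaving "1".."4" in column n.
  pivot_longer(y1:y4, names_to = "n", names_prefix = "y", values_to = "y") %>%
  mutate(n = as.integer(n)) %>%
  filter(x == n) %>%        # keep only the column selected by x
  arrange(row) %>%
  select(-c(row, n))

out
```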
1

I am a bit late but here is my solution using mapply.

vswitch <- function(x, ...) {
  mapply(FUN = function(x, ...) {
           switch(x, ...)
         }, x, ...)
}

mutate(df.faithful, y = vswitch(x, y1, y2, y3, y4))
Kushdesh
1

A more complicated alternative to the solution suggested by user6702291 is to use a map function, such as map_dbl() from purrr. I thought it worth sharing because it generalises to other situations where there is not yet a vectorised version of the function you're trying to use.

In this case it would work like this.

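library(purrr)

df.faithful %>% 
  mutate(y = map_dbl(seq_along(x), ~ switch(x[.x], y1, y2, y3, y4)[.x]))

The trailing index is needed because switch() returns the whole selected column, while map_dbl() expects a single value per iteration. It has to be [.x] (the current row) rather than [1], which would return the first row's value of the chosen column for every row.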

Torakoro