65

I'd like to use dplyr's mutate_at function to apply a function to several columns in a dataframe, where the function inputs the column to which it is directly applied as well as another column in the dataframe.

As a concrete example, I'd like to mutate the following dataframe

# Example input dataframe
df <- data.frame(
    x = c(TRUE, TRUE, FALSE),
    y = c("Hello", "Hola", "Ciao"),
    z = c("World", "ao", "HaOlam")
)

with a mutate_at call that looks similar to this

df %>%
mutate_at(.vars = vars(y, z),
          .funs = ifelse(x, ., NA))

to return a dataframe that looks something like this

# Desired output dataframe
df2 <- data.frame(x = c(TRUE, TRUE, FALSE),
                  y_1 = c("Hello", "Hola", NA),
                  z_1 = c("World", "ao", NA))

The desired mutate_at call would be similar to the following call to mutate:

df %>%
   mutate(y_1 = ifelse(x, y, NA),
          z_1 = ifelse(x, z, NA))

I know that this can be done in base R in several ways, but I would specifically like to accomplish this goal using dplyr's mutate_at function for the sake of readability, interfacing with databases, etc.

Below are some similar questions asked on Stack Overflow that do not address the question I posed here:

adding multiple columns in a dplyr mutate call

dplyr::mutate to add multiple values

Use of column inside sum() function using dplyr's mutate() function

bschneidr
  • `df %>% mutate_at(vars(y, z), funs(ifelse(x, ., NA)))` – eipi10 Aug 29 '16 at 15:35
  • @eipi10 Ah, ok. So the above code would've worked if I had actually wrapped `ifelse(x, ., NA)` in a call to `funs()`. Thank you! I've checked your solution and that works perfectly. Your solution is exactly what I was looking for! – bschneidr Aug 29 '16 at 15:43

2 Answers

72

This was answered by @eipi10 in a comment on the question, but I'm writing it up here for posterity.

The solution here is to use:

df %>%
   mutate_at(.vars = vars(y, z),
             .funs = list(~ ifelse(x, ., NA)))

You can also use the new across() function with mutate(), like so:

df %>%
   mutate(across(c(y, z), ~ ifelse(x, ., NA)))

The formula operator (the ~ in ~ ifelse(...)) marks ifelse(x, ., NA) as an anonymous function defined inside the call to mutate_at(). Within that function, . stands for the column currently being processed, while x is looked up as a column of the dataframe.

This works similarly to defining the function outside of the call to mutate_at(), like so:

temp_fn <- function(input) ifelse(test = df[["x"]],
                                  yes = input,
                                  no = NA)

df %>%
   mutate_at(.vars = vars(y, z),
             .funs = temp_fn)
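
Note that temp_fn above reads x from the global df rather than from the data being mutated. As a minimal sketch of a safer variant (assuming dplyr >= 1.0 for across(); keep_if is a hypothetical helper name, not part of dplyr), the condition can be passed as an explicit argument, or an ordinary anonymous function can be written inline:

```r
library(dplyr)

df <- data.frame(
  x = c(TRUE, TRUE, FALSE),
  y = c("Hello", "Hola", "Ciao"),
  z = c("World", "ao", "HaOlam"),
  stringsAsFactors = FALSE  # keep y and z as character on older R versions
)

# Hypothetical helper: the condition is an explicit argument,
# so nothing is read from a global `df`.
keep_if <- function(value, keep) ifelse(keep, value, NA)

# `x` is resolved inside mutate(), so it refers to the data being mutated
res_helper <- df %>%
  mutate(across(c(y, z), ~ keep_if(.x, x)))

# The same operation with an ordinary anonymous function instead of a formula
res_inline <- df %>%
  mutate(across(c(y, z), function(col) ifelse(x, col, NA)))
```

Because x is looked up at the time mutate() runs, both variants should keep working if the dataframe is filtered, reordered, or grouped beforehand.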

Note on syntax changes in dplyr: Prior to dplyr version 0.8.0, you would simply write .funs = funs(ifelse(x, . , NA)), but funs() has been deprecated since dplyr 0.8.0 and should be avoided in new code.

bschneidr
  • "The use of funs() here indicates that ifelse(x, ., NA) is an anonymous function" ---- How does `funs()` differ from the traditional anonymous function, `function(x)`? – coip Sep 12 '18 at 22:30
  • The most notable thing in my experience is that it requires less typing and is similarly readable. However, it also allows you to provide a list of anonymous functions (e.g. `funs(avg = mean(.), total = sum(., na.rm = TRUE))`). See https://www.rdocumentation.org/packages/dplyr/versions/0.7.6/topics/funs. – bschneidr Sep 14 '18 at 02:45
  • The example with the function defined outside `mutate` would only work if `df` has not changed between when the function is defined and used. This seems like a risky strategy. What if, for example, someone groups the data first? – randy Jul 02 '21 at 23:58
  • Agreed, I wouldn't recommend doing that. That example is given just to help with explaining how the actual solutions work. – bschneidr Jul 09 '21 at 20:44
19

To supplement the previous response, if you wanted mutate_at() to add new variables (instead of replacing), with names such as z_1 and y_1 as in the original question, you just need to:

  • dplyr >= 1.0 with across(): add .names = "{.col}_1", or alternatively name the list element: list(`1` = ~ifelse(x, ., NA)) (note the backticks)
  • dplyr 0.8 up to (but not including) 1.0: use list(`1` = ~ifelse(x, ., NA))
  • dplyr < 0.8: use funs(`1` = ifelse(x, ., NA))

library(tidyverse)

df <- data.frame(
  x = c(TRUE, TRUE, FALSE),
  y = c("Hello", "Hola", "Ciao"),
  z = c("World", "ao", "HaOlam")
)

## Version >=1
df %>%
  mutate(across(c(y, z), 
                list(~ifelse(x, ., NA)),
                .names="{.col}_1"))
#>       x     y      z   y_1   z_1
#> 1  TRUE Hello  World Hello World
#> 2  TRUE  Hola     ao  Hola    ao
#> 3 FALSE  Ciao HaOlam  <NA>  <NA>


## 0.8 - <1
df %>%
  mutate_at(.vars = vars(y, z),
            .funs = list(`1`=~ifelse(x, ., NA)))
#>       x     y      z   y_1   z_1
#> 1  TRUE Hello  World Hello World
#> 2  TRUE  Hola     ao  Hola    ao
#> 3 FALSE  Ciao HaOlam  <NA>  <NA>

## Before 0.8
df %>%
  mutate_at(.vars = vars(y, z),
            .funs = funs(`1`=ifelse(x, ., NA)))
#> Warning: `funs()` is deprecated as of dplyr 0.8.0.
#> Please use a list of either functions or lambdas: 
#> 
#>   # Simple named list: 
#>   list(mean = mean, median = median)
#> 
#>   # Auto named with `tibble::lst()`: 
#>   tibble::lst(mean, median)
#> 
#>   # Using lambdas
#>   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.
#>       x     y      z   y_1   z_1
#> 1  TRUE Hello  World Hello World
#> 2  TRUE  Hola     ao  Hola    ao
#> 3 FALSE  Ciao HaOlam  <NA>  <NA>

Created on 2020-10-03 by the reprex package (v0.3.0)
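
The same naming mechanism extends to several functions at once. The following is a small sketch, assuming dplyr >= 1.0; the list names masked and upper are illustrative choices, and "{.col}_{.fn}" glues each column name to the name of the function applied to it:

```r
library(dplyr)

df <- data.frame(
  x = c(TRUE, TRUE, FALSE),
  y = c("Hello", "Hola", "Ciao"),
  z = c("World", "ao", "HaOlam"),
  stringsAsFactors = FALSE  # keep y and z as character on older R versions
)

res <- df %>%
  mutate(across(
    c(y, z),
    # Named list: each name becomes the {.fn} part of the new column names
    list(masked = ~ ifelse(x, .x, NA), upper = toupper),
    .names = "{.col}_{.fn}"
  ))

names(res)
```

The original y and z columns are kept, and new columns such as y_masked and z_upper are added alongside them.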

For more details and tricks, see: Create new variables with mutate_at while keeping the original ones

Matifou
  • And how does this work when the function is defined outside the call to `mutate_at`? – randy Jul 03 '21 at 00:01
  • Not sure I get your question @randy; there's no difference whether the main function is defined inside or outside (note that `ifelse` is itself defined outside the call). – Matifou Jul 04 '21 at 19:31