
I'm trying to tighten up a %>% piped workflow where I need to apply the same function to several columns but with one argument changed each time. I feel like purrr's map or invoke functions should help, but I can't wrap my head around it.

My data frame has columns for life expectancy, poverty rate, and median household income. I can pass all these column names to vars in mutate_at, use round as the function to apply to each, and optionally supply a digits argument. But I can't figure out a way, if one exists, to pass different values for digits associated with each column. I'd like life expectancy rounded to 1 digit, poverty rounded to 2, and income rounded to 0.

I can call mutate on each column, but given that I might have more columns all receiving the same function with only an additional argument changed, I'd like something more concise.

library(tidyverse)

df <- tibble::tribble(
        ~name, ~life_expectancy,          ~poverty, ~household_income,
  "New Haven", 78.0580437642378, 0.264221051111753,  42588.7592521085
  )

In my imagination, I could do something like this:

df %>%
  mutate_at(vars(life_expectancy, poverty, household_income), 
            round, digits = c(1, 2, 0))

But I get this error:

Error in mutate_impl(.data, dots) : Column life_expectancy must be length 1 (the number of rows), not 3
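
The length mismatch in that message can be reproduced outside dplyr. A minimal sketch, assuming only base R: round() recycles its digits argument, so a single value rounded with three digits settings comes back as three values, which is what mutate_at then refuses to fit into a one-row column.

```r
# A single value, as in the one-row data frame above.
x <- 78.0580437642378

# round() is vectorised over `digits`, so this returns one result per
# digits value rather than picking one digits value for the column.
rounded <- round(x, digits = c(1, 2, 0))

length(rounded)  # 3, not 1
print(rounded)
```

So the digits vector has to be paired with the columns before round() is called, not handed to round() whole.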

This works, using mutate_at instead of mutate just to keep the same syntax as in my ideal case:

df %>%
  mutate_at(vars(life_expectancy), round, digits = 1) %>%
  mutate_at(vars(poverty), round, digits = 2) %>%
  mutate_at(vars(household_income), round, digits = 0)
#> # A tibble: 1 x 4
#>   name      life_expectancy poverty household_income
#>   <chr>               <dbl>   <dbl>            <dbl>
#> 1 New Haven            78.1    0.26            42589

Mapping over the digits applies every digits value to every column instead of matching them by position, giving me 3 rows, each rounded to a different number of digits.

df %>%
  mutate_at(vars(life_expectancy, poverty, household_income), 
            function(x) map(x, round, digits = c(1, 2, 0))) %>%
  unnest()
#> # A tibble: 3 x 4
#>   name      life_expectancy poverty household_income
#>   <chr>               <dbl>   <dbl>            <dbl>
#> 1 New Haven            78.1    0.3            42589.
#> 2 New Haven            78.1    0.26           42589.
#> 3 New Haven            78      0              42589

Created on 2018-11-13 by the reprex package (v0.2.1)

camille
  • In the past when faced with this problem I ended up gathering my columns, grouping them, mutating them, and spreading them back out. See also [How do I sweep specific columns with dplyr?](https://stackoverflow.com/q/28298688/1968) – Konrad Rudolph Nov 13 '18 at 19:38
  • @KonradRudolph thanks, I was thinking about that too, and that's an approach I've used before, but I'm trying to figure out whether a super simple, one-line version is possible – camille Nov 13 '18 at 19:50
  • @Henrik you might be on to something. Using `map2_dfc` could work, but that requires dropping the `name` column and then maybe joining it back on. I'm trying to imagine a `map2_dfc` / `map_at` hybrid – camille Nov 13 '18 at 19:56
  • Seems like it might be easier when you will be able to pass a list of functions to summarize_at/mutate_at: https://github.com/tidyverse/dplyr/issues/3433. That doesn't seem to work yet. – MrFlick Nov 13 '18 at 20:06
  • `mutate` supports `!!!` so the easiest in my opinion is to recreate the verbose `mutate` call (not `mutate_at`) programmatically through `map2` or (cleaner to me) `imap` – moodymudskipper Nov 14 '18 at 14:21

3 Answers


2 solutions


mutate with !!!

invoke was a good idea, but you need it less now that most tidyverse functions support the !!! operator. Here's what you can do:

digits <- c(life_expectancy = 1, poverty = 2, household_income = 0)  
df %>% mutate(!!!imap(digits, ~round(..3[[.y]], .x),.))
# # A tibble: 1 x 4
#          name life_expectancy poverty household_income
#         <chr>           <dbl>   <dbl>            <dbl>
#   1 New Haven            78.1    0.26            42589

..3 is the initial data frame, passed to the function as a third argument, through the dot at the end of the call.

Written more explicitly:

df %>% mutate(!!!imap(
  digits, 
  function(digit, name, data) round(data[[name]], digit),
  data = .))

If you need to start from your old interface (though the one I propose will be more flexible), first do:

digits <- setNames(c(1, 2, 0), c("life_expectancy", "poverty", "household_income"))

mutate_at and <<-

This bends the good practice of avoiding <<- whenever possible, but readability matters, and this version is easy to read. Note that i is modified in the calling environment, so reset it to 0 before re-running the pipeline.

digits <- c(1, 2, 0)
i <- 0
df %>%
  mutate_at(vars(life_expectancy, poverty, household_income), ~ round(., digits[i <<- i + 1]))
# A tibble: 1 x 4
#     name      life_expectancy poverty household_income
#     <chr>               <dbl>   <dbl>            <dbl>
#   1 New Haven            78.1    0.26            42589

(or just df %>% mutate_at(names(digits), ~ round(., digits[i <<- i + 1])) if you use a named vector as in my first solution)
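
For readers on a newer dplyr, the counter can be avoided altogether. This is a sketch of a different technique from the two above, not part of the original answer, and it assumes dplyr 1.0.0 or later, where across() and cur_column() are available. It looks up the digits value by the name of the column currently being processed, reusing the named digits vector from the first solution.

```r
library(dplyr)
library(tibble)

df <- tribble(
  ~name, ~life_expectancy, ~poverty, ~household_income,
  "New Haven", 78.0580437642378, 0.264221051111753, 42588.7592521085
)

# Named vector: names are the columns to round, values are the digits.
digits <- c(life_expectancy = 1, poverty = 2, household_income = 0)

# cur_column() returns the name of the column across() is working on,
# so each column picks up its own digits value with no external counter.
rounded <- df %>%
  mutate(across(all_of(names(digits)), ~ round(.x, digits[[cur_column()]])))

print(rounded)
```

Because the lookup is by name rather than by position, the order of the columns in the data frame does not matter and nothing needs resetting between runs.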

moodymudskipper
  • This is the correct way to do it. I've deleted my answer because while the output in the console matched OPs result, running `apply(df, 1, print)` showed that the values were each rounded to two decimals. – Mako212 Nov 14 '18 at 17:27
  • This is wild! So `imap` is mapping over `digits` and its names, then applying the `round` function, but also taking the original data frame in `...`? Am I getting that right? – camille Nov 14 '18 at 20:04
  • Also, I appreciate the GH comment! – camille Nov 14 '18 at 20:05
  • Yes you got it perfectly, passing the `lhs` to the `...` is a trick I like a lot, I added a more explicit version for clarity. – moodymudskipper Nov 14 '18 at 20:11

Here's a map2 solution along the lines of Henrik's comment. You can then wrap this inside a custom function. I've provided a rough first attempt, but I have done minimal testing, so it may well break in situations where evaluation is unusual. It also doesn't use tidyselect for .at, but neither does modify_at...

library(tidyverse)

df <- tibble::tribble(
  ~name, ~life_expectancy,          ~poverty, ~household_income,
  "New Haven", 78.0580437642378, 0.264221051111753,  42588.7592521085,
  "New York", 12.349685329, 0.324067934, 32156.230974623
)

rounded <- df %>%
  select(life_expectancy, poverty, household_income) %>%
  map2_dfc(
    .y = c(1, 2, 0),
    .f = ~ round(.x, digits = .y)
  )
df %>%
  select(-life_expectancy, -poverty, -household_income) %>%
  bind_cols(rounded)
#> # A tibble: 2 x 4
#>   name      life_expectancy poverty household_income
#>   <chr>               <dbl>   <dbl>            <dbl>
#> 1 New Haven            78.1    0.26            42589
#> 2 New York             12.3    0.32            32156


modify2_at <- function(.x, .y, .at, .f) {
  modified <- .x[.at] %>%
    map2(.y, .f)
  .x[.at] <- modified
  return(.x)
}

df %>%
  modify2_at(
    .y = c(1, 2, 0),
    .at = c("life_expectancy", "poverty", "household_income"),
    .f = ~ round(.x, digits = .y)
  )
#> # A tibble: 2 x 4
#>   name      life_expectancy poverty household_income
#>   <chr>               <dbl>   <dbl>            <dbl>
#> 1 New Haven            78.1    0.26            42589
#> 2 New York             12.3    0.32            32156

Created on 2018-11-13 by the reprex package (v0.2.1)

Calum You

Fun with tidyeval:

prepared_pairs <- 
  map2(
    set_names(syms(list("life_expectancy", "poverty", "household_income"))),
    c(1, 2, 0), 
    ~expr(round(!!.x, digits = !!.y))
  )

mutate(df, !!! prepared_pairs)

# # A tibble: 1 x 4
#   name      life_expectancy poverty household_income
#   <chr>               <dbl>   <dbl>            <dbl>
# 1 New Haven            78.1    0.26            42589
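
To see what is actually spliced into mutate, the expressions can be built and inspected on their own. A minimal sketch, assuming a named digits vector like the one in the first answer (an assumption here, since this answer uses positional vectors): imap() hands over the value as .x and the name as .y, and sym() converts the name string into a column symbol before !! injects it.

```r
library(rlang)
library(purrr)

# Assumed named vector: names are columns, values are digits.
digits <- c(life_expectancy = 1, poverty = 2, household_income = 0)

# One unevaluated round() call per column, named after that column.
prepared_pairs <- imap(digits, ~ expr(round(!!sym(.y), digits = !!.x)))

# Deparse the calls to see what mutate(df, !!!prepared_pairs) would receive.
calls_as_text <- map_chr(prepared_pairs, expr_text)
print(calls_as_text)

# Each call can also be evaluated against data by hand.
poverty_rounded <- eval_tidy(
  prepared_pairs$poverty,
  data = list(poverty = 0.264221051111753)
)
```

Since the calls refer to columns by symbol rather than pulling them out of a captured data frame, they are evaluated by mutate itself.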
Aurèle
  • Interesting. Using `expr` in this way for the entire expression is comparable to using `enquo` on individual variables? I'm still getting the hang of the different tidyeval verbs – camille Jan 31 '19 at 16:04
  • (Prefixing everything I say with "As far as I understand"): `expr` is a little more "bare" in the sense that it doesn't carry an environment with it. `expr` is like the lighter `quo` (not `enquo`) without an environment – Aurèle Jan 31 '19 at 16:09
  • I think `expr` is just `quote` except that it understands `!!` – moodymudskipper Feb 01 '19 at 10:25
  • It's a cool solution. If you use the definition of `digits` that I use, it's a bit simpler to read, as you can do: `prepared_pairs <- imap(digits, ~expr(round(!!sym(.y), digits = !!.x)))`. The additional step makes it a bit more readable than my similar solution. It also has the advantage of supporting grouped data; my current solution doesn't, as it gets the original df through the dot. – moodymudskipper Feb 01 '19 at 10:38
  • Thanks! An idea to make yours robust to grouped data frames is to wrap it in `do` like `df %>% do(mutate(., !!!imap(digits, ~round(..3[[.y]], .x),.)))` – Aurèle Feb 01 '19 at 15:08