How to use string manipulation functions inside .names argument in dplyr::across

Question

I searched for duplicates but could not find a similar question. (There is a similar one, but it is somewhat different from my requirement.)

My question is whether we can use string manipulation functions such as substr or stringr::str_remove inside the .names argument of dplyr::across. As a reproducible example, consider this:

library(dplyr)
iris %>%
  summarise(across(starts_with('Sepal'), mean, .names = '{.col}_mean'))

  Sepal.Length_mean Sepal.Width_mean
1          5.843333         3.057333

Now my problem is that I want to rename the output columns using something like str_remove(.col, 'Sepal'), so that the output column names are just Length.mean and Width.mean. I am asking because the description of this argument states:

.names
A glue specification that describes how to name the output columns. This can use {.col} to stand for the selected column name, and {.fn} to stand for the name of the function being applied. The default (NULL) is equivalent to "{.col}" for the single function case and "{.col}_{.fn}" for the case where a list is used for .fns.

I have tried many possibilities, including the following, but none of them work:

library(tidyverse)
library(glue)
iris %>%
  summarise(across(starts_with('Sepal'), mean, 
                   .names = glue('{xx}_mean', xx = str_remove(.col, 'Sepal'))))

Error: Problem with `summarise()` input `..1`.
x argument `str` should be a character vector (or an object coercible to)
i Input `..1` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
Run `rlang::last_error()` to see where the error occurred.


#OR
iris %>%
  summarise(across(starts_with('Sepal'), mean, 
                   .names = glue('{xx}_mean', xx = str_remove(glue('{.col}'), 'Sepal'))))

I know that this can be solved by adding another step using rename_with, so I am not looking for that answer.

You can use functions inside a glue string, such as `.names = '{str_remove(.col, "^[A-Za-z]+")}_mean'`, but it seems like this has limitations when it gets parsed — camille, May 15 '21 at 15:38
O yes! Can you please post that as answer, I'll be happy to accept that. :) — AnilGoyal, May 15 '21 at 15:40

score 9 · Accepted Answer · answered May 15 '21 at 15:46

This works, but with probably a few caveats. You can use functions inside a glue specification, so you could clean up the strings that way. However, when I tried escaping the ".", I got an error, which I assume has something to do with how across parses the string. If you need something more dynamic, you might want to dig into the source code at that point.
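On the escaping problem: a likely cause is that a backslash has to survive both R's own string parsing and glue's parsing of the embedded expression, so a plain "\\." does not arrive intact. This is my assumption, not something confirmed from the across source. A minimal sketch of two ways to match the literal dot without any backslashes, assuming stringr is installed (the names res_class and res_fixed are only illustrative):

```r
library(dplyr)

# "[.]" is a character class containing only a dot, so it matches a
# literal "." and needs no backslash inside the glue string.
res_class <- iris %>%
  summarise(across(starts_with("Sepal"), mean,
                   .names = '{stringr::str_remove(.col, "^Sepal[.]")}_mean'))

# stringr::fixed() treats the pattern as plain text instead of a regex,
# so the "." is not a metacharacter at all.
res_fixed <- iris %>%
  summarise(across(starts_with("Sepal"), mean,
                   .names = '{stringr::str_remove(.col, stringr::fixed("Sepal."))}_mean'))

names(res_class)
names(res_fixed)
```

Both calls should produce the column names Length_mean and Width_mean.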

In order to use the {.fn} helper, at least in conjunction with creating the glue string on the fly like this, the function needs a name; otherwise you get a number for the function's index in the .fns argument. I tested this out with a second function and using lst for automatic naming.

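library(dplyr)
# lst() names each function after itself, so {.fn} expands to "mean" and "max".
# The unescaped "." in the regex matches any single character, here the dot after "Sepal".
iris %>%
  summarise(across(starts_with('Sepal'), .fns = lst(mean, max), 
                   .names = '{stringr::str_remove(.col, "^[A-Za-z]+.")}_{.fn}'))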
#>   Length_mean Length_max Width_mean Width_max
#> 1    5.843333        7.9   3.057333       4.4
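To make the point about function names concrete, here is a small sketch comparing an unnamed list with a named one. The object names unnamed and named are only illustrative, and the behaviour shown is what I would expect from a current dplyr rather than something taken from its documentation:

```r
library(dplyr)

# Unnamed list: {.fn} falls back to the position of each function in .fns.
unnamed <- iris %>%
  summarise(across(starts_with("Sepal"), list(mean, max),
                   .names = "{.col}_{.fn}"))

# Named list: {.fn} uses the names supplied, which is what lst() does
# automatically.
named <- iris %>%
  summarise(across(starts_with("Sepal"), list(mean = mean, max = max),
                   .names = "{.col}_{.fn}"))

names(unnamed)
#> [1] "Sepal.Length_1" "Sepal.Length_2" "Sepal.Width_1"  "Sepal.Width_2"
names(named)
#> [1] "Sepal.Length_mean" "Sepal.Length_max"  "Sepal.Width_mean"  "Sepal.Width_max"
```

With a single bare function, as in the question, there is no name to look up, so writing the suffix directly into the glue string ('..._mean') is the simpler option.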

`summarise(across(starts_with("Sepal"), mean, .names = 'mean_{str_remove(.col, "Sepal.")}'))` works in my case. Thanks :) — AnilGoyal, May 15 '21 at 15:50
