Include column names as function input with dplyr

Question

I often have to transform long tidy data frames into wide format. To do so I use the following standard procedure:

# Example data frame
df <- data.frame("ID" = rep(1:5, each = 4), "score" = runif(20, 0, 100), "location" = rep(c("a", "b", "c", "d"), 5))

# Transform into wide format
df_wide <- df %>%
  group_by_at(vars(-score)) %>%  # group by everything other than the value column. 
  mutate(row_id=1:n()) %>% ungroup() %>%  # build group index
  spread(key=location, value=score) %>%    # spread
  dplyr::select(-row_id)

Instead of typing this little script over and over again, I wanted to define a function to do it automatically. I found many useful posts on how to include column names as function inputs, but somehow it doesn't work or I get error messages. What am I doing wrong?

Below a few of my attempts (neither of them work), following these and this suggestions:

wide_fun <- function(dat, key_name, value_name) {
  group_by_at(vars(- !! sym(value_name))) %>%  # group by everything other than the value column. 
    mutate(row_id=1:n()) %>% ungroup() %>%  # build group index
    spread(key=!! sym(key_name), value=!! sym(value_name)) %>%    # spread
    dplyr::select(-row_id)
}

wide_fun2 <- function(dat,  key_name, value_name) {
  key_col <- enquo(key_name)
  value_col <- enquo(value_name)
  group_by_at(vars(- !!value_col)) %>%  # group by everything other than the value column. 
    mutate(row_id=1:n()) %>% ungroup() %>%  # build group index
    spread(key= !!key_col, value= !!value_col) %>%    # spread
    dplyr::select(-row_id)
}

wide_fun3 <- function(dat, key_name, value_name) {
  group_by_at(vars(- value_name)) %>%  # group by everything other than the value column. 
    mutate(row_id=1:n()) %>% ungroup() %>%  # build group index
    spread(key=key_name, value=value_name) %>%    # spread
    dplyr::select(-row_id)
}

wide_fun3(df, quote(location), quote(score))

Thanks for your help!

For grouping use `group_by(across(all_of(value_name)))`, and instead of spread use `pivot_wider(names_from=key_name, values_from=value_name)`. — rpolicastro, Sep 11 '20 at 11:48
Hi! Thanks for your suggestions, but I'm not trying to change my way of transforming the data frame, I would just like to know why I cannot include the column names into the function as it is now. — Em Laskey, Sep 11 '20 at 11:56
Also ````across()````seems to be new in dplyr? I updated it, but still get the error message that it cannot be found... — Em Laskey, Sep 11 '20 at 12:03
That code lets you use character vectors as input. Quick amendment to my original answer. `group_by(across(!all_of(value_name)))` lets you specify one or more values for value_name, and will group based on anything but those values. — rpolicastro, Sep 11 '20 at 12:04

starja · Accepted Answer · 2020-09-11T12:18:24.433

I've slightly updated your code to dplyr 1.0.0 and tidyr. Then you can make use of the new dplyr programming feature {{}} to specify variables that are arguments of a function.

# Example data frame
df <- data.frame("ID" = rep(1:5, each = 4), "score" = runif(20, 0, 100), "location" = rep(c("a", "b", "c", "d"), 5))
library(dplyr)
wide_fun <- function(.data, key_name, value_name) {
  .data %>% 
  group_by(across(-{{value_name}})) %>%  # group by everything other than the value column. 
    mutate(row_id = 1:n()) %>% ungroup() %>%  # build group index
    tidyr::pivot_wider(
      names_from = {{key_name}},
      values_from = {{value_name}}) %>%    # spread
    select(-row_id)
}

wide_fun(df, location, score)
#> # A tibble: 5 x 5
#>      ID     a     b     c     d
#>   <int> <dbl> <dbl> <dbl> <dbl>
#> 1     1  90.8  38.9  28.7  39.0
#> 2     2  94.5  24.9  84.6  54.6
#> 3     3  61.1  97.2  12.2  57.7
#> 4     4  52.7  85.6  41.4 100. 
#> 5     5  17.8  86.1  92.3  33.7

^{Created on 2020-09-11 by the reprex package (v0.3.0)}

Edit

This function should also work with older versions of dplyr:

library(dplyr)
wide_fun_2 <- function(.data, key_name, value_name) {
  .data %>% 
    group_by_at(vars(-!!ensym(value_name))) %>%  # group by everything other than the value column. 
    mutate(row_id = 1:n()) %>% ungroup() %>%  # build group index
    tidyr::pivot_wider(
      names_from = !!ensym(key_name),
      values_from = !!ensym(value_name)) %>%    # spread
    select(-row_id)
}

df %>% 
  wide_fun_2(location, score)
 A tibble: 5 x 5
     ID      a     b     c     d
  <int>  <dbl> <dbl> <dbl> <dbl>
1     1 72.2    81.4  52.5  48.8
2     2 36.1    27.5  82.2  73.0
3     3 83.9    68.2  80.9  15.7
4     4  0.451  70.0  18.5  43.2
5     5 82.6    68.2  22.8  63.0

If you just provide the argument that specifies the column, you only need to deal with symbols and not quosures, therefore you need to use ensym.

This looks very nice, thanks! I'm having trouble updating dplyr, but I will try it. — Em Laskey, Sep 11 '20 at 12:03
Ok, I see - I didn't know that. I have never worked with this type of functions before, thanks a lot! — Em Laskey, Sep 11 '20 at 12:27
Here is a good introduction to metaprogramming: https://adv-r.hadley.nz/meta-big-picture.html — starja, Sep 11 '20 at 12:29

Include column names as function input with dplyr

1 Answers1

Edit