0

I often have to transform long tidy data frames into wide format. To do so I use the following standard procedure:

# Example data frame
df <- data.frame("ID" = rep(1:5, each = 4), "score" = runif(20, 0, 100), "location" = rep(c("a", "b", "c", "d"), 5))

# Transform into wide format
df_wide <- df %>%
  group_by_at(vars(-score)) %>%  # group by everything other than the value column. 
  mutate(row_id=1:n()) %>% ungroup() %>%  # build group index
  spread(key=location, value=score) %>%    # spread
  dplyr::select(-row_id)

Instead of typing this little script over and over again, I wanted to define a function to do it automatically. I found many useful posts on how to include column names as function inputs, but somehow it doesn't work or I get error messages. What am I doing wrong?

Below a few of my attempts (neither of them work), following these and this suggestions:

wide_fun <- function(dat, key_name, value_name) {
  group_by_at(vars(- !! sym(value_name))) %>%  # group by everything other than the value column. 
    mutate(row_id=1:n()) %>% ungroup() %>%  # build group index
    spread(key=!! sym(key_name), value=!! sym(value_name)) %>%    # spread
    dplyr::select(-row_id)
}

wide_fun2 <- function(dat,  key_name, value_name) {
  key_col <- enquo(key_name)
  value_col <- enquo(value_name)
  group_by_at(vars(- !!value_col)) %>%  # group by everything other than the value column. 
    mutate(row_id=1:n()) %>% ungroup() %>%  # build group index
    spread(key= !!key_col, value= !!value_col) %>%    # spread
    dplyr::select(-row_id)
}

wide_fun3 <- function(dat, key_name, value_name) {
  group_by_at(vars(- value_name)) %>%  # group by everything other than the value column. 
    mutate(row_id=1:n()) %>% ungroup() %>%  # build group index
    spread(key=key_name, value=value_name) %>%    # spread
    dplyr::select(-row_id)
}

wide_fun3(df, quote(location), quote(score))

Thanks for your help!

Em Laskey
  • 508
  • 4
  • 15
  • For grouping use `group_by(across(all_of(value_name)))`, and instead of spread use `pivot_wider(names_from=key_name, values_from=value_name)`. – rpolicastro Sep 11 '20 at 11:48
  • Hi! Thanks for your suggestions, but I'm not trying to change my way of transforming the data frame, I would just like to know why I cannot include the column names into the function as it is now. – Em Laskey Sep 11 '20 at 11:56
  • Also ````across()````seems to be new in dplyr? I updated it, but still get the error message that it cannot be found... – Em Laskey Sep 11 '20 at 12:03
  • That code lets you use character vectors as input. Quick amendment to my original answer. `group_by(across(!all_of(value_name)))` lets you specify one or more values for value_name, and will group based on anything but those values. – rpolicastro Sep 11 '20 at 12:04

1 Answers1

1

I've slightly updated your code to dplyr 1.0.0 and tidyr. Then you can make use of the new dplyr programming feature {{}} to specify variables that are arguments of a function.

# Example data frame
df <- data.frame("ID" = rep(1:5, each = 4), "score" = runif(20, 0, 100), "location" = rep(c("a", "b", "c", "d"), 5))
library(dplyr)
wide_fun <- function(.data, key_name, value_name) {
  .data %>% 
  group_by(across(-{{value_name}})) %>%  # group by everything other than the value column. 
    mutate(row_id = 1:n()) %>% ungroup() %>%  # build group index
    tidyr::pivot_wider(
      names_from = {{key_name}},
      values_from = {{value_name}}) %>%    # spread
    select(-row_id)
}

wide_fun(df, location, score)
#> # A tibble: 5 x 5
#>      ID     a     b     c     d
#>   <int> <dbl> <dbl> <dbl> <dbl>
#> 1     1  90.8  38.9  28.7  39.0
#> 2     2  94.5  24.9  84.6  54.6
#> 3     3  61.1  97.2  12.2  57.7
#> 4     4  52.7  85.6  41.4 100. 
#> 5     5  17.8  86.1  92.3  33.7

Created on 2020-09-11 by the reprex package (v0.3.0)

Edit

This function should also work with older versions of dplyr:

library(dplyr)
wide_fun_2 <- function(.data, key_name, value_name) {
  .data %>% 
    group_by_at(vars(-!!ensym(value_name))) %>%  # group by everything other than the value column. 
    mutate(row_id = 1:n()) %>% ungroup() %>%  # build group index
    tidyr::pivot_wider(
      names_from = !!ensym(key_name),
      values_from = !!ensym(value_name)) %>%    # spread
    select(-row_id)
}

df %>% 
  wide_fun_2(location, score)
 A tibble: 5 x 5
     ID      a     b     c     d
  <int>  <dbl> <dbl> <dbl> <dbl>
1     1 72.2    81.4  52.5  48.8
2     2 36.1    27.5  82.2  73.0
3     3 83.9    68.2  80.9  15.7
4     4  0.451  70.0  18.5  43.2
5     5 82.6    68.2  22.8  63.0

If you just provide the argument that specifies the column, you only need to deal with symbols and not quosures, therefore you need to use ensym.

starja
  • 9,887
  • 1
  • 13
  • 28