
Is there a more tidyverse-idiomatic way to combine several columns into a list column than using mapply?

For example, given the following:

```r
library(tidyverse)

tibble(.rows = 9) %>% 
  mutate(foo = runif(n()),
         a_1 = runif(n()),
         a_2 = runif(n()),
         a_3 = runif(n())) ->
  Z
```

(where `Z` might contain other columns, and might also contain more than three `a_` columns), one can do

```r
Z %>% mutate(A = mapply(c, a_1, a_2, a_3, SIMPLIFY = FALSE))
```

which works fine, although it would be nice to be able to say `starts_with('a_')` instead of `a_1, a_2, a_3`.
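To see what the `mapply` call builds, here is a toy version with short vectors standing in for `a_1`, `a_2`, `a_3` (the values are made up for illustration). `c` is called once per row, and `SIMPLIFY = FALSE` keeps the result as a list of vectors instead of letting `mapply` collapse it into a matrix.

```r
# Toy stand-ins for the a_1, a_2, a_3 columns (hypothetical values)
a_1 <- c(1, 2)
a_2 <- c(3, 4)
a_3 <- c(5, 6)

# c() is applied row by row: c(a_1[i], a_2[i], a_3[i])
rows <- mapply(c, a_1, a_2, a_3, SIMPLIFY = FALSE)
# rows[[1]] is c(1, 3, 5); rows[[2]] is c(2, 4, 6)

# With the default SIMPLIFY = TRUE the same call returns a 3 x 2 matrix
simplified <- mapply(c, a_1, a_2, a_3)
```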

Another possibility is

```r
Z %>% 
  rowid_to_column() %>% 
  pivot_longer(cols = starts_with('a_')) %>% 
  group_by(rowid) %>% 
  summarise(foo = unique(foo),
            A = list(value)) %>% 
  select(-rowid)
```

which technically works but introduces other problems: it needs the ugly `foo = unique(foo)`, and if instead of just one `foo` there were many such columns, it would become more involved.
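One way to avoid `foo = unique(foo)`, and to carry along any number of `foo`-like columns, is to let `tidyr::chop()` do the regrouping: it collapses `value` into a list column within each unique combination of all the remaining columns. This sketch swaps `chop()` in for the `group_by()`/`summarise()` step; it assumes tidyr 1.0 or later, and `bar` is a hypothetical extra column added only to show that it is preserved.

```r
library(dplyr)
library(tidyr)
library(tibble)

set.seed(1)
# Like Z, plus a hypothetical second non-"a_" column (bar)
Z <- tibble(.rows = 9) %>%
  mutate(foo = runif(n()), bar = runif(n()),
         a_1 = runif(n()), a_2 = runif(n()), a_3 = runif(n()))

res <- Z %>%
  rowid_to_column() %>%
  pivot_longer(cols = starts_with("a_")) %>%
  select(-name) %>%
  # collapse `value` into a list column within each unique combination of
  # the other columns (rowid, foo, bar, ...), so no unique(foo) is needed
  chop(value) %>%
  rename(A = value) %>%
  select(-rowid)
```

Because the grouping keys are "every column that is not chopped", adding more `foo`-like columns needs no change to the pipeline.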

banbh
  • Check the thread [data.frame rows to a list](https://stackoverflow.com/questions/3492379/data-frame-rows-to-a-list); likely something like `transpose(select(., starts_with('a_')))` should work inside `mutate`. – arg0naut91 Mar 02 '20 at 15:07
  • It does seem like `Z %>% mutate(A = transpose(starts_with('a_')))` should work (or perhaps something using `pmap`) but in both cases I get `Error: No tidyselect variables were registered`. – banbh Mar 02 '20 at 15:23
  • You need `transpose(select(., starts_with('a_')))`. – Giovanni Colitti Mar 02 '20 at 15:24
  • Use `Z %>% mutate(A = transpose(select(., starts_with('a_'))))` – arg0naut91 Mar 02 '20 at 15:25
  • You're right -- that works (sorry about missing the `select(.,...)`). I think that's likely "the answer". @arg0naut91 If you promote your comment to an answer I'll accept it. – banbh Mar 02 '20 at 15:28
  • Actually, I just noticed that `transpose` results in a list column that is a list of *lists*. However, what I want is the same as what `mapply` returns, namely a list of vectors. It turns out that `pmap(select(., ...), c)` (as in the answer of @koenniem) does the right thing. – banbh Mar 02 '20 at 15:40
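The difference between `transpose` and `pmap(..., c)` can be seen on a tiny hypothetical tibble with two `a_` columns: `transpose` yields one list per row, while `pmap` with `c` yields one atomic (named) vector per row.

```r
library(dplyr)
library(purrr)

# Small hypothetical example with two a_ columns
Z <- tibble(foo = c(10, 20), a_1 = c(1, 2), a_2 = c(3, 4))

t_res <- Z %>% mutate(A = transpose(select(., starts_with("a_"))))
p_res <- Z %>% mutate(A = pmap(select(., starts_with("a_")), c))

# transpose(): each element is a list, e.g. list(a_1 = 1, a_2 = 3)
str(t_res$A[[1]])
# pmap(..., c): each element is an atomic vector, e.g. c(a_1 = 1, a_2 = 3)
str(p_res$A[[1]])
```

Note that `pmap` passes the column names as argument names to `c`, so each vector carries names such as `a_1`, `a_2`; `unname()` removes them if an exact match with the `mapply` output is needed.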

1 Answer


Based on a previous answer (now deleted) and the comments, I made a comparison of different solutions:

```r
library(tidyverse)  # dplyr (mutate, select), purrr (pmap, transpose)

FUN_mapply <- function() { Z %>% mutate(A = mapply(c, a_1, a_2, a_3, SIMPLIFY = FALSE)) }
FUN_asplit <- function() { Z %>% mutate(A = asplit(.[, grepl("^a", colnames(.))], 1)) }
FUN_pmap <- function() { Z %>% mutate(A = pmap(.[, grepl("^a", colnames(.))], c)) }
FUN_transpose <- function() { Z %>% mutate(A = transpose(.[, grepl("^a", colnames(.))])) }
FUN_asplit_tidy <- function() { Z %>% mutate(A = asplit(select(., starts_with("a")), 1)) }
FUN_pmap_tidy <- function() { Z %>% mutate(A = pmap(select(., starts_with("a")), c)) }
FUN_transpose_tidy <- function() { Z %>% mutate(A = transpose(select(., starts_with("a")))) }

# Are all A columns equal to the mapply one? (names, dims and list nesting ignored)
same_values <- function(A, B) {
  all(vapply(seq_along(A), function(i)
    isTRUE(all.equal(as.numeric(unlist(A[[i]])), as.numeric(unlist(B[[i]])))), logical(1)))
}
all(sapply(list(FUN_asplit()$A, FUN_pmap()$A, FUN_transpose()$A,
                FUN_asplit_tidy()$A, FUN_pmap_tidy()$A, FUN_transpose_tidy()$A),
           same_values, A = FUN_mapply()$A))

mb <- microbenchmark::microbenchmark(
    FUN_mapply(),
    FUN_asplit(),
    FUN_pmap(),
    FUN_transpose(),
    FUN_asplit_tidy(),
    FUN_pmap_tidy(),
    FUN_transpose_tidy(),
    times = 1000L
)

ggplot2::autoplot(mb)
```

[Figure: microbenchmark timings of the seven functions, from `ggplot2::autoplot(mb)`]

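Edit: replaced `select(., starts_with("a"))` with `Z[,grepl("^a", colnames(Z))]` (later written with `.` instead of `Z`; the `starts_with` versions are kept as the `_tidy` functions).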

koenniem
  • Regarding the edit (replacing `starts_with` with `grepl`), I assume it makes the code faster, right? However, IMO the new version is less pipeline-friendly and less `tidyverse`-idiomatic. For the record, I feel that the original answer was the one I wanted to accept (although I won't change my acceptance). – banbh Mar 02 '20 at 15:58
  • That's quite right. `grepl` subsets `Z` to get only the columns needed; `starts_with` does the same but is much slower. Based on your comments, I feel that I need to modify my answer to make it more pipe-friendly. For example, if you make some modifications before the `mutate`, the data reaching it is no longer the same as `Z`, so the result would be incorrect. Using `.` instead of `Z` fixes this. – koenniem Mar 03 '20 at 10:39
  • One possibility is that you can add back in functions that worked using `starts_with` (perhaps appending something like `_tidyverse` to their names). Then the answer would cover both my question, as well as the speed issue. – banbh Mar 03 '20 at 13:58
  • I assume you meant to add the `starts_with` functions to the comparison? If so, I've updated my answer. – koenniem Mar 04 '20 at 10:47
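On newer dplyr, the `Z` versus `.` concern from the comments can also be sidestepped with `pick()`, which selects columns from the data currently flowing through the pipe, so no `select(., ...)` is needed. This sketch assumes dplyr 1.1.0 or later (where `pick()` was added). `rowwise()` with `c_across()` is another tidyverse option; it evaluates the expression once per row, so it may be slow on large data. Neither variant is part of the benchmark above.

```r
library(dplyr)
library(purrr)

set.seed(42)
Z <- tibble(.rows = 9) %>%
  mutate(foo = runif(n()), a_1 = runif(n()), a_2 = runif(n()), a_3 = runif(n()))

# An earlier pipeline step (slice) changes the data, so using Z inside mutate
# would be wrong; pick() sees the sliced data (requires dplyr >= 1.1.0)
res_pick <- Z %>%
  slice(1:5) %>%
  mutate(A = pmap(pick(starts_with("a_")), c))

# rowwise() + c_across(): one c_across() call per row, wrapped in list()
res_rowwise <- Z %>%
  rowwise() %>%
  mutate(A = list(c_across(starts_with("a_")))) %>%
  ungroup()
```

As with `pmap(select(., ...), c)`, the `pick()` version returns named vectors; wrap `c` (for example `function(...) unname(c(...))`) if plain vectors are preferred.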