
I want to concatenate all values of a single column into one string, separately for each group. I've figured out a solution using dplyr::do and dplyr::summarize, and it works well on small datasets, but it is extremely slow on larger ones.

Maybe someone has an idea how to optimize this? Already checked: this

Reproducible example:

library(dplyr)  # needed for the %>% pipe used below

df <- data.frame(group = c(rep("A",3), rep("B", 3)),
                 value = c(rep("C",3), rep("D",3)))
joined_vec <- df %>%
    dplyr::group_by(group) %>%
    dplyr::do(
      dplyr::summarize(.,
                       value_joined = dplyr::pull(., value) %>% paste(collapse = " ")
      )
    ) %>% dplyr::pull(value_joined)

Output:

> joined_vec
[1] "C C C" "D D D"

Thanks for any ideas!

StupidWolf

1 Answer


I think you can drop the do() wrapper and the inner pull(), because summarize() already evaluates its expression on the column within each group:

df %>% group_by(group) %>% 
summarize(value_joined=paste(value,collapse=" ")) %>% 
pull(value_joined)

[1] "C C C" "D D D"

You can also do this in base R:

tapply(df$value,df$group,paste,collapse=" ")
      A       B 
"C C C" "D D D" 
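
To check that the three approaches agree and to see how they scale, here is a minimal sketch on a synthetic data set. The data frame `big`, its size (2,000 groups of 10 rows), and the zero-padded group labels are assumptions made for illustration, not part of the original question. Timings will vary with your machine and dplyr version, so none are shown.

```r
library(dplyr)

# Hypothetical larger data set: 2,000 groups of 10 rows each (sizes are arbitrary).
# Zero-padded labels keep the group order identical across sorting locales.
set.seed(1)
big <- data.frame(
  group = rep(sprintf("g%04d", 1:2000), each = 10),
  value = sample(LETTERS, 20000, replace = TRUE),
  stringsAsFactors = FALSE
)

# Approach from the question: do() + summarize() + pull() on each group's data.
t_do <- system.time(
  res_do <- big %>%
    group_by(group) %>%
    do(summarize(., value_joined = paste(pull(., value), collapse = " "))) %>%
    pull(value_joined)
)

# Simplified approach: summarize() alone.
t_sum <- system.time(
  res_sum <- big %>%
    group_by(group) %>%
    summarize(value_joined = paste(value, collapse = " ")) %>%
    pull(value_joined)
)

# Base R approach: tapply() returns a named 1-d array,
# so as.vector() strips names and dim before comparing.
t_base <- system.time(
  res_base <- as.vector(tapply(big$value, big$group, paste, collapse = " "))
)

# All three should produce the same character vector, one string per group.
stopifnot(identical(res_do, res_sum), identical(res_sum, res_base))

# Elapsed seconds for each approach.
print(c(do        = t_do[["elapsed"]],
        summarize = t_sum[["elapsed"]],
        tapply    = t_base[["elapsed"]]))
```

Increasing the number of groups in `big` should make any difference between the approaches easier to see.
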
StupidWolf