1

Looking to concatenate a column of strings (separated by a ",") when grouped by another column. Example of raw data:

Column1 Column2
1       a
1       b
1       c
1       d
2       e
2       f
2       g
2       h
3       i
3       j
3       k
3       l
Results Needed:
Column1   Grouped_Value
1         "a,b,c,d"
2         "e,f,g,h"
3         "i,j,k,l"

I've tried using dplyr, but I seem to be getting the below as a result

Column1   Grouped_Value
1         "a,b,c,d,e,f,g,h,i,j,k,l"
2         "a,b,c,d,e,f,g,h,i,j,k,l"
3         "a,b,c,d,e,f,g,h,i,j,k,l"
summ_data <- 
  df_columns %>%
  group_by(df_columns$Column1) %>%
  summarise(Grouped_Value = paste(df_columns$Column2, collapse =","))
ssmit474
  • 11
  • 2

2 Answers2

2

We can do this with aggregate

aggregate(Column2 ~ Column1, df1, toString)

Or with dplyr

library(dplyr)
df1 %>%
   group_by(Column1) %>%
   summarise(Grouped_value =toString(Column2))
# A tibble: 3 x 2
#  Column1 Grouped_value
#    <int> <chr>        
#1       1 a, b, c, d   
#2       2 e, f, g, h   
#3       3 i, j, k, l   

NOTE: toString is wrapper for paste(., collapse=', ')


The issue in OP' solution is that it is pasteing the whole column (df1$Column2 or df1[['Column2']] - breaks the grouping and select the whole column) instead of the grouped elements

data

df1 <- structure(list(Column1 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 
3L, 3L, 3L), Column2 = c("a", "b", "c", "d", "e", "f", "g", "h", 
"i", "j", "k", "l")), class = "data.frame", row.names = c(NA, 
-12L))
Community
  • 1
  • 1
akrun
  • 874,273
  • 37
  • 540
  • 662
1

First commandment of dplyr

Don't use dollar signs in dplyr commands!

Use

group_by(Column1)

and

summarise(Grouped_Value = paste(Column2, collapse =","))
MrFlick
  • 195,160
  • 17
  • 277
  • 295