GROUP_CONCAT with dplyr or R

Question

I am having difficulty replicating the functionality of a typical SQL GROUP_CONCAT function in dplyr. I would also like to make sure the ordering inside the groups can be controlled. Ideally I want to use the hadleyverse/tidyverse but base R or other packages will work too.

Example data:

ID    name
1     apple
1     orange
2     orange
3     orange
3     apple

Desired output:

ID    name
1     apple,orange
2     orange
3     apple,orange

Note that for ID=3, the ordering is in alpha order, not how the rows are ordered. I think this can probably be handled by doing an arrange first, but it would be nice to control inside the summarise statement or the like.

akrun · Answer 1 · 2016-07-02T13:16:17.270

9

In R, we can use one of the group by operations.

library(dplyr)
df1 %>%
    group_by(ID) %>%
    summarise(name = toString(sort(unique(name))))
#     ID          name 
#   <int>         <chr>
#1     1 apple, orange
#2     2        orange
#3     3 apple, orange

Or using data.table

library(data.table)
setDT(df1)[, .(name = toString(sort(unique(name)))), by = ID]
#   ID          name
#1:  1 apple, orange
#2:  2        orange
#3:  3 apple, orange

edited Jul 02 '16 at 13:16

answered Jul 02 '16 at 13:08

akrun

874,273
37
540
662

1

nice answer , I like the data.table – nik Jul 03 '16 at 12:17
2

I think `stringr::str_c()` can also be used : `str_c(unique(name), collapse="; "))` - this allows the user to specify a custom separator . – knb Jul 24 '18 at 13:26

score 1 · Answer 2 · answered Jul 02 '16 at 13:13

1

For base R use this

aggregate(data=df,name~ID,FUN = function(t) sort(paste(t)))

Data

df<-read.table(header = T,text = "ID    name
1     apple
1     orange
2     orange
3     orange
3     apple")

answered Jul 02 '16 at 13:13

user2100721

3,557
2
20
29

GROUP_CONCAT with dplyr or R

2 Answers2