Collapse text by group in data frame

Question

How do I aggregate a data frame by the values in column group and collapse the text in column text?

Sample data:

df <- read.table(header=TRUE, text="
group text
a a1
a a2
a a3
b b1
b b2
c c1
c c2
c c3
")

Required output (data frame):

group text
a     a1a2a3
b     b1b2
c     c1c2c3

Now I have:

sapply(unique(df$group), function(x) {
  paste0(df[df$group==x,"text"], collapse='')
})

This works to some extent as it returns text properly collapsed by group, but as a vector:

[1] "a1a2a3" "b1b2"   "c1c2c3"

I need a data frame with a group column as the result.

Victorp · Accepted Answer · 2015-09-15T15:20:52.157

36

Simply use aggregate:

aggregate(df$text, list(df$group), paste, collapse="")
##   Group.1      x
## 1       a a1a2a3
## 2       b   b1b2
## 3       c c1c2c3

Or with plyr:

library(plyr)
ddply(df, .(group), summarize, text=paste(text, collapse=""))
##   group   text
## 1     a a1a2a3
## 2     b   b1b2
## 3     c c1c2c3

ddply is faster than aggregate if you have a large dataset.
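The speed claim above is easy to measure on your own data. The following is only a rough sketch: the synthetic data frame `big`, its size, and the number of groups are arbitrary assumptions of mine, and the plyr timing runs only if plyr is installed. Timings vary by machine and package version, so no expected numbers are shown.

```r
# Synthetic data; the sizes are arbitrary assumptions, not from the original post.
set.seed(1)
n <- 1e5
big <- data.frame(
  group = sample(sprintf("g%04d", 1:1000), n, replace = TRUE),
  text  = sample(letters, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Time the base R approach.
t_aggregate <- system.time(
  res_aggregate <- aggregate(text ~ group, data = big, FUN = paste, collapse = "")
)
print(t_aggregate)

# Time the plyr approach, but only when the package is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  t_ddply <- system.time(
    res_ddply <- plyr::ddply(big, "group", plyr::summarize,
                             text = paste(text, collapse = ""))
  )
  print(t_ddply)
}
```

A single `system.time` call is noisy; repeating the measurement a few times gives a fairer comparison.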

EDIT: With the suggestion from @SeDur:

aggregate(text ~ group, data = df, FUN = paste, collapse = "")
##   group   text
## 1     a a1a2a3
## 2     b   b1b2
## 3     c c1c2c3

To get the same column names with the earlier (non-formula) method, you have to name the list elements:

aggregate(x=list(text=df$text), by=list(group=df$group), paste, collapse="")

EDIT2: With data.table:

library("data.table")
dt <- as.data.table(df)
dt[, list(text = paste(text, collapse="")), by = group]
##    group   text
## 1:     a a1a2a3
## 2:     b   b1b2
## 3:     c c1c2c3

edited Sep 15 '15 at 15:20

answered Mar 31 '14 at 08:04

Victorp

Using the formula form of `aggregate` gives prettier names: `aggregate(text ~ group, data = df, FUN = paste, collapse = "")` – SeDur Mar 31 '14 at 10:09
@rawr that's in the first edit – Victorp Jan 17 '16 at 16:34
The non-formula `aggregate` doesn't need to be as torturous either - `aggregate(df["text"], df["group"], paste, collapse="")` will do it just fine. – thelatemail Mar 03 '16 at 05:57
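
The three base R spellings mentioned in this answer and its comments (formula, named lists, and single-bracket indexing) can be checked against each other. A minimal sketch, assuming the sample data from the question; the `res_*` variable names are mine, chosen for illustration:

```r
# Sample data from the question.
df <- read.table(header = TRUE, text = "
group text
a a1
a a2
a a3
b b1
b b2
c c1
c c2
c c3
")

# Formula interface: column names are taken from the formula.
res_formula <- aggregate(text ~ group, data = df, FUN = paste, collapse = "")

# Named lists: the names of the list elements become the column names.
res_named <- aggregate(x = list(text = df$text), by = list(group = df$group),
                       FUN = paste, collapse = "")

# Single-bracket indexing keeps one-column data frames, so names carry over.
res_short <- aggregate(df["text"], df["group"], paste, collapse = "")

# All three should hold the same groups and the same collapsed text.
stopifnot(
  identical(as.character(res_formula$text), as.character(res_named$text)),
  identical(as.character(res_formula$text), as.character(res_short$text)),
  identical(names(res_named), names(res_short))
)
print(res_short)
```

The `as.character` calls are there only so the comparison also holds on R versions before 4.0, where `read.table` returns factors by default.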

score 29 · Answer 2 · edited Jul 26 '15 at 21:12

You can use the dplyr package for this:

library(dplyr)

df %>%
  group_by(group) %>%
  summarise(text=paste(text,collapse=''))

edited Jul 26 '15 at 21:12

David Arenburg

answered Mar 31 '14 at 08:02

Chitrasen

when you collapse all rows, how can you keep all variable values and not just one assigned one? – richiepop2 Aug 08 '16 at 18:27
