Create a column of comma-separated strings from one column, grouped by another column, in R

Question

This is my dataset:

id   text
 1    "red"
 1    "blue"
 2    "light blue"
 2    "red"
 2    "yellow"
 3    "dark green"

This is the result I want to obtain:

 id  text2
 1   "red, blue"
 2   "light blue, red, yellow"
 3   "dark green"

Basically, I need to combine the values in the 'text' column for each id into a single string, with commas separating the elements.

jay.sf · Accepted Answer · 2020-01-06T12:18:18.800

2

Using aggregate and toString.

aggregate(. ~ id, d, toString)
#   id                    text
# 1  1               red, blue
# 2  2 light blue, red, yellow
# 3  3              dark green

Note: This won't work with factor columns; aggregate then returns the integer codes of the factor levels instead of the text. If is.factor(d$text) yields TRUE, you need a slightly different approach. Demonstration:

d$text <- as.factor(d$text)  # make text a factor
is.factor(d$text)
# [1] TRUE

In that case, convert the column to character first:

aggregate(. ~ id, transform(d, text=as.character(text)), toString)

Data:

d <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 3L), text = c("red", 
"blue", "light blue", "red", "yellow", "dark green")), row.names = c(NA, 
-6L), class = "data.frame")
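
As a rough sketch of how the pieces above fit together, the following rebuilds the example data, converts text to character defensively, and renames the aggregated column to text2 as requested in the question. The name res and the renaming step are illustrative additions, not part of the original answer.

```r
# Rebuild the example data from the question.
d <- data.frame(
  id = c(1L, 1L, 2L, 2L, 2L, 3L),
  text = c("red", "blue", "light blue", "red", "yellow", "dark green"),
  stringsAsFactors = FALSE
)

# Convert defensively so the call also works if text arrives as a factor.
d$text <- as.character(d$text)

# Collapse text within each id; toString() joins the values with ", ".
res <- aggregate(text ~ id, data = d, FUN = toString)

# Rename the aggregated column to match the requested output (assumed name: text2).
names(res)[names(res) == "text"] <- "text2"
print(res)
```

Naming the column explicitly in the formula (text ~ id) instead of using the dot is a matter of taste here; with only two columns both forms aggregate the same data.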

edited Jan 06 '20 at 12:18

answered Jan 06 '20 at 11:58


I'm not sure how to convert my data frame using the `structure` command. It looks like this: `id <- c(1,1,2,2,2,3); text <- c("red", "blue", "light blue", "red", "yellow", "dark green"); data <- cbind.data.frame(id, text)`. What is `c(NA, -6L)`? How can I restructure my data frame so that `aggregate` works properly? When I run `aggregate`, it shows numbers in the text column instead of the proper text. – Carbo Jan 06 '20 at 12:08
Your column seems to be of class `"factor"`. Please see edit to my answer. You could also use `cbind.data.frame(id, text, stringsAsFactors=FALSE)`, though, to prevent factors beforehand. (The `structure(.)` thing is just the output of `dput(d)` which is the way we share data here on Stack Overflow, see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) – jay.sf Jan 06 '20 at 12:19
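
A minimal sketch of the suggestion in the comment above, using the vectors from the question's comment. It assumes the data frame is built with cbind.data.frame as described there; the name out is hypothetical. From R 4.0.0 onward stringsAsFactors already defaults to FALSE, so the explicit argument mainly matters on older versions.

```r
# Vectors as given in the comment.
id <- c(1, 1, 2, 2, 2, 3)
text <- c("red", "blue", "light blue", "red", "yellow", "dark green")

# Keep text as character instead of letting it become a factor.
data <- cbind.data.frame(id, text, stringsAsFactors = FALSE)

# Check the column class before aggregating.
print(is.factor(data$text))

# Same call as in the answer, applied to the character column.
out <- aggregate(. ~ id, data, toString)
print(out)
```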

score 1 · Answer 2 · answered Jan 06 '20 at 16:19

We can use dplyr: group by id, then collapse text with toString.

library(dplyr)
df1 %>%
    group_by(id) %>%
    summarise(text2 = toString(text))

Data:

df1 <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 3L), text = c("red", 
"blue", "light blue", "red", "yellow", "dark green")), row.names = c(NA, 
-6L), class = "data.frame")
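
For context on the toString call used above: for a character vector it gives the same result as paste with collapse = ", ". The base R sketch below illustrates this for one group and shows how a different separator could be used; the names x, same and alt are hypothetical, and the semicolon separator is only an example.

```r
# Values of one group from the example data.
x <- c("light blue", "red", "yellow")

# toString() joins the elements with ", ".
print(toString(x))

# paste() with collapse = ", " produces the same string.
same <- identical(toString(x), paste(x, collapse = ", "))
print(same)

# For another separator, call paste() directly.
alt <- paste(x, collapse = "; ")
print(alt)
```

Inside summarise, paste(text, collapse = "; ") could stand in for toString(text) in the same way.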
