
Let's say I have a data frame consisting of sentence IDs and terms like so:

data.frame(sid = c(1, 1, 2, 2), text = c("hello", "world", "whats", "up"))

How could I aggregate it to get a data frame like so:

data.frame(sid = c(1, 2), text = c("hello world", "whats up"))

Or better yet, as a list with corresponding indices like so:

list("hello world", "whats up")
Christopher Costello
  • Never mind, figured it out. The `group_by()` and `summarise()` functions do the job: `library(tidyverse); tibble(sid = c(1, 1, 2, 2), text = c("hello", "world", "whats", "up")) %>% group_by(sid) %>% summarise(text = paste(text, collapse = " "))` – Christopher Costello Mar 29 '18 at 15:40
  • You can post it as an answer to your own question. – csgroen Mar 29 '18 at 15:42
  • You could use `plyr` and `stringr` to do this:

        library(plyr)
        library(stringr)
        bla <- data.frame(sid = c(1, 1, 2, 2), text = c("hello", "world", "whats", "up"))
        ddply(bla, c('sid'), summarise, text = str_c(text, collapse = " "))

    The result is:

          sid        text
        1   1 hello world
        2   2    whats up

    – JAQuent Mar 29 '18 at 15:46
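The `group_by()`/`summarise()` approach from the comments can be extended to the list form the question also asks for. This is a minimal sketch, assuming dplyr is installed; the names `terms`, `sentences` and `sentence_list` are illustrative, and `setNames()` plus `as.list()` is just one way to name the list elements by `sid`.

```r
library(dplyr)

# Input from the question
terms <- tibble(sid = c(1, 1, 2, 2), text = c("hello", "world", "whats", "up"))

# Collapse the terms of each sentence into a single string, in row order
sentences <- terms %>%
  group_by(sid) %>%
  summarise(text = paste(text, collapse = " "))

# List form: one element per sentence, named by its sentence ID ("1", "2")
sentence_list <- as.list(setNames(sentences$text, sentences$sid))
```

Wrap the result in `unname()` if you want the plain unnamed `list("hello world", "whats up")` from the question.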

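For comparison, both requested outputs can also be produced in base R with no extra packages. This sketch uses a different technique from the comments (`aggregate()` and `split()` instead of dplyr or plyr); the names `terms`, `sentences` and `sentence_list` are illustrative.

```r
# Input from the question (stringsAsFactors = FALSE keeps text as character on older R)
terms <- data.frame(sid = c(1, 1, 2, 2), text = c("hello", "world", "whats", "up"),
                    stringsAsFactors = FALSE)

# Data frame form: aggregate() forwards collapse = " " to paste() for each sid group
sentences <- aggregate(text ~ sid, data = terms, FUN = paste, collapse = " ")

# List form: split the terms by sentence ID, then paste each group together;
# the list names are the sentence IDs
sentence_list <- lapply(split(terms$text, terms$sid), paste, collapse = " ")
```

`split()` orders the groups by `sid`, so the list positions line up with the sentence IDs here; `unname(sentence_list)` drops the names if they are not needed.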