1

I have a dataset with two columns: a unique ID and a comment. I can build a word cloud from the comments alone, but I would like to keep the unique ID with each word so that I can join the result back to the original data when I visualize it in Tableau.

Example input:

ID  | Text
a1   This is a test comment.
a2   Another test comment.
a3   This is very good
a4   I like this.

The output I was hoping for is:

ID  |  Words
--    
a1   This
a1   is
a1   a
a1   test
a1   comment
a2   Another
a2   test
a2   comment
a3   This
a3   is
a3   very
a3   good

I hope the example is clear. Thank you.

J

Sotos
jols

2 Answers

2
> df <- read.table(text='ID  Text
+ a1   "This is a test comment"
+ a2   "Another test comment"
+ a3   "This is very good"
+ a4   "I like this"', header=TRUE, as.is=TRUE)
> 
> library(data.table)
> dt = data.table(df)
> # Split each comment on single spaces; by = ID repeats the ID for every word
> dt[,c(Words=strsplit(Text, " ", fixed = TRUE)), by = ID]
    ID   Words
 1: a1    This
 2: a1      is
 3: a1       a
 4: a1    test
 5: a1 comment
 6: a2 Another
 7: a2    test
 8: a2 comment
 9: a3    This
10: a3      is
11: a3    very
12: a3    good
13: a4       I
14: a4    like
15: a4    this
Prasanna Nandakumar
  • Thanks, this worked. But is there a way to export it? When I try write.csv, it exports the original file. – jols Jul 14 '17 at 08:27
  • @jols dt <- as.data.frame(dt[,c(Words=strsplit(Text, " ", fixed = TRUE)), by = ID]); write.csv(dt, file="dt.csv"). Please mark this as the answer if the solution was helpful. – Prasanna Nandakumar Jul 14 '17 at 08:29
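
On the export question in the comments above: one plausible cause (an assumption, since the asker's code is not shown) is that the split result was only printed and never assigned, so write.csv still received the original data. Here is a minimal sketch in base R rather than data.table, using a small made-up data frame and a hypothetical output file name:

```r
# Toy data modelled on the question (two rows only, for brevity)
df <- data.frame(
  ID = c("a1", "a2"),
  Text = c("This is a test comment", "Another test comment"),
  stringsAsFactors = FALSE
)

# Split each comment into words; the result is a list with one vector per row
words <- strsplit(df$Text, " ", fixed = TRUE)

# Assign the long-format result to its own object, one row per (ID, word) pair
result <- data.frame(
  ID = rep(df$ID, lengths(words)),
  Words = unlist(words),
  stringsAsFactors = FALSE
)

# Export the *result*, not the original data frame.
# "words_by_id.csv" is a hypothetical file name; tempdir() is used only
# so the sketch does not write into the working directory.
out_file <- file.path(tempdir(), "words_by_id.csv")
write.csv(result, file = out_file, row.names = FALSE)
```

Setting row.names = FALSE keeps the row index out of the file, which should leave just the ID and Words columns for the join in Tableau.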
1

You could do something like this:

library(tidyverse)
df <- tribble(
  ~ID, ~Text,
  "a1",   "This is a test comment.",
  "a2",   "Another test comment.",
  "a3",   "This is very good",
  "a4",   "I like this."
)

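# Split each comment into words (one character vector per row)
split_data <- strsplit(df$Text, " ")

# Pair every word with the ID of its row, then stack the pieces into one matrix
do.call(rbind,
   lapply(seq_len(nrow(df)), function(x) {
        cbind(ID = rep(df$ID[x], length(split_data[[x]])), Words = split_data[[x]])
   })
)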
maller
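
The expected output in the question drops the trailing periods ("comment" rather than "comment."), which a plain split on spaces keeps. Here is a rough base R sketch that strips punctuation before splitting and returns a data frame with named columns. It assumes that all punctuation can be discarded, which would also remove apostrophes inside words:

```r
# Sample data from the question, including the trailing periods
df <- data.frame(
  ID = c("a1", "a2", "a3", "a4"),
  Text = c("This is a test comment.", "Another test comment.",
           "This is very good", "I like this."),
  stringsAsFactors = FALSE
)

# Assumption: every punctuation character is noise for the word cloud
clean <- gsub("[[:punct:]]", "", df$Text)

# Split on runs of whitespace so that double spaces do not create empty words
words <- strsplit(trimws(clean), "[[:space:]]+")

# Repeat each ID once per word and flatten the list of words
long <- data.frame(
  ID = rep(df$ID, lengths(words)),
  Words = unlist(words),
  stringsAsFactors = FALSE
)

long
```

Because the IDs are repeated by row position, this does not rely on the IDs being unique.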