Is there a way to do the opposite of unnest_tokens? I want to combine words into a row based on a unique ID

Question

I am doing some sentiment analysis and want to put the tokenized words back into their original form, so that all words belonging to the same unique ID are combined into a single row. In other words, I want the opposite of the unnest_tokens function. I have tried the following:

dsWords <- dsWords %>% 
  group_by(IDReview) %>% 
  summarize(text = str_c(word, collapse = " ")) %>%
  ungroup()

However, I simply get all the words combined into one row instead of one row per unique ID. Can anyone help me out here? Below is a subset of my data.

structure(list(IDReview = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    word = c("love", "love", "author", "side", "end", "show", 
    "one", "way", "think", "everyon", "also", "idea", "mani", 
    "amaz", "look", "mani", "idea", "think", "learn", "someth", 
    "dont", "know", "look", "fact", "see", "right", "dont", "write", 
    "review", "will", "hero", "will", "hes", "person", "tri", 
    "short", "certain", "never", "find", "like")), row.names = c("1", 
"1.1", "1.2", "1.4", "1.6", "1.13", "1.14", "1.15", "1.16", "1.17", 
"1.18", "1.19", "1.20", "1.24", "1.25", "1.27", "1.28", "1.30", 
"1.33", "1.34", "1.35", "1.36", "1.37", "1.38", "1.39", "1.41", 
"1.42", "1.44", "1.45", "2", "2.3", "2.5", "2.10", "2.12", "2.18", 
"2.23", "2.26", "2.27", "2.30", "2.34"), class = "data.frame")

It should give a row for each `IDReview` (and it does on my machine). Are you sure you are using `dplyr`'s `group_by` and `summarize` functions? You can make sure this is the case by using `dplyr::group_by` and `dplyr::summarize`. — Bas, May 09 '20 at 19:35
@Bas Yeah, I guess that was the problem. It is working now! Thank you — Reinout, May 10 '20 at 11:10

score 0 · Accepted Answer · answered May 09 '20 at 21:38

As Bas wrote in the comments, the following code with explicit package names

dsWords %>% 
  dplyr::group_by(IDReview) %>% 
  dplyr::summarise(text = stringr::str_c(word, collapse = " ")) %>%
  dplyr::ungroup()

gives as output

# A tibble: 2 x 2
  IDReview text                                                                                          
     <int> <chr>                                                                                         
1        1 love love author side end show one way think everyon also idea mani amaz look mani idea think~
2        2 will hero will hes person tri short certain never find like

That is what you intend, isn't it?
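
For comparison, here is a minimal base R sketch of the same collapse that does not depend on which package's `summarise` is attached. The small `dsWords` below is a made-up stand-in with the same two columns as the question's data, not the original subset, and `aggregate` with `paste` is offered as one possible alternative rather than as the approach used above.

```r
# Hypothetical stand-in for the question's data (same columns, fewer rows)
dsWords <- data.frame(
  IDReview = c(1L, 1L, 1L, 2L, 2L),
  word = c("love", "author", "side", "will", "hero"),
  stringsAsFactors = FALSE
)

# For each IDReview, join that group's words into one string.
# Extra arguments (collapse = " ") are passed on to paste().
combined <- aggregate(word ~ IDReview, data = dsWords,
                      FUN = paste, collapse = " ")

# Rename the collapsed column to match the dplyr version
names(combined)[2] <- "text"
print(combined)
```

The result has one row per `IDReview`, with the words kept in the order in which they appear in the input.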

Note that problems can arise when you load plyr after dplyr: plyr's `summarise` then masks dplyr's version and ignores the grouping, so all words end up in a single row.
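
One way to check for this kind of masking is to ask R where an unqualified name is found. The sketch below uses base R's `find()`, which lists the attached packages that define a name in lookup order; the variable name `hits` is only illustrative. If `package:plyr` is listed before `package:dplyr`, an unqualified call goes to plyr's function.

```r
# For both spellings, list the attached packages that define the function.
# The first package listed is the one an unqualified call will use.
hits <- sapply(c("summarise", "summarize"),
               function(fn) paste(find(fn), collapse = ", "))
print(hits)
```

If neither package is attached, the entries are empty strings. Writing `dplyr::summarise` explicitly, as in the answer above, avoids the question altogether.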

Thank you, that was the problem indeed — Reinout, May 10 '20 at 11:11
