I am doing some sentiment analysis and want to put my tokenized words back into their original form: all words that belong to the same ID should be combined into a single row. In other words, I want the opposite of the `unnest_tokens` function. I have tried the following:

dsWords <- dsWords %>% 
  group_by(IDReview) %>% 
  summarize(text = str_c(word, collapse = " ")) %>%
  ungroup()

However, I get all the words combined into one row, instead of one row for each unique ID. Can anyone help me out here? Below is a subset of my data.

structure(list(IDReview = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    word = c("love", "love", "author", "side", "end", "show", 
    "one", "way", "think", "everyon", "also", "idea", "mani", 
    "amaz", "look", "mani", "idea", "think", "learn", "someth", 
    "dont", "know", "look", "fact", "see", "right", "dont", "write", 
    "review", "will", "hero", "will", "hes", "person", "tri", 
    "short", "certain", "never", "find", "like")), row.names = c("1", 
"1.1", "1.2", "1.4", "1.6", "1.13", "1.14", "1.15", "1.16", "1.17", 
"1.18", "1.19", "1.20", "1.24", "1.25", "1.27", "1.28", "1.30", 
"1.33", "1.34", "1.35", "1.36", "1.37", "1.38", "1.39", "1.41", 
"1.42", "1.44", "1.45", "2", "2.3", "2.5", "2.10", "2.12", "2.18", 
"2.23", "2.26", "2.27", "2.30", "2.34"), class = "data.frame")
Reinout
  • It should give a row for each `IDReview` (and it does on my machine). Are you sure you are using `dplyr`'s `group_by` and `summarize` functions? You can make sure this is the case by using `dplyr::group_by` and `dplyr::summarize`. – Bas May 09 '20 at 19:35
  • @Bas Yeah, I guess that was the problem. It is working now! Thank you – Reinout May 10 '20 at 11:10

1 Answer

As Bas wrote in the comments, the following code with explicit package names

dsWords %>% 
  dplyr::group_by(IDReview) %>% 
  dplyr::summarise(text = stringr::str_c(word, collapse = " ")) %>%
  dplyr::ungroup()

gives as output

# A tibble: 2 x 2
  IDReview text                                                                                          
     <int> <chr>                                                                                         
1        1 love love author side end show one way think everyon also idea mani amaz look mani idea think~
2        2 will hero will hes person tri short certain never find like

That is what you intend, isn't it?

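Note that there might be problems when you load plyr after dplyr: plyr's `summarise`/`summarize` then masks dplyr's version and ignores the grouping, which gives the single-row result described in the question.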
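
The masking problem can be reproduced with a minimal sketch like the one below. It assumes both plyr and dplyr are installed; `toy`, `grouped`, `collapsed_plyr` and `collapsed_dplyr` are made-up names for illustration, and the data is a tiny stand-in for `dsWords`. plyr is called through its namespace only, so nothing is actually masked here.

```r
library(dplyr)

# Toy data in the same shape as dsWords (illustrative values only).
toy <- data.frame(
  IDReview = c(1L, 1L, 2L, 2L),
  word = c("love", "author", "will", "hero"),
  stringsAsFactors = FALSE
)

grouped <- dplyr::group_by(toy, IDReview)

# plyr::summarise does not know about dplyr groups,
# so every word ends up in a single row.
collapsed_plyr <- plyr::summarise(grouped, text = paste(word, collapse = " "))

# dplyr::summarise respects the grouping: one row per IDReview.
collapsed_dplyr <- dplyr::summarise(grouped, text = paste(word, collapse = " "))

# Check which package the unqualified name currently resolves to.
summarise_source <- environmentName(environment(summarise))

print(nrow(collapsed_plyr))
print(nrow(collapsed_dplyr))
print(summarise_source)
```

If `summarise_source` reports `plyr` in your own session, either load plyr before dplyr or keep the explicit `dplyr::` prefixes as shown above.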

Taufi