I have a tibble that looks like this:

Review_Text
<chr>
Because it is a nice game   
Best trump soumd board out there    
Boring hated because it does not work when I get done   
but you can make better game if game has unlimeted chemicals bottles    
cant get pass loading screen    
Can't play video    
Casting from Note 3 to Roku 3 screen appears to start loading then back to Roku home screen. Roku software version 6.1 build 5604. It is up to date but still not able to cast Showbox. ..  
Crashes all the time in the middle of the show. Whining ensues. Ugh.    
Crashing    
Does not work on tab 3  
Doesn't work    
Doesn't work with S7 which is unacceptable in this day and age. 
Doesn't work... I absolutely hate it    
Dont use this app battery consumers 
Dose this work for snmsung I tried some many times 😡 
😄I loved it so much I would recommend this to other families 😄    
Every time i pressed apply it just took me to the home screen   
Everytime it says collect on T.V. it won't obtain the magisword 
Excellent!!! My grandchildren watch it all the time...  
Feel like Lizzie McGuire 😂❤

I want to remove the stopwords from Review_Text and add the resulting column (without stopwords) to the existing tibble. I am using the following code to remove the stopwords:

no_stpwrd <- tibble(line = 1:nrow(tb), text = tb$Review_Text) %>%        
         unnest_tokens(word, text)%>%      
         anti_join(stop_words, by = c("word" = "word")) %>%              
         group_by(line) %>% summarise(title = paste(word,collapse =' '))

Then I use the following command to merge no_stpwrd with the existing tibble:

add_column(tb,no_stpwrd).

However, when I run the above command, it throws an error because tb and no_stpwrd have different numbers of rows. A few rows in the tibble contain only stopwords (for example, line 11 of the tibble), so removing the stopwords leaves nothing for those rows and they are dropped from no_stpwrd. Is there any way to fix this?

Ronak Shah
user2293224
  • Please provide a valid reprex (see [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)) to make it easier for others to help you. Though you have described your problem clearly, the lack of a reprex makes it difficult to replicate and troubleshoot the error you are getting. – hendrikvanb Mar 19 '20 at 06:45

1 Answer

Instead of trying to use add_column() here, use a join. add_column() requires both inputs to have the same number of rows, whereas a join matches rows by a key column.

library(tidyverse)
library(tidytext)

review_df <- tibble(review_text = c("Because it is a nice game",
                                    "cant get pass loading screen",
                                    "Because I don't",
                                    "Dont use this app battery consumers")) %>%
  mutate(line = row_number())

review_df
#> # A tibble: 4 x 2
#>   review_text                          line
#>   <chr>                               <int>
#> 1 Because it is a nice game               1
#> 2 cant get pass loading screen            2
#> 3 Because I don't                         3
#> 4 Dont use this app battery consumers     4

no_stpwrd <- review_df %>%
  unnest_tokens(word, review_text) %>%
  anti_join(get_stopwords())  %>%              
  group_by(line) %>% 
  summarise(title = paste(word,collapse =' '))
#> Joining, by = "word"

no_stpwrd
#> # A tibble: 3 x 2
#>    line title                         
#>   <int> <chr>                         
#> 1     1 nice game                     
#> 2     2 cant get pass loading screen  
#> 3     4 dont use app battery consumers

Notice that the third document is no longer there because it was made up of all stop words. It's time for a left_join().

review_df %>%
  left_join(no_stpwrd)
#> Joining, by = "line"
#> # A tibble: 4 x 3
#>   review_text                          line title                         
#>   <chr>                               <int> <chr>                         
#> 1 Because it is a nice game               1 nice game                     
#> 2 cant get pass loading screen            2 cant get pass loading screen  
#> 3 Because I don't                         3 <NA>                          
#> 4 Dont use this app battery consumers     4 dont use app battery consumers

Created on 2020-03-20 by the reprex package (v0.3.0)
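
If you would rather have an empty string than NA for reviews that consist only of stop words, one option is to fill the gaps after the join. This is only a sketch under a few assumptions: it reuses the column names from the example above, the name result is made up for illustration, and it uses dplyr's coalesce() as one of several ways to replace missing values. Passing by explicitly also silences the "Joining, by" messages.

```r
library(dplyr)
library(tidytext)

# Two reviews; the second is made up entirely of stop words
review_df <- tibble(review_text = c("Because it is a nice game",
                                    "Because I don't")) %>%
  mutate(line = row_number())

# Same pipeline as above, with the join key spelled out
no_stpwrd <- review_df %>%
  unnest_tokens(word, review_text) %>%
  anti_join(get_stopwords(), by = "word") %>%
  group_by(line) %>%
  summarise(title = paste(word, collapse = " "))

# left_join() keeps every review; coalesce() turns the NA left
# behind for all-stop-word reviews into an empty string
result <- review_df %>%
  left_join(no_stpwrd, by = "line") %>%
  mutate(title = coalesce(title, ""))

result
```

Whether NA or "" is the better choice depends on what happens downstream: NA makes it easy to filter out the empty reviews later, while "" is safer if the column is passed to functions that do not handle missing values.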

Julia Silge