Issue with R tidytext Package function unnest_tokens

Question

I want to run a bigram analysis using the `unnest_tokens` function from the tidytext package. It worked a few weeks ago, but now my output contains only NA.

I have the following data frame (output of `str(df)`):

'data.frame': 36199 obs. of 1 variable: $ words: chr "jed" "tag" "neu" "verunsichern" ...

When I run the following code for the bigram analysis, `bigrams <- df %>% unnest_tokens_(bigram, words, token = "ngrams", n = 2)`, my output contains only NAs:

`head(bigrams)`:

|           | bigram |
| --------- | ------ |
| 1.content | NA     |
| 2.content | NA     |

Before that, I had to clean the loaded text with the `tm_map` function. I then created the data frames as follows: ``df0 <- data.frame(text = unlist(sapply(myVCorpus, `[`, "content")), stringsAsFactors = F)`` and `df <- df0 %>% unnest_tokens(words, text)`. In the meantime I updated to the latest R version and updated my packages. I really do not know why it stopped working.

Thank you in advance for your comments. Kind regards Christoph

Update: It must have something to do with the prepared df. If I use the df from before the cleaning, where the whole sentence is in one cell, it works. This is how I cleaned the text: `df<- %>% unnest_tokens(word, text)`, then `myVCorpus <- VCorpus(VectorSource(df))`, then `myVCorpus <- tm_map(text.prep, removeWords, stopwords(kind= "de"))`.

Update2: I found the solution. I needed to untidy the prepared df, and the post "Opposite of unnest_tokens" helped, with the following code: `text.ngram <- df %>% group_by(mygroupingvariable) %>% summarize(text = str_c(words, collapse = " ")) %>% ungroup()`
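For readers who want to try the fix, here is a minimal sketch of the approach from Update2, assuming a small made-up data frame in place of the real cleaned data. The column `doc_id` is a hypothetical stand-in for `mygroupingvariable`, and the sample words are illustrative only. The idea is that a one-word-per-row column gives the n-gram tokenizer nothing to pair up, so the words of each document are first pasted back into one string.

```r
library(dplyr)
library(stringr)
library(tidytext)

# Hypothetical stand-in for the cleaned, one-word-per-row data frame.
# `doc_id` plays the role of `mygroupingvariable` (an assumption).
df <- tibble(
  doc_id = c(1, 1, 1, 2, 2),
  words  = c("jed", "tag", "neu", "das", "gut")
)

# Collapse the words of each document back into a single string.
text_ngram <- df %>%
  group_by(doc_id) %>%
  summarize(text = str_c(words, collapse = " ")) %>%
  ungroup()

# Each row now holds several words, so bigrams can be formed.
bigrams <- text_ngram %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

print(bigrams)
```

Which grouping variable is appropriate depends on the data: bigrams are only formed within a group, so words from different documents are never paired.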

you need to use your df0 for creating the bigrams, not df. df is already unnested and can't be used for creating bigrams. — phiver, Mar 08 '21 at 16:44
@AdamSampson: No, I do not receive any warnings. If I simply use `text<-data.frame(text="Das ist gut.")` and then do `text.bigrams <- text %>% unnest_tokens(output = bigram, input = text, token = "ngrams", n = 2)` it works, but not with the df that I cleaned in a VCorpus. If I use df0, I get the error: "Must extract column with a single valid subscript. x Subscript var has size 980 but must be size 1." — ChristophG, Mar 09 '21 at 10:50
