I want to do a bigram analysis using the unnest_tokens function from the tidytext package. It worked a few weeks ago, but now my output is only NA.
I have the following df. str(df) gives:
'data.frame': 36199 obs. of 1 variable:
 $ words: chr "jed" "tag" "neu" "verunsichern" ...
When I use the following code for the bigram analysis:
bigrams <- df %>% unnest_tokens(bigram, words, token = "ngrams", n = 2)
my output contains only NAs. head(bigrams) gives:
| | bigram |
| -------- | -------------- |
| 1.content| NA |
| 2.content| NA |
I needed to clean the loaded text beforehand with the tm_map function, and then I created the df as follows:
df0 <- data.frame(text = unlist(sapply(myVCorpus, `[`, "content")), stringsAsFactors = FALSE)
and
df <- df0 %>% unnest_tokens(words, text)
I updated to the latest R and updated my packages. I really do not know why it stopped working.
Thank you in advance for your comments. Kind regards Christoph
Update:
It must have something to do with the prepared df. If I use the df where the whole sentence is in one cell, before I cleaned it, it works. This is how I cleaned the text:
df<- %>% unnest_tokens(word, text)
myVCorpus <- VCorpus(VectorSource(df))
myVCorpus <- tm_map(text.prep, removeWords, stopwords(kind= "de"))
Update2:
I found the solution. I needed to untidy the prepared df, and this post helped: "Opposite of unnest_tokens". I used the following code:
text.ngram <- df %>% group_by(mygroupingvariable) %>% summarize(text = str_c(words, collapse = " ")) %>% ungroup()
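For anyone landing here, a minimal sketch of the whole round trip on toy data. As far as I understand it, a row that holds a single word has fewer than two tokens, so no bigram can be formed from it, which would explain the NA output; collapsing the words back into one string per document gives the tokenizer something to pair up. The column name doc_id and the two-document split are assumptions made up for illustration, they stand in for mygroupingvariable above.

```r
library(dplyr)
library(stringr)
library(tidytext)

# Toy one-word-per-row data frame, like the prepared df in the question.
# `doc_id` is a hypothetical grouping column (one id per original document).
df <- data.frame(
  doc_id = c(1, 1, 2, 2),
  words  = c("jed", "tag", "neu", "verunsichern"),
  stringsAsFactors = FALSE
)

# "Untidy" step: paste the words of each document back into one string.
text_ngram <- df %>%
  group_by(doc_id) %>%
  summarize(text = str_c(words, collapse = " ")) %>%
  ungroup()

# Each row now holds several words, so bigrams can be formed.
bigrams <- text_ngram %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

print(bigrams)
```

If the cleaned data has no natural grouping column, a row id created before the first unnest_tokens call could probably play the same role, but that depends on how the corpus was built.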