Hello, I have a document-term matrix that I transformed with the tidy() function, and that works perfectly. I want to plot a word cloud based on word frequency. My transformed table looks like this:
> head(Wcloud.Data)
# A tibble: 6 x 3
  document term       count
  <chr>    <chr>      <dbl>
1 1        accept         1
2 1        access         1
3 1        accomplish     1
4 1        account        4
5 1        accur          2
6 1        achiev         1
I have 33,647,383 observations, so it is a very big data frame. If I use the max() function on the count column, I get a very high number (64116), but no word in my data frame seems to have a frequency of 64116. Also, if I plot the data frame in Shiny with wordcloud(), it plots the same words several times. Sorting the count column does not work either: sort(Wcloud.Data$count, decreasing = TRUE). So something is not correct, but I don't know what it is or how to solve it. Does anybody have an idea?
This is the summary of my document-term matrix before transforming it into a data frame:
> observations.tf
<<DocumentTermMatrix (documents: 76717, terms: 4234)>>
Non-/sparse entries: 33647383/291172395
Sparsity : 90%
Maximal term length: 15
Weighting : term frequency (tf)
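One thing that may matter here: the tidied table has one row per document-term pair (33,647,383 rows, the same as the number of non-sparse entries above), so a term appears once for every document that contains it. Below is a minimal sketch of summing the counts per term before plotting. It uses a small made-up data frame in base R; the names toy and term_freq and all values are hypothetical and are not taken from the real data.

```r
# Toy stand-in for Wcloud.Data (invented values, same three columns)
toy <- data.frame(
  document = c("1", "1", "2", "2", "3"),
  term     = c("accept", "account", "accept", "account", "account"),
  count    = c(1, 4, 2, 3, 5),
  stringsAsFactors = FALSE
)

# Each term occurs once per document, so sum the counts across documents
# to get a single row per term
term_freq <- aggregate(count ~ term, data = toy, FUN = sum)

# Reorder the rows by total frequency, highest first
term_freq <- term_freq[order(term_freq$count, decreasing = TRUE), ]

print(term_freq)
```

If this is the right reading of the problem, term_freq$term and term_freq$count could then be passed as the words and freq arguments of wordcloud(). Note also that sort() returns a sorted copy of the vector and leaves Wcloud.Data itself unchanged, which may be why the sort appears not to work.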
Update: I have added a picture of my data frame.