I am using the cast_dtm() function from tidytext to convert a one-term-per-document-per-row data frame into a document-term matrix, which I then use as input to LDA. The code is:
library(dplyr); library(tidytext)
posts.dtm <- posts_tokenized.dt %>% cast_dtm(id, word, term_frequency)
This worked fine with a corpus of 33,000 documents, but with a corpus of 147,242 documents it fails with the following error:
Error in validObject(r) : invalid class "dgTMatrix" object: length(Dimnames[1]) differs from Dim[1] which is 147242
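One thing I can think of checking (just a guess based on the dimension mismatch in the error) is whether the id factor carries unused levels, since a disagreement between the levels and the documents actually present could make the row names disagree with Dim[1]:

n_distinct(posts_tokenized.dt$id)  # documents actually present in the data
nlevels(posts_tokenized.dt$id)     # levels carried by the id factor
# if these two numbers differ, dropping unused levels might be worth trying:
# posts_tokenized.dt$id <- droplevels(posts_tokenized.dt$id)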
Any help is appreciated!
EDIT: The tokenized data frame looks like this:
> head(posts_tokenized.dt)
# A tibble: 6 x 3
                            id           word term_frequency
                        <fctr>          <chr>          <int>
1 6013004059_10154817753659060 demonetisation              1
2 6013004059_10154828153334060 demonetisation              1
3 6013004059_10154835596219060 demonetisation              1
4 6013004059_10154837355359060 demonetisation              1
5 6013004059_10154872354154060 demonetisation              1
6 6013004059_10154556655804060         hanjin              1
None of the columns contain empty or NA values.
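For reference, I checked this along the following lines (a quick sanity check on the same data frame shown above):

anyNA(posts_tokenized.dt)           # FALSE: no NA in any column
any(posts_tokenized.dt$word == "")  # FALSE: no empty tokens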