I want to do text mining, and for several reasons I have built a data frame with words in one column and their frequencies in the second. For example:
words freq
Have 123
have 5
having 4589
Note that when the frequencies are very large, transforming the words this way may be more efficient than building a corpus in which certain words are repeated many, many times.
I would like to use tm to transform the words using tolower, stemDocument, etc.
I know I can pull the words column out of the data frame into a corpus, but then I will lose the frequency information.
I would like to get:
words freq
have 123
have 5
have 4589
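To make the intended result concrete, here is a minimal sketch of one way it might be reached without a corpus: transform the words column directly so that each frequency stays on its own row. The data frame name df is hypothetical, and the stemming step assumes the SnowballC package is installed (as far as I know, tm's stemDocument relies on it for stemming); it is skipped otherwise.

```r
# Hypothetical example data matching the table above
df <- data.frame(words = c("Have", "have", "having"),
                 freq  = c(123, 5, 4589),
                 stringsAsFactors = FALSE)

# Lowercase the column in place; freq stays aligned row by row
df$words <- tolower(df$words)

# Optional stemming step (assumption: SnowballC is installed)
if (requireNamespace("SnowballC", quietly = TRUE)) {
  df$words <- SnowballC::wordStem(df$words, language = "english")
}

print(df)
```

Because only the column is rewritten, the row count and the freq values are unchanged, which is the property a corpus round trip would lose.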
Then I think I can use setDT (from data.table), the dplyr package, or aggregate to get to:
words freq
have 4717
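The aggregation step could be sketched with base R's aggregate as follows. The data frame df here is a hypothetical stand-in for the already transformed table, and totals is just an illustrative name.

```r
# Hypothetical data: words already lowercased and stemmed
df <- data.frame(words = c("have", "have", "have"),
                 freq  = c(123, 5, 4589),
                 stringsAsFactors = FALSE)

# Sum the frequencies within each distinct word
totals <- aggregate(freq ~ words, data = df, FUN = sum)

print(totals)
```

For a large data frame, the data.table or dplyr equivalents of this grouped sum may be faster, but that is untested here.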
I plan to do this on a large data frame. Thanks.
I did try to mimic the approach from the question "tm: read in data frame, keep text id's, construct DTM and join to other dataset".