I'm trying to do some topic modelling, but I want to use phrases where they exist rather than single words. For example:
library(topicmodels)
library(tm)
my.docs = c('the sky is blue, hot sun', 'flowers,hot sun', 'black cats, bees, rats and mice')
my.corpus = Corpus(VectorSource(my.docs))
my.dtm = DocumentTermMatrix(my.corpus)
inspect(my.dtm)
When I inspect my DTM, it splits the text into individual words, but I want each phrase kept together, i.e. there should be one column for each of: "the sky is blue", "hot sun", "flowers", "black cats", "bees", "rats and mice".
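To make the target concrete, here is a rough base R sketch of the counts I'm after, built by hand by splitting each document on commas. It doesn't use tm at all, and the names `phrases`, `terms` and `counts` are just for illustration:

```r
my.docs <- c('the sky is blue, hot sun', 'flowers,hot sun', 'black cats, bees, rats and mice')

# Split each document on commas and trim the surrounding whitespace
phrases <- lapply(strsplit(my.docs, ",", fixed = TRUE), trimws)

# All distinct phrases, in order of first appearance
terms <- unique(unlist(phrases))

# Count how often each phrase occurs in each document (rows = docs, columns = phrases)
counts <- t(vapply(
  phrases,
  function(p) tabulate(match(p, terms), nbins = length(terms)),
  integer(length(terms))
))
dimnames(counts) <- list(Docs = as.character(seq_along(my.docs)), Terms = terms)

counts
```

What I'd like is this same layout, but as a proper DocumentTermMatrix that I can pass on to topicmodels.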
How do I make the Document Term Matrix recognise phrases as well as single words? The terms are comma-separated.
The solution needs to be efficient, as I want to run it over a lot of data.