I've been trying to follow along with a Udemy tutorial, using the tm package in R to do text mining on tweets.
So far, many of the functions used in the tutorial (and in the tm PDF manual on CRAN) produce errors that I don't know how to resolve. I'm coding in RStudio version 1.0.143 on macOS Sierra. The code and errors below are from my attempt to make a wordcloud from a series of tweets:
nyttweets <- searchTwitter("#NYT", n=1000)
nytlist <- sapply(nyttweets, function(x) x$getText())
nytcorpus <- Corpus(VectorSource(nytlist))
Here's where I encounter the first problem:
nytcorpus <- tm_map(nytcorpus, tolower)
**Warning message:
In mclapply(content(x), FUN, ...) :
all scheduled cores encountered errors in user code**
I saw a suggestion to set mc.cores=1, which results in a different error:
nytcorpus <- tm_map(nytcorpus, tolower, mc.cores=1)
**Error in FUN(X[[1L]], ...) : invalid multibyte string 1**
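For reference, the "invalid multibyte string" message appears to be something tolower itself raises when a string contains bytes that are not valid in the session's encoding, so it may not be specific to tm. Below is a minimal base-R sketch of re-encoding text before lowercasing it. The string `bad_tweet` is a made-up stand-in for a tweet, and treating the input as Latin-1 is only an assumption; I don't know what encoding my tweets actually arrive in.

```r
# Hypothetical stand-in for one tweet: the byte 0xE9 is "e-acute" in Latin-1
# but is not a valid byte sequence on its own in UTF-8.
bad_tweet <- "caf\xe9 #NYT"

# Re-encode to UTF-8 before any case conversion.
# ASSUMPTION: the source encoding is Latin-1; real tweets may differ.
clean_tweet <- iconv(bad_tweet, from = "latin1", to = "UTF-8")

# Lowercasing the re-encoded string should no longer hit the multibyte check.
lower_tweet <- tolower(clean_tweet)
print(lower_tweet)
```

If this is the cause, the same iconv call could presumably be applied to nytlist before building the corpus, but I have not confirmed that.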
If I instead add 'lazy=TRUE' after tolower, and in the subsequent tm_map calls, I don't receive an error. However, when I finally try to construct the wordcloud, I get a large number of warnings:
library("twitteR")
library('wordcloud')
library('SnowballC')
library('tm')
nytcorpus <- tm_map(nytcorpus, tolower, lazy=TRUE)
nytcorpus <- tm_map(nytcorpus, removePunctuation, lazy=TRUE)
nytcorpus <- tm_map(nytcorpus, function(x) removeWords(x, stopwords()),
lazy=TRUE)
nytcorpus <- tm_map(nytcorpus, PlainTextDocument)
wordcloud(nytcorpus, min.freq=4, scale=c(5,1), random.color=F, max.word=45,
random.order=F)
**Warning messages:
1: In wordcloud(nytcorpus, min.freq = 4, scale = c(5, 1), random.color = F, :
'removewords' could not be fit on page. It will not be plotted.
2: In wordcloud(nytcorpus, min.freq = 4, scale = c(5, 1), random.color = F, :
"try-error" could not be fit on page. It will not be plotted.
3: In wordcloud(nytcorpus, min.freq = 4, scale = c(5, 1), random.color = F, :
applicable could not be fit on page. It will not be plotted.
4: In wordcloud(nytcorpus, min.freq = 4, scale = c(5, 1), random.color = F, :
object could not be fit on page. It will not be plotted.
5: In wordcloud(nytcorpus, min.freq = 4, scale = c(5, 1), random.color = F, :
usemethod("removewords", could not be fit on page. It will not be plotted.**
I'm not sure why wordcloud is trying to plot words like 'removewords' or 'try-error', which look like they come from the functions themselves, rather than words from the NYT tweets. I've seen suggestions to wrap the functions in content_transformer, for example:
nytcorpus <- tm_map(nytcorpus, content_transformer(tolower))
However, I again just get the warning 'all scheduled cores encountered errors in user code'.
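One thing I did notice about the wordcloud warnings: the odd words ('removewords', 'try-error', 'applicable', 'object') look like fragments of an R error message rather than tweet text. Below is a base-R sketch of how a captured error could end up being treated as ordinary text. It does not use tm; `remove_words` is a hypothetical S3 generic standing in for removeWords, and I am only guessing that something similar happens inside my pipeline.

```r
# Hypothetical S3 generic with a method for character vectors only.
remove_words <- function(x, ...) UseMethod("remove_words")
remove_words.character <- function(x, words, ...) {
  # Drop each listed word wherever it appears as a whole word.
  for (w in words) x <- gsub(paste0("\\b", w, "\\b"), "", x)
  x
}

# Step 1: an earlier step fails and the failure is captured, not raised.
failed <- try(stop("invalid multibyte string"), silent = TRUE)

# Step 2: the next step receives the "try-error" object instead of text,
# so method dispatch fails and produces a second captured error.
second <- try(remove_words(failed, "the"), silent = TRUE)

# The captured error is itself a character string, so later steps can
# tokenize it as if it were document content.
msg <- as.character(second)
print(msg)
```

If the corpus really does contain captured errors like this, that would explain why the plotted words mention the function names, but I can't tell from the output alone.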
This is all exceedingly frustrating, and I'm not sure if I should scrap using the tm package altogether, especially if there's something better out there. Any suggestions are greatly appreciated.