Hey, I need help removing words from the results of a Twitter search. Here is the code I use:
library("twitteR")
library("ROAuth")
# cred must be an OAuthFactory object created beforehand (creation not shown here)
cred$handshake()
save(cred, file="twitter.Rdata")
load("twitter.Rdata")
registerTwitterOAuth(cred)
tweets = searchTwitter('#apple', n = 100, lang = "en")
tweets.df = twListToDF(tweets)
names(tweets.df)
tweets.df$text
tweet.words = strsplit(tweets.df$text, "[^A-Za-z]+")
word.table = table(unlist(tweet.words))
library("tm")
myStopwords <- c(stopwords('english'), "#apple","http://")
tweet.corpus = Corpus(VectorSource(tweets.df$text))
tweet.corpus = tm_map(tweet.corpus, function(x) iconv(x, to='UTF-8', sub='byte'))
tweet.corpus = tm_map(tweet.corpus, PlainTextDocument)
tweet.corpus = tm_map(tweet.corpus,removeWords, myStopwords)
tweet.dtm = DocumentTermMatrix(tweet.corpus)
tweet.matrix = inspect(tweet.dtm)
The problem is that it isn't removing the terms that contain #apple, or the website addresses containing http://, from the corpus. How can I remove these? Thank you for the help, Matt.
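
One possible direction, offered only as a rough sketch: removeWords is designed for whole words, so a hashtag or a URL fragment may not match the way you expect. An alternative is to strip those patterns from the raw text with gsub before building the corpus. The sample tweets and the names tweet.text and clean.text below are made up for illustration; they stand in for tweets.df$text.

```r
# Hypothetical sample tweets standing in for tweets.df$text
tweet.text <- c("Loving my new phone #apple http://t.co/abc123",
                "Is #Apple stock a buy? https://example.com/news")

# Drop URLs first: "http" or "https", then "://", then everything up to the next whitespace
clean.text <- gsub("https?://[^[:space:]]+", "", tweet.text)

# Drop the hashtag itself, ignoring case (this would also clip longer tags that start with #apple)
clean.text <- gsub("#apple", "", clean.text, ignore.case = TRUE)

# Collapse leftover runs of whitespace and trim both ends
clean.text <- trimws(gsub("[[:space:]]+", " ", clean.text))

print(clean.text)
```

If that behaves as intended on your data, the cleaned vector could be passed to Corpus(VectorSource(clean.text)) in place of tweets.df$text, with the rest of the pipeline left as it is.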