Error inserting/retrieving tweets into mongolite db

Question

I am trying to perform sentiment analysis on tweets that were already fetched and stored in MongoDB. After fetching the tweets, which come back as a data frame, I get the following error:

ip.txt=laply(ip.lst,function(t) t$getText())
Error in t$getText : $ operator is invalid for atomic vectors

The entire code is given below:

iphone.tweets <- searchTwitter('#iphone', n=15, lang="en")
iphone.text=laply(iphone.tweets,function(t) t$getText())
df_ip <- as.data.frame(iphone.text)

m <- mongo("iphonecollection",db="project")
m$insert(df_ip)
df_ip<-m$find()
ip.lst<-as.list(t(df_ip))
ip.txt=laply(ip.lst,function(t) t$getText())

What I wish to do is to calculate the sentiment scores as follows:

iphone.scores <- score.sentiment(ip.txt, pos.words,neg.words, .progress='text')

The score.sentiment routine is as follows:

  score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
  require(plyr)
  require(stringr)
   # we got a vector of sentences. plyr will handle a list or a vector as an "l" for us
   # we want a simple array of scores back, so we use "l" + "a" + "ply" = laply:
  scores = laply(sentences, function(sentence, pos.words, neg.words) {
    # clean up sentences with R's regex-driven global substitute, gsub():
    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    sentence = gsub('\\d+', '', sentence)
    # and convert to lower case:
    sentence = tolower(sentence)
    # split into words. str_split is in the stringr package
    word.list = str_split(sentence, '\\s+')
    # sometimes a list() is one level of hierarchy too much
    words = unlist(word.list)
    # compare our words to the dictionaries of positive & negative terms
    pos.matches = match(words, pos.words)
    neg.matches = match(words, neg.words)
    # match() returns the position of the matched term or NA
    # we just want a TRUE/FALSE:
    pos.matches = !is.na(pos.matches)
    neg.matches = !is.na(neg.matches)
    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
    score = sum(pos.matches) - sum(neg.matches)
    return(score)
   }, pos.words, neg.words, .progress=.progress )
   scores.df = data.frame(score=scores, text=sentences)
   return(scores.df)
 }
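To make the scoring step concrete, here is a minimal sketch that traces a single sentence through the same clean-up and matching steps. It uses base R's strsplit() in place of stringr's str_split() so that it runs without extra packages; the sentence and the two short word lists are made up for illustration.

```r
# Illustrative word lists and sentence (assumptions, not real data)
pos.words <- c("good", "great", "better")
neg.words <- c("bad", "poor")
sentence  <- "Great camera, better screen, poor battery 2015"

# Same clean-up as in score.sentiment: strip punctuation,
# control characters and digits, then lower-case
sentence <- gsub("[[:punct:]]", "", sentence)
sentence <- gsub("[[:cntrl:]]", "", sentence)
sentence <- gsub("\\d+", "", sentence)
sentence <- tolower(sentence)

# Split on whitespace (base R stand-in for str_split)
words <- unlist(strsplit(sentence, "\\s+"))

# match() gives a position or NA; !is.na() turns that into TRUE/FALSE
pos.hits <- !is.na(match(words, pos.words))
neg.hits <- !is.na(match(words, neg.words))

# Two positive words minus one negative word
score <- sum(pos.hits) - sum(neg.hits)
print(score)
```

Here "great" and "better" count as positive and "poor" as negative, so the sentence ends up with a score of 1.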

A few things. Where is your `score.sentiment` routine coming from? What is the point of the mongo db? And why can't you just put the `ip.lst` directly into the `score.sentiment` routine? — Mike Wise, Dec 26 '15 at 08:21
Instead of fetching the tweets every time, I intend to store them once in MongoDB and then fetch and process them from there. — VBB, Dec 29 '15 at 04:11

score 1 · Accepted Answer · answered Dec 26 '15 at 13:03

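I think you wanted to use sapply, which flattens the list of status objects that searchTwitter returns into a character vector. The error itself arises because what you read back from MongoDB is plain text, not status objects, so there is no getText() method to call on it. In any case this works. Note that you need to install and then start MongoDB for this to work: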

library(twitteR)
library(plyr)
library(stringr)
library(mongolite)

# you have to set up a Twitter Application at https://dev.twitter.com/ to get the
# API credentials and authenticate (e.g. with setup_twitter_oauth()); not shown here
#
ntoget <- 600 # get 600 tweets

iphone.tweets <- searchTwitter('#iphone', n=ntoget, lang="en")
iphone.text <- sapply(iphone.tweets,function(t) t$getText())
df_ip <- as.data.frame(iphone.text)

# MongoDB must be installed and the service started (mongod.exe in Windows)
#
m <- mongo("iphonecollection",db="project")
m$insert(df_ip)
df_ip_out<-m$find()

# Following routine (score.sentiment) was copied from:
# http://stackoverflow.com/questions/32395098/r-sentiment-analysis-with-phrases-in-dictionaries
#
score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
  require(plyr)  
  require(stringr)  
  # we got a vector of sentences. plyr will handle a list  
  # or a vector as an "l" for us  
  # we want a simple array ("a") of scores back, so we use  
  # "l" + "a" + "ply" = "laply":  
  scores = laply(sentences, function(sentence, pos.words, neg.words) {
    # clean up sentences with R's regex-driven global substitute, gsub():
    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    sentence = gsub('\\d+', '', sentence)    
    # and convert to lower case:    
    sentence = tolower(sentence)    
    # split into words. str_split is in the stringr package    
    word.list = str_split(sentence, '\\s+')    
    # sometimes a list() is one level of hierarchy too much    
    words = unlist(word.list)    
    # compare our words to the dictionaries of positive & negative terms
    pos.matches = match(words, pos.words)
    neg.matches = match(words, neg.words)
    # match() returns the position of the matched term or NA    
    # we just want a TRUE/FALSE:    
    pos.matches = !is.na(pos.matches)   
    neg.matches = !is.na(neg.matches)   
    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
    score = sum(pos.matches) - sum(neg.matches)    
    return(score)    
  }, pos.words, neg.words, .progress=.progress )  
  scores.df = data.frame(score=scores, text=sentences)  
  return(scores.df)  
}

tweets <- as.character(df_ip_out$iphone.text)
neg = c("bad","prank","inferior","evil","poor","minor")
pos = c("good","great","superior","excellent","positive","super","better")
analysis <- score.sentiment(tweets,pos,neg)
table(analysis$score)

Yields the following (4 scored bad, 592 scored neutral, 4 scored good):

 -1   0   1 
  4 592   4
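To see why the original t$getText() call fails on data read back from the database, here is a minimal sketch that needs neither Twitter nor MongoDB. The data frame df_back is a hypothetical stand-in for what m$find() returns, with made-up text.

```r
# Hypothetical stand-in for the data frame returned by m$find()
df_back <- data.frame(iphone.text = c("Great phone", "Bad battery"),
                      stringsAsFactors = FALSE)

# Same conversion as in the question: each element is now a plain
# character string, not a twitteR status object
lst <- as.list(t(df_back))

# Calling $ on a character string raises the error from the question;
# tryCatch captures the message instead of stopping the script
res <- tryCatch(
  lapply(lst, function(t) t$getText()),
  error = function(e) conditionMessage(e)
)
print(res)

# The text is already in the column, so just extract it
tweets <- as.character(df_back$iphone.text)
print(tweets)
```

The point is that getText() is only needed once, on the objects searchTwitter returns; after the round trip through the database the column already holds the text.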

Thank you. Could you also tell me what the following line in your code actually does: tweets <- as.character(df_ip_out$iphone.text) — VBB, Dec 29 '15 at 04:09
It converts the `df_ip_out$iphone.text` vector from a factor vector to a character vector. You can see the type of a vector by using the `class()` function. — Mike Wise, Dec 29 '15 at 08:30
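A small sketch of the factor-to-character conversion, using a made-up factor rather than the real tweet column. Whether the column actually arrives as a factor depends on your R version and on how the data frame was built, so treat this as an illustration of what as.character() guards against.

```r
# Made-up factor standing in for a text column stored as a factor
f <- factor(c("good", "bad", "good"))
print(class(f))

# A factor stores integer codes plus a set of levels, not the raw strings
codes <- as.integer(f)
print(codes)

# as.character() recovers the original strings
ch <- as.character(f)
print(class(ch))
print(ch)
```

If the column is already of class character, as.character() leaves it unchanged, so the call is harmless either way.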
Why is iphone.text used in as.character(df_ip_out$iphone.text)? My aim here is to process only tweets that are fetched from MongoDB. iphone.text is obtained from the tweets returned by the searchTwitter function. I want the variable tweets to be independent of the tweets fetched initially. It should depend only on the data in MongoDB. — VBB, Dec 30 '15 at 04:48
I think you are confusing the `df_ip` dataframe, which is built from data retrieved by `searchTwitter`, and the `df_ip_out` dataframe, which is built from data retrieved from the `m$find` mongo retrieval function. — Mike Wise, Dec 30 '15 at 10:53
Actually, I noticed that `df_ip_out` keeps getting bigger every time you run it, since the newest `df_ip` data gets added to the database on each run. I was too lazy to look up the commands to empty the db. — Mike Wise, Jan 02 '16 at 15:08
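One way to stop the collection from growing on every run is to empty it before inserting, sketched below. The helper reset_and_insert is hypothetical, not part of mongolite; it only assumes the connection object offers the count(), drop() and insert() methods that a mongolite collection provides.

```r
# Hypothetical helper: empty the collection, then insert fresh rows.
# `con` is assumed to be a mongolite collection object, e.g. the `m`
# created by mongo("iphonecollection", db = "project").
reset_and_insert <- function(con, df) {
  # drop() removes the collection with all its documents
  if (con$count() > 0) con$drop()
  # insert() writes one document per data frame row
  con$insert(df)
  # return the number of documents now stored
  con$count()
}

# Usage (needs library(mongolite) and a running MongoDB):
# m <- mongo("iphonecollection", db = "project")
# reset_and_insert(m, df_ip)
```

If you would rather keep the history and only avoid duplicates, calling unique() on the retrieved text before scoring is an alternative.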
