
I am new not only to Python but to programming altogether, so I'd appreciate your help very much!

I am trying to filter tweets from the Twitter streaming API using Tweepy.

I have filtered by user ID and have confirmed that tweets are being collected in real time.

HOWEVER, it seems that only the second-to-last tweet is being collected in real time, as opposed to the very latest tweet.

Can you guys help?

import tweepy
import webbrowser
import time
import sys

consumer_key = 'xyz'
consumer_secret = 'zyx'


## Getting access key and secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth_url = auth.get_authorization_url()
print 'From your browser, please click AUTHORIZE APP and then copy the unique PIN: ' 
webbrowser.open(auth_url)
verifier = raw_input('PIN: ').strip()
auth.get_access_token(verifier)
access_key = auth.access_token.key
access_secret = auth.access_token.secret


## Authorizing account privileges
auth.set_access_token(access_key, access_secret)


## Get the local time
localtime = time.asctime( time.localtime(time.time()) )


## Status changes
api = tweepy.API(auth)
api.update_status('It worked - Current time is %s' % localtime)
print 'It worked - now go check your status!'


## Filtering the firehose
user = []
print 'Follow tweets from which user ID?'
handle = raw_input(">")
user.append(handle)

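keywords = []
print 'What keywords do you want to track? Separate with commas.'
key = raw_input(">")
# Split the comma-separated input so each keyword is tracked on its own
keywords.extend(k.strip() for k in key.split(',') if k.strip())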

class CustomStreamListener(tweepy.StreamListener):

    def on_status(self, status):

        # We'll simply print some values in a tab-delimited format
        # suitable for capturing to a flat file, but you could opt to
        # store them elsewhere, retweet select statuses, etc.



        try:
            print "%s\t%s\t%s\t%s" % (status.text, 
                                      status.author.screen_name, 
                                      status.created_at, 
                                      status.source,)
        except Exception, e:
            print >> sys.stderr, 'Encountered Exception:', e
            pass

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream

# Create a streaming API and set a timeout value of ??? seconds.

streaming_api = tweepy.streaming.Stream(auth, CustomStreamListener(), timeout=None)

# Optionally filter the statuses you want to track by providing a list
# of users to "follow".

print >> sys.stderr, "Filtering public timeline for %s" % keywords

# follow expects a list of user IDs, so pass the list rather than the raw string
streaming_api.filter(follow=user, track=keywords)
snakesNbronies

2 Answers

5

I had this same problem. In my case the answer was not as easy as running Python unbuffered, and I presume that didn't solve the original poster's problem either. The problem is actually in the tweepy package itself, in the _read_loop() function in streaming.py, which I think needs to be updated to reflect changes to the format in which Twitter outputs data from their streaming API.

The solution for me was to download the newest tweepy code from GitHub (https://github.com/tweepy/tweepy), specifically the streaming.py file. The commit history for this file shows the recent changes made to try to resolve this issue.

I looked into the details of tweepy and found an issue with the way streaming.py reads in the JSON tweet stream. I think it has to do with Twitter updating their streaming API to prefix each incoming status with its length in bytes. Long story short, here is the function I replaced in streaming.py to resolve the problem.

def _read_loop(self, resp):

    while self.running and not resp.isclosed():

        # Note: keep-alive newlines might be inserted before each length value.
        # read until we get a digit...
        c = '\n'
        while c == '\n' and self.running and not resp.isclosed():
            c = resp.read(1)
        delimited_string = c

        # read rest of delimiter length..
        d = ''
        while d != '\n' and self.running and not resp.isclosed():
            d = resp.read(1)
            delimited_string += d

        try:
            int_to_read = int(delimited_string)
            next_status_obj = resp.read( int_to_read )
            # print 'status_object = %s' % next_status_obj
            self._data(next_status_obj)
        except ValueError:
            pass 

    if resp.isclosed():
        self.on_closed(resp)
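To see the length-prefix idea in isolation, here is a minimal, self-contained sketch that parses the same kind of framing from an in-memory stream. It is not tweepy code: read_delimited, frame and fake_stream are hypothetical names made up for illustration, the framing (a byte count on its own line, then the payload, with stray keep-alive newlines in between) is my assumption about what the stream looks like, and the sketch uses Python 3 syntax rather than the Python 2 used above.

```python
import io
import json


def read_delimited(stream):
    """Yield each payload from a length-delimited byte stream (illustrative only)."""
    while True:
        # Skip keep-alive newlines until the first byte of a length value.
        c = stream.read(1)
        while c in (b"\n", b"\r"):
            c = stream.read(1)
        if not c:
            return  # end of stream

        # Accumulate the rest of the length line, up to and including "\n".
        length_line = c
        while not length_line.endswith(b"\n"):
            d = stream.read(1)
            if not d:
                return  # stream ended in the middle of a length line
            length_line += d

        try:
            size = int(length_line.strip())
        except ValueError:
            continue  # not a length line; try to resynchronise on the next one

        # Read exactly `size` bytes, so a status is complete as soon as it arrives.
        yield stream.read(size)


def frame(obj):
    """Encode obj the way this sketch assumes the stream frames a status."""
    body = json.dumps(obj).encode("utf-8") + b"\r\n"
    return str(len(body)).encode("ascii") + b"\r\n" + body


# Two framed statuses with keep-alive newlines before and between them.
fake_stream = io.BytesIO(
    b"\r\n" + frame({"text": "first"}) + b"\r\n\r\n" + frame({"text": "second"})
)
statuses = [json.loads(p.decode("utf-8")) for p in read_delimited(fake_stream)]
print([s["text"] for s in statuses])
```

Because the reader knows how many bytes each status occupies, it can hand a status off as soon as those bytes are in, without waiting for anything that follows it in the stream.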

This solution also requires downloading the source code for the tweepy package, modifying it, and then installing the modified library into Python. To install it, go into your top-level tweepy directory and run something like sudo python setup.py install, depending on your system.

I've also left a comment for the package's maintainers on GitHub to let them know about the problem.

  • 3
    I've forked their repo and put this fix in, just waiting on a pull request. For the time being, you can grab the fixed version here: https://github.com/robbrit/tweepy – robbrit May 18 '12 at 13:30
  • @robbrit - thanks! I really appreciate this. Has the pull request been merged yet? – snakesNbronies Jun 17 '12 at 21:26
1

This is a case of output buffering. Run python with -u (unbuffered) to prevent this from happening.

Or, you can force the buffer to be flushed by executing a sys.stdout.flush() after your print statement.
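As a rough sketch of the flush approach, assuming you route your output through a small helper: emit is a hypothetical name, not part of tweepy or the standard library, and the example uses Python 3 syntax. In the Python 2 code from the question, the equivalent would be calling sys.stdout.flush() right after the print in on_status.

```python
import io
import sys


def emit(line, out=None):
    """Write one line and flush immediately so it is not held in a buffer."""
    out = sys.stdout if out is None else out
    out.write(line + "\n")
    out.flush()  # push the line out now instead of waiting for the buffer to fill


# Demonstration against an in-memory stream (illustrative only).
buf = io.StringIO()
emit("some tweet text\tsome_user", out=buf)
print(repr(buf.getvalue()))
```

Calling emit(...) without the out argument writes to sys.stdout, which is what you would do inside a stream listener.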

See this answer for more ideas.

Burhan Khalid