I am using Tweepy to capture streaming tweets based off of the hashtag #WorldCup, as seen by the code below. It works as expected.

import tweepy
from tweepy import Stream
from tweepy.streaming import StreamListener

# consumer_key, consumer_secret, access_token and access_token_secret
# are assumed to be defined elsewhere with your own credentials.

class StdOutListener(StreamListener):
    ''' Handles data received from the stream. '''

    def on_status(self, status):
        # Prints the text of the tweet
        print('Tweet text: ' + status.text)

        # There are many options in the status object,
        # hashtags can be very easily accessed.
        for hashtag in status.entities['hashtags']:
            print(hashtag['text'])

        return True # To continue listening

    def on_error(self, status_code):
        print('Got an error with status code: ' + str(status_code))
        return True # To continue listening

    def on_timeout(self):
        print('Timeout...')
        return True # To continue listening

if __name__ == '__main__':
    listener = StdOutListener()
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    stream = Stream(auth, listener)
    stream.filter(follow=[38744894], track=['#WorldCup'])

Because this is a hot hashtag right now, searches don't take long to reach the maximum number of tweets that Tweepy lets you get in one transaction. However, if I were to search on #StackOverflow, it might be much slower, so I'd like a way to kill the stream. I could do this based on several parameters, such as stopping after 100 tweets, after 3 minutes, or after a text output file has reached 150 lines. I do know that the socket timeout isn't meant for this.

I have taken a look at this similar question:

Tweepy Streaming - Stop collecting tweets at x amount

However, it appears to not use the streaming API. The data that it collects is also very messy, whereas this text output is clean.

Can anyone suggest a way to stop Tweepy (when using the stream in this method), based on some user input parameter, besides a keyboard interrupt?

Thanks

hb20007
sup bro

2 Answers

I solved this, so I'm going to be one of those internet heroes that answers their own question.

This is achieved by using class-level ("static") Python variables for the counter and for the stop value (e.g. stop after you grab 20 tweets). Once the counter reaches the stop value, on_status returns False, which tells Tweepy to stop the stream. This is currently a geolocation search, but you could easily swap it for a hashtag search by using the getTweetsByHashtag() method.

#!/usr/bin/env python
from tweepy import (Stream, OAuthHandler)
from tweepy.streaming import StreamListener

class Listener(StreamListener):

    tweet_counter = 0 # Static variable

    def login(self):
        CONSUMER_KEY = ''  # Fill in your own credentials
        CONSUMER_SECRET = ''
        ACCESS_TOKEN = ''
        ACCESS_TOKEN_SECRET = ''

        auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
        auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
        return auth

    def on_status(self, status):
        Listener.tweet_counter += 1
        print(str(Listener.tweet_counter) + '. Screen name = "%s" Tweet = "%s"'
              %(status.author.screen_name, status.text.replace('\n', ' ')))

        if Listener.tweet_counter < Listener.stop_at:
            return True
        else:
            print('Max num reached = ' + str(Listener.tweet_counter))
            return False

    def getTweetsByGPS(self, stop_at_number, longitude_start, latitude_start, longitude_finish, latitude_finish):
        # Twitter expects the bounding box as longitude/latitude pairs, south-west corner first.
        try:
            Listener.stop_at = stop_at_number # Create static variable
            auth = self.login()
            streaming_api = Stream(auth, Listener(), timeout=60) # Socket timeout value
            streaming_api.filter(follow=None, locations=[longitude_start, latitude_start, longitude_finish, latitude_finish])
        except KeyboardInterrupt:
            print('Got keyboard interrupt')

    def getTweetsByHashtag(self, stop_at_number, hashtag):
        try:
            Listener.stop_at = stop_at_number # Must be stop_at, the name on_status checks
            auth = self.login()
            streaming_api = Stream(auth, Listener(), timeout=60)
            streaming_api.filter(track=[hashtag])
        except KeyboardInterrupt:
            print('Got keyboard interrupt')

listener = Listener()
listener.getTweetsByGPS(20, -84.395198, 33.746876, -84.385585, 33.841601) # Atlanta area.
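
For anyone who wants to try the stop logic without Twitter credentials, here is a minimal sketch of the same counter idea. It is not Tweepy code: CountingListener and run_fake_stream are hypothetical stand-ins, and the counter lives on the instance rather than on the class, so two listeners would not share a count.

```python
class CountingListener:
    """Stand-in for a Tweepy listener; only the stop logic is shown."""

    def __init__(self, stop_at):
        self.stop_at = stop_at      # Per-instance limit, not shared between listeners
        self.tweet_counter = 0

    def on_status(self, status_text):
        self.tweet_counter += 1
        print('%d. %s' % (self.tweet_counter, status_text))
        # False is the signal to stop; True means keep listening.
        return self.tweet_counter < self.stop_at


def run_fake_stream(listener, statuses):
    """Hypothetical driver that mimics a stream loop: it stops as soon
    as the callback returns False and reports how many items it delivered."""
    delivered = 0
    for text in statuses:
        delivered += 1
        if listener.on_status(text) is False:
            break
    return delivered


delivered = run_fake_stream(CountingListener(stop_at=3), ['a', 'b', 'c', 'd', 'e'])
```

Keeping the state on the instance is a design choice, not a requirement; the class-level version above behaves the same as long as only one listener is in use.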
hb20007
sup bro
The above solution was helpful in getting tweets by hashtag, even though getTweetsByHashtag as originally posted had a small error: it set Listener.stopAt instead of Listener.stop_at.

I have tweaked the code a little, so you can also kill the stream after a specified number of seconds.

I defined two new functions: __init__, which lets you set the time limit in seconds, and on_data, which receives more information than on_status.
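
To give an idea of what "more information" means: on_data is handed the raw message as a JSON string, so you can pull out whatever fields you need yourself. The sketch below is only an illustration; summarize_raw_tweet is a hypothetical helper and the sample payload is made up, using the text, user and entities fields of a tweet.

```python
import json


def summarize_raw_tweet(raw_data):
    """Return (screen_name, text, hashtags) for a tweet payload, or None
    for messages that are not tweets (e.g. delete or limit notices)."""
    message = json.loads(raw_data)
    if 'text' not in message:
        return None
    hashtags = [h['text'] for h in message.get('entities', {}).get('hashtags', [])]
    screen_name = message.get('user', {}).get('screen_name')
    return screen_name, message['text'], hashtags


# Made-up sample payload, for illustration only.
sample = json.dumps({
    'text': 'Goal! #WorldCup',
    'user': {'screen_name': 'example_user'},
    'entities': {'hashtags': [{'text': 'WorldCup'}]},
})
result = summarize_raw_tweet(sample)
notice = summarize_raw_tweet('{"limit": {"track": 5}}')
```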

Enjoy:

import time

from tweepy import (Stream, OAuthHandler)
from tweepy.streaming import StreamListener

class Listener(StreamListener):

    tweet_counter = 0 # Static variable

    def login(self):
        CONSUMER_KEY = ''  # Fill in your own credentials
        CONSUMER_SECRET = ''
        ACCESS_TOKEN = ''
        ACCESS_TOKEN_SECRET = ''

        auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
        auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
        return auth

    def __init__(self, time_limit=8):
        self.start_time = time.time()
        self.limit = time_limit
        super(Listener, self).__init__()

    def on_data(self, data):
        Listener.tweet_counter += 1
        if (time.time() - self.start_time) < self.limit and Listener.tweet_counter < Listener.stop_at:
            print(str(Listener.tweet_counter)+data)
            return True
        else:
            print("Either max number reached or time limit up at: " + str(Listener.tweet_counter) + " outputs")
            return False # Stop listening

    #def on_status(self, status):
        #Listener.tweet_counter += 1
        #print(str(Listener.tweet_counter) + '. Screen name = "%s" Tweet = "%s"'
              #%(status.author.screen_name, status.text.replace('\n', ' ')))

        #if Listener.tweet_counter < Listener.stop_at and (time.time() - self.start_time) < self.limit:
            #return True
        
        #else:
            #print('Max num reached or time elapsed= ' + str(Listener.tweet_counter))
            #return False

    def getTweetsByGPS(self, stop_at_number, longitude_start, latitude_start, longitude_finish, latitude_finish):
        # Twitter expects the bounding box as longitude/latitude pairs, south-west corner first.
        try:
            Listener.stop_at = stop_at_number # Create static variable
            auth = self.login()
            streaming_api = Stream(auth, Listener(), timeout=60) # Socket timeout value
            streaming_api.filter(follow=None, locations=[longitude_start, latitude_start, longitude_finish, latitude_finish])
        except KeyboardInterrupt:
            print('Got keyboard interrupt')

    def getTweetsByHashtag(self, stop_at_number, hashtag):
        try:
            Listener.stop_at = stop_at_number
            auth = self.login()
            streaming_api = Stream(auth, Listener(), timeout=60)
            streaming_api.filter(track=[hashtag])
        except KeyboardInterrupt:
            print('Got keyboard interrupt')
   

listener = Listener()
#listener.getTweetsByGPS(20, -84.395198, 33.746876, -84.385585, 33.841601) # Atlanta area.
listener.getTweetsByHashtag(1000, "hi")

You can change the 1000 to the maximum number of tweets you want and "hi" to the keyword you need to find. In the __init__ function, change the time_limit default of 8 to the value you want, in seconds. Use whichever suits your needs.

You can either set a time limit and raise the count to a very high value, or set the number of tweets you need and give a higher time value so the stream can reach that count. Your choice! Chukwu Gozie unu (God bless!)
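
The two modes described above can be sketched without a network connection. This is a rough illustration under my own assumptions, not the code from this answer: LimitedListener, max_tweets and the clock parameter are hypothetical names, and the clock is injectable only so the time limit can be exercised without waiting.

```python
import time


class LimitedListener:
    """Stand-in listener that stops on a tweet count, a time limit, or both."""

    def __init__(self, max_tweets=None, time_limit=None, clock=time.time):
        self.max_tweets = max_tweets    # None means no count limit
        self.time_limit = time_limit    # Seconds; None means no time limit
        self.clock = clock
        self.start_time = clock()
        self.tweet_counter = 0

    def on_data(self, data):
        self.tweet_counter += 1
        out_of_time = (self.time_limit is not None and
                       self.clock() - self.start_time >= self.time_limit)
        hit_count = (self.max_tweets is not None and
                     self.tweet_counter >= self.max_tweets)
        # False is the signal to stop; True means keep listening.
        return not (out_of_time or hit_count)


# Time-limited mode, driven by a fake clock that advances one second per call.
ticks = iter([0, 1, 2, 3, 4])
by_time = LimitedListener(time_limit=3, clock=lambda: next(ticks))
time_results = [by_time.on_data('x') for _ in range(3)]

# Count-limited mode with the real clock and no time limit.
by_count = LimitedListener(max_tweets=2)
count_results = [by_count.on_data('x') for _ in range(2)]
```

One thing to keep in mind with this approach: the limits are only checked when a message arrives, so on a very quiet keyword the time limit takes effect at the next message rather than exactly on the second.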

Nigel Iyke