Stopping Tweepy stream after a duration parameter (# lines, seconds, #Tweets, etc)

Question

I am using Tweepy to capture streaming tweets based off of the hashtag #WorldCup, as seen by the code below. It works as expected.

import tweepy
from tweepy import Stream
from tweepy.streaming import StreamListener

# consumer_key, consumer_secret, access_token and access_token_secret
# must be defined with your own credentials before running this.

class StdOutListener(StreamListener):
    ''' Handles data received from the stream. '''

    def on_status(self, status):
        # Prints the text of the tweet
        print('Tweet text: ' + status.text)

        # There are many options in the status object,
        # hashtags can be very easily accessed.
        for hashtag in status.entities['hashtags']:
            print(hashtag['text'])

        return True # To continue listening

    def on_error(self, status_code):
        print('Got an error with status code: ' + str(status_code))
        return True # To continue listening

    def on_timeout(self):
        print('Timeout...')
        return True # To continue listening

if __name__ == '__main__':
    listener = StdOutListener()
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    stream = Stream(auth, listener)
    # User IDs are passed as strings
    stream.filter(follow=['38744894'], track=['#WorldCup'])

Because this is a hot hashtag right now, the stream quickly reaches the maximum number of tweets that Tweepy lets you get in one transaction. However, if I were to search on #StackOverflow, it might be much slower, so I'd like a way to kill the stream. I could do this based on several parameters, such as stopping after 100 tweets, after 3 minutes, or after a text output file has reached 150 lines. I do know that the socket timeout is not the way to achieve this.

I have taken a look at this similar question:

Tweepy Streaming - Stop collecting tweets at x amount

However, it appears to not use the streaming API. The data that it collects is also very messy, whereas this text output is clean.

Can anyone suggest a way to stop Tweepy (when using the stream in this method), based on some user input parameter, besides a keyboard interrupt?

Thanks

I have the same problem with the TwitterAPI library for Python — lenhhoxung, Nov 02 '15 at 09:52

score 3 · Answer 1 · edited Mar 22 '18 at 05:09

I solved this, so I'm going to be one of those internet heroes that answers their own question.

This is achieved by using class-level (static) Python variables for the counter and for the stop value (e.g. stop after you grab 20 tweets). Once the counter reaches the stop value, on_status returns False, which stops the stream. This is currently a geolocation search, but you could easily swap it for a hashtag search by using the getTweetsByHashtag() method.

#!/usr/bin/env python
from tweepy import (Stream, OAuthHandler)
from tweepy.streaming import StreamListener

class Listener(StreamListener):

    tweet_counter = 0 # Static variable

    def login(self):
        CONSUMER_KEY = ''  # Fill in your own credentials
        CONSUMER_SECRET = ''
        ACCESS_TOKEN = ''
        ACCESS_TOKEN_SECRET = ''

        auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
        auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
        return auth

    def on_status(self, status):
        Listener.tweet_counter += 1
        print(str(Listener.tweet_counter) + '. Screen name = "%s" Tweet = "%s"'
              %(status.author.screen_name, status.text.replace('\n', ' ')))

        if Listener.tweet_counter < Listener.stop_at:
            return True
        else:
            print('Max num reached = ' + str(Listener.tweet_counter))
            return False

    def getTweetsByGPS(self, stop_at_number, latitude_start, longitude_start, latitude_finish, longitude_finish):
        try:
            Listener.stop_at = stop_at_number # Create static variable
            auth = self.login()
            streaming_api = Stream(auth, Listener(), timeout=60) # Socket timeout value
            streaming_api.filter(follow=None, locations=[latitude_start, longitude_start, latitude_finish, longitude_finish])
        except KeyboardInterrupt:
            print('Got keyboard interrupt')

    def getTweetsByHashtag(self, stop_at_number, hashtag):
        try:
            Listener.stopAt = stop_at_number
            auth = self.login()
            streaming_api = Stream(auth, Listener(), timeout=60)
            # Track tweets containing the given hashtag.
            streaming_api.filter(track=[hashtag])
        except KeyboardInterrupt:
            print('Got keyboard interrupt')

listener = Listener()
listener.getTweetsByGPS(20, -84.395198, 33.746876, -84.385585, 33.841601) # Atlanta area.
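
The stopping mechanism in the listing above does not depend on anything Tweepy-specific: the callback returns False once a limit is reached, and the loop feeding it stops. Below is a minimal sketch of that pattern, using an instance attribute instead of class-level variables so that each listener keeps its own count. `CountingListener` and `feed` are hypothetical names introduced only for illustration, and `feed` merely stands in for the streaming loop that would normally call `on_status`.

```python
class CountingListener:
    """Hypothetical stand-in for a StreamListener subclass (no Tweepy needed)."""

    def __init__(self, stop_at):
        self.stop_at = stop_at      # stop after this many tweets
        self.tweet_counter = 0      # per-instance, not shared between listeners
        self.seen = []

    def on_status(self, text):
        self.tweet_counter += 1
        self.seen.append(text)
        # Returning False is the signal to stop; True means keep listening.
        return self.tweet_counter < self.stop_at


def feed(listener, tweets):
    """Hypothetical stand-in for the stream loop: stop when the callback returns False."""
    for text in tweets:
        if listener.on_status(text) is False:
            break
    return listener.seen


collected = feed(CountingListener(stop_at=3), ["a", "b", "c", "d", "e"])
print(collected)
```

Keeping the counter on the instance means a second listener created later starts from zero, which may or may not be what you want if you rely on a shared total.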

What about stopping upon timeout? You don't mention that on your answer. I don't think your answer to your own question should be accepted :P — dekaru, Sep 13 '15 at 07:50
If you use Twython, there is a handy method named `disconnect` — lenhhoxung, Nov 02 '15 at 10:56
@hb20007: Please post a new answer if you feel your solution is better. — Cris Luengo, Mar 21 '18 at 23:12

score 1 · Answer 2 · answered Mar 24 '21 at 16:59

The above solution was helpful in getting tweets by hashtag, even though there is a small error in the definition of the getTweetsByHashtag function: you used Listener.stopAt instead of Listener.stop_at = stop_at_number.

I have tweaked the code a little so that you can easily stop the stream after a specified number of seconds.

I defined two new methods: __init__, which sets the time limit in seconds, and on_data, which receives more information than the on_status method.

Enjoy:

import time
from tweepy import (Stream, OAuthHandler)
from tweepy.streaming import StreamListener

class Listener(StreamListener):

    tweet_counter = 0 # Static variable

    def login(self):
        CONSUMER_KEY = ''  # Fill in your own credentials
        CONSUMER_SECRET = ''
        ACCESS_TOKEN = ''
        ACCESS_TOKEN_SECRET = ''

        auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
        auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
        return auth

    def __init__(self, time_limit=8):
        self.start_time = time.time()
        self.limit = time_limit
        super(Listener, self).__init__()

    def on_data(self, data):
        Listener.tweet_counter += 1
        if (time.time() - self.start_time) < self.limit and Listener.tweet_counter < Listener.stop_at:
            print(str(Listener.tweet_counter)+data)
            return True
        else:
            print("Either max number reached or time limit up at: " + str(Listener.tweet_counter) + " outputs")
            # No output file is opened in this class, so there is nothing to close here.
            return False

    #def on_status(self, status):
        #Listener.tweet_counter += 1
        #print(str(Listener.tweet_counter) + '. Screen name = "%s" Tweet = "%s"'
              #%(status.author.screen_name, status.text.replace('\n', ' ')))

        #if Listener.tweet_counter < Listener.stop_at and (time.time() - self.start_time) < self.limit:
            #return True
        
        #else:
            #print('Max num reached or time elapsed= ' + str(Listener.tweet_counter))
            #return False

    def getTweetsByGPS(self, stop_at_number, latitude_start, longitude_start, latitude_finish, longitude_finish):
        try:
            Listener.stop_at = stop_at_number # Create static variable
            auth = self.login()
            streaming_api = Stream(auth, Listener(), timeout=60) # Socket timeout value
            streaming_api.filter(follow=None, locations=[latitude_start, longitude_start, latitude_finish, longitude_finish])
        except KeyboardInterrupt:
            print('Got keyboard interrupt')

    def getTweetsByHashtag(self, stop_at_number, hashtag):
        try:
            Listener.stop_at = stop_at_number
            auth = self.login()
            streaming_api = Stream(auth, Listener(), timeout=60)
            # Track tweets containing the given keyword.
            streaming_api.filter(track=[hashtag])
        except KeyboardInterrupt:
            print('Got keyboard interrupt')
   

listener = Listener()
#listener.getTweetsByGPS(20, -84.395198, 33.746876, -84.385585, 33.841601) # Atlanta area.
listener.getTweetsByHashtag(1000, "hi")
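
Since on_data receives the raw payload as a string rather than a parsed status object, any fields you want have to be extracted by hand. Here is a minimal sketch, assuming the payload is a JSON object with `text` and `user.screen_name` fields; the sample strings are made up for illustration and `summarize` is a hypothetical helper, not part of Tweepy.

```python
import json


def summarize(raw_data):
    """Return (screen_name, text) from a raw JSON payload, or None if it has no text."""
    payload = json.loads(raw_data)
    if 'text' not in payload:
        # Assumption: payloads without a 'text' field are not tweets
        # (for example, notices sent on the same stream).
        return None
    screen_name = payload.get('user', {}).get('screen_name', '')
    return screen_name, payload['text'].replace('\n', ' ')


# Made-up sample payloads, for illustration only.
sample = '{"text": "Goal!\\n#WorldCup", "user": {"screen_name": "example_user"}}'
print(summarize(sample))
print(summarize('{"limit": {"track": 5}}'))
```

Inside on_data, a None result could simply be skipped without incrementing the counter, so that only real tweets count toward the limit.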

You can change the 1000 value to the maximum number of tweets you want, and "hi" to the keyword you need to find. In the __init__ function, change the time_limit default of 8 to the value you want, in seconds.

You can either set a short time limit and set the count to a very high value, or set the number of tweets you need and give a higher time limit so the stream can reach that count. Your choice! Chukwu Gozie unu (God bless!)
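
The trade-off between the two limits can be sketched without Tweepy. The helper below is a hypothetical illustration, not part of any library: `StopCondition` and its `clock` parameter are names introduced here, and the injectable clock only exists so the time limit can be exercised without actually waiting.

```python
import time


class StopCondition:
    """Hypothetical helper: signals a stop after max_items or max_seconds, whichever comes first."""

    def __init__(self, max_items, max_seconds, clock=time.monotonic):
        self.max_items = max_items
        self.max_seconds = max_seconds
        self.clock = clock
        self.start = clock()
        self.count = 0

    def keep_going(self):
        """Record one item; return True to continue, False to stop."""
        self.count += 1
        within_time = (self.clock() - self.start) < self.max_seconds
        within_count = self.count < self.max_items
        return within_time and within_count


# Count-limited: generous time budget, small item budget.
by_count = StopCondition(max_items=3, max_seconds=3600)
print([by_count.keep_going() for _ in range(3)])

# Time-limited: a fake clock that advances 5 seconds per reading.
ticks = iter(range(0, 100, 5))
by_time = StopCondition(max_items=10**6, max_seconds=8, clock=lambda: next(ticks))
print([by_time.keep_going() for _ in range(3)])
```

A listener's on_data or on_status could return the value of `keep_going()` directly, since returning False is what ends the stream in the code above.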
