
I want the Tweepy Streaming API to stop pulling in tweets once I have stored X tweets in MongoDB.

I have tried if and while statements with counters inside the class definition, but cannot get the stream to stop after X tweets. This is a real head-banger for me. I found this link: https://groups.google.com/forum/#!topic/tweepy/5IGlu2Qiug4 but my efforts to replicate it have failed. It always tells me that __init__ needs an additional argument. I believe our Tweepy auth is set up differently, so it is not an apples-to-apples comparison.

Any thoughts?

import tweepy
from tweepy import Stream
from tweepy.streaming import StreamListener

# CONSUMER_KEY, CONSUMER_SECRET, OAUTH_TOKEN and OAUTH_TOKEN_SECRET hold the app's Twitter credentials
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

class StdOutListener(StreamListener):

    def on_status(self, status):
        text = status.text
        created = status.created_at
        record = {'Text': text, 'Created At': created}
        print record  # See Tweepy documentation to learn how to access other fields
        collection.insert(record)  # collection is the MongoDB collection object, defined elsewhere (not shown)


    def on_error(self, status):
        print 'Error on status', status

    def on_limit(self, status):
        print 'Limit threshold exceeded', status

    def on_timeout(self):
        # Tweepy calls on_timeout with no arguments
        print 'Stream disconnected; continuing...'


stream = Stream(auth, StdOutListener())
stream.filter(track=['tv'])
AngryWhopper

1 Answer


You need to add a counter to your class in __init__ and increment it inside on_status. on_status stores each record and returns True while the counter is below the limit (20 here); once the limit is reached it returns False, which tells Tweepy to disconnect the stream. This could be done as shown below:

def __init__(self, api=None):
    # Call the base-class initializer first so the listener is set up properly
    super(StdOutListener, self).__init__()
    self.num_tweets = 0

def on_status(self, status):
    record = {'Text': status.text, 'Created At': status.created_at}
    print record  # See Tweepy documentation to learn how to access other fields
    collection.insert(record)
    self.num_tweets += 1
    # Returning False disconnects the stream, so this stops right after
    # the 20th tweet has been stored
    return self.num_tweets < 20
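
To try the stopping logic without Twitter credentials or MongoDB, here is a minimal sketch in Python 3. `FakeListenerBase`, `run_stream` and the `stored` list are hypothetical stand-ins for `StreamListener`, the stream's read loop and the MongoDB collection; they are not part of Tweepy and only imitate the one behaviour that matters here, namely that a False return value from `on_status` ends the loop.

```python
class FakeListenerBase(object):
    """Hypothetical stand-in for tweepy's StreamListener (not the real class)."""

    def __init__(self, api=None):
        self.api = api


class CountingListener(FakeListenerBase):
    def __init__(self, limit, api=None):
        super(CountingListener, self).__init__(api)
        self.limit = limit
        self.num_tweets = 0
        self.stored = []  # stand-in for the MongoDB collection

    def on_status(self, status):
        self.stored.append({'Text': status})
        self.num_tweets += 1
        # False means "stop the stream"; True means "keep going".
        return self.num_tweets < self.limit


def run_stream(listener, statuses):
    """Hypothetical stand-in for the stream loop: stop on a False return."""
    for status in statuses:
        if listener.on_status(status) is False:
            break


listener = CountingListener(limit=20)
run_stream(listener, ('tweet %d' % i for i in range(1000)))
print(len(listener.stored))  # 20
```

Taking the limit as a constructor argument, rather than hard-coding 20, is just one possible design; it lets the same listener class be reused for a different X.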
Nat Dempkowski
  • Adding the __init__ gives me this error: "'StdOutListener' object has no attribute 'api'" http://i.imgur.com/Z2N3hCB.png I am not sure what adding that has to do with the api? – AngryWhopper Jan 01 '14 at 20:35
  • Sorry about that, you also need to add a call to the init of the base class. I updated the code above, but it is as simple as adding the line `super(StdOutListener, self).__init__()` to the definition of init. – Nat Dempkowski Jan 02 '14 at 19:24
  • To reduce errors in the future, it would be better to make the `__init__` definition match StreamListener's, `def __init__(self, api=None):`, and to call the base initializer with the api param. – alko Jan 02 '14 at 19:32
  • Thanks, this worked! So for my understanding, why did this need a call back to the base class init? When I don't call it, but added api=None, it gives the "no attribute api" error. Is the purpose of the super init to call back to the base class that DOES have an api attribute? – AngryWhopper Jan 05 '14 at 02:20
  • In tweepy I get this error `NameError: global name 'StdOutListener' is not defined` how should I use count in init? – Mona Jalal Jul 01 '16 at 04:06
  • That line is just trying to call super on the class you're initializing. You can change `StdOutListener` to whatever you're calling your `StreamListener` subclass. eg. the code from my response should go inside the `StdOutListener` class. – Nat Dempkowski Jul 01 '16 at 05:02
  • @AngryWhopper I've been dealing with this error for a few hours now and wouldn't have thought the `__init__()` method would have caused it! +1 for catching that – Hamman Samuel Jul 14 '16 at 09:28
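
On the question raised in the comments about why the base-class call is needed: defining `__init__` in a subclass replaces the base class's `__init__`, so anything the base initializer would have set (such as the `api` attribute) is missing unless that initializer is called explicitly. A small sketch in Python 3, using a hypothetical `Base` class in place of `StreamListener`:

```python
class Base(object):
    """Hypothetical stand-in for StreamListener: its __init__ sets self.api."""

    def __init__(self, api=None):
        self.api = api or 'default api'


class WithoutSuper(Base):
    def __init__(self, api=None):
        # Base.__init__ never runs, so self.api is never assigned.
        self.num_tweets = 0


class WithSuper(Base):
    def __init__(self, api=None):
        super(WithSuper, self).__init__(api)  # sets self.api first
        self.num_tweets = 0


print(hasattr(WithoutSuper(), 'api'))  # False
print(hasattr(WithSuper(), 'api'))     # True
```

Adding `api=None` to the signature alone does not help, because the parameter is never stored on the instance unless the base initializer (or your own code) assigns it.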