37

I read in another Stack Exchange question that the rate limit depends on the number of requests per 15 minutes and also on the complexity of the algorithm, but my algorithm is not complex.

So I use this code:

import tweepy
import sqlite3
import time

db = sqlite3.connect('data/MyDB.db')

# Get a cursor object
cursor = db.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS MyTable(id INTEGER PRIMARY KEY, name TEXT, geo TEXT, image TEXT, source TEXT, timestamp TEXT, text TEXT, rt INTEGER)''')
db.commit()

consumer_key = ""
consumer_secret = ""
key = ""
secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(key, secret)

api = tweepy.API(auth)

search = "#MyHashtag"

for tweet in tweepy.Cursor(api.search,
                           q=search,
                           include_entities=True).items():
    while True:
        try:
            cursor.execute('''INSERT INTO MyTable(name, geo, image, source, timestamp, text, rt) VALUES(?,?,?,?,?,?,?)''',(tweet.user.screen_name, str(tweet.geo), tweet.user.profile_image_url, tweet.source, tweet.created_at, tweet.text, tweet.retweet_count))
        except tweepy.TweepError:
                time.sleep(60 * 15)
                continue
        break
db.commit()
db.close()

I always get the Twitter rate limit error:

Traceback (most recent call last):
  File "stream.py", line 25, in <module>
    include_entities=True).items():
  File "/usr/local/lib/python2.7/dist-packages/tweepy/cursor.py", line 153, in next
    self.current_page = self.page_iterator.next()
  File "/usr/local/lib/python2.7/dist-packages/tweepy/cursor.py", line 98, in next
    data = self.method(max_id = max_id, *self.args, **self.kargs)
  File "/usr/local/lib/python2.7/dist-packages/tweepy/binder.py", line 200, in _call
    return method.execute()
  File "/usr/local/lib/python2.7/dist-packages/tweepy/binder.py", line 176, in execute
    raise TweepError(error_msg, resp)
tweepy.error.TweepError: [{'message': 'Rate limit exceeded', 'code': 88}]
Peter Mortensen
4m1nh4j1

6 Answers

92

For anyone who stumbles upon this on Google, Tweepy 3.2+ has additional parameters for the tweepy.API class, in particular:

  • wait_on_rate_limit – Whether or not to automatically wait for rate limits to replenish
  • wait_on_rate_limit_notify – Whether or not to print a notification when Tweepy is waiting for rate limits to replenish

Setting these flags to True will delegate the waiting to the API instance, which is good enough for most simple use cases.

dancow
  • I was looking at implementing my own code to sleep between requests to the API until I saw this answer. Very useful, and I totally agree this is the most pythonic answer – Daniel Vieira Aug 07 '18 at 10:48
  • 4
    This should be the accepted answer now, @4m1nh4j1. Also, your name is a pain to type. – wordsforthewise Nov 21 '18 at 19:31
  • 2
    with this method, will the Cursor object get different tweets after replenishing or is there a chance for it to get tweets that it got in the previous "iteration" before hitting rate limit? @dan-nguyen – Rahul Kothari Aug 04 '19 at 16:27
34

The problem is that your try: except: block is in the wrong place. Inserting data into the database will never raise a TweepError - it's iterating over Cursor.items() that will. I would suggest refactoring your code to call the next method of Cursor.items() in an infinite loop. That call should be placed in the try: except: block, as it can raise an error.

Here's (roughly) what the code should look like:

# above omitted for brevity
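c = tweepy.Cursor(api.search,
                  q=search,
                  include_entities=True).items()
while True:
    try:
        # Fetching the next tweet is the call that can hit the rate limit
        tweet = next(c)
        # Insert into db
    except tweepy.TweepError:
        # Wait out the 15-minute window, then retry the same request
        time.sleep(60 * 15)
        continue
    except StopIteration:
        # No more results
        break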

This works because when Tweepy raises a TweepError, it hasn't updated any of the cursor data. The next time it makes the request, it will use the same parameters as the request that triggered the rate limit, effectively repeating it until it goes through.
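
To illustrate why retrying works, here is a minimal, self-contained sketch that does not call Twitter at all. `RateLimitError` and `FakeCursor` are hypothetical stand-ins (for `tweepy.TweepError` and for the iterator returned by `Cursor.items()`), and `iterate_with_retry` is an illustrative helper, not part of Tweepy. It assumes, as described above, that a failed call leaves the iterator's position unchanged.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for tweepy.TweepError in this sketch."""


class FakeCursor:
    """Hypothetical iterator that fails once at `fail_at` without advancing."""

    def __init__(self, items, fail_at):
        self._items = list(items)
        self._pos = 0
        self._fail_at = fail_at
        self._failed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos == self._fail_at and not self._failed:
            self._failed = True  # simulate a single rate-limit hit
            raise RateLimitError("Rate limit exceeded")
        if self._pos >= len(self._items):
            raise StopIteration
        item = self._items[self._pos]
        self._pos += 1
        return item


def iterate_with_retry(iterator, wait_seconds, sleep=time.sleep):
    """Yield items, sleeping and retrying whenever the rate limit is hit."""
    while True:
        try:
            item = next(iterator)
        except RateLimitError:
            sleep(wait_seconds)  # wait, then repeat the same request
            continue
        except StopIteration:
            return
        yield item


# Record the sleeps instead of actually waiting 15 minutes.
slept = []
tweets = list(iterate_with_retry(FakeCursor(["a", "b", "c"], fail_at=1),
                                 wait_seconds=60 * 15,
                                 sleep=slept.append))
```

Because the position only advances after a successful fetch, no item is skipped or repeated by the retry itself.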

Aaron Hill
  • 1
    Thank you @Aaron, does adding `monitor_rate_limit=True, wait_on_rate_limit=True` instead of catching exceptions work with tweepy? – 4m1nh4j1 Jul 14 '14 at 09:11
  • 4
    `wait_on_rate_limit` will stop the exceptions. Tweepy will sleep for however long is needed for the rate limit to replenish. – Aaron Hill Jul 14 '14 at 21:12
  • @Aaron how do you implement wait_on_rate_limit in the above code? – jxn Oct 28 '14 at 22:11
  • 2
    @jenn: Pass it in as a keyword argument when you create an `API` instance. – Aaron Hill Oct 29 '14 at 10:22
  • 1
    The latest Tweepy version as of writing now includes the `RateLimitError` exception. Source: https://github.com/tweepy/tweepy/pull/611 – Hamman Samuel Jun 20 '15 at 12:51
  • 2
    Using `wait_on_rate_limit=True` is the correct way to do it. If you keep hitting rate limits and sleeping, Twitter will eventually blacklist your account. I've had it happen to me a bunch of times. – sudo Jan 15 '16 at 01:48
  • @sudo @aaron-hill with your methods (`wait_on_rate_limit` and catching exceptions respectively), will the `Cursor` object get different tweets or is there a chance for it to get tweets that it got in the previous "iteration" before hitting rate limit? – Rahul Kothari Aug 04 '19 at 16:26
  • @RahulKothari Whether or not you use `wait_on_rate_limit`, you'll get back some of the same tweets if you keep searching the same thing. You'll get back newer tweets as time passes, though. – sudo Aug 07 '19 at 02:30
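
If repeated tweets are a concern, one possible approach (an assumption on my part, not something from the answers above) is to store Twitter's own tweet ID as the primary key and let SQLite skip rows it already has. The sketch below uses an in-memory database and a trimmed-down version of the question's table; `save_tweet` is a hypothetical helper and the IDs are made up.

```python
import sqlite3


def save_tweet(db, tweet_id, name, text):
    """Insert one tweet; a row with an already stored id is silently skipped."""
    db.execute(
        "INSERT OR IGNORE INTO MyTable(id, name, text) VALUES (?, ?, ?)",
        (tweet_id, name, text),
    )


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE IF NOT EXISTS MyTable("
    "id INTEGER PRIMARY KEY, name TEXT, text TEXT)"
)

save_tweet(db, 1, "alice", "hello")
save_tweet(db, 1, "alice", "hello")  # same id returned by a later search: ignored
save_tweet(db, 2, "bob", "hi")
db.commit()

row_count = db.execute("SELECT COUNT(*) FROM MyTable").fetchone()[0]
```

In the question's loop, the value passed as `tweet_id` would presumably be `tweet.id`.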
24

Just replace

api = tweepy.API(auth)

with

api = tweepy.API(auth, wait_on_rate_limit=True)
Mayank Khullar
19

If you want to avoid errors and respect the rate limit you can use the following function which takes your api object as an argument. It retrieves the number of remaining requests of the same type as the last request and waits until the rate limit has been reset if desired.

from datetime import datetime
from time import sleep

def test_rate_limit(api, wait=True, buffer=.1):
    """
    Tests whether the rate limit of the last request has been reached.
    :param api: The `tweepy` api instance.
    :param wait: A flag indicating whether to wait for the rate limit reset
                 if the rate limit has been reached.
    :param buffer: A buffer time in seconds that is added on to the waiting
                   time as an extra safety margin.
    :return: True if it is ok to proceed with the next request. False otherwise.
    """
    # Get the number of remaining requests
    remaining = int(api.last_response.getheader('x-rate-limit-remaining'))
    # Check if we have reached the limit
    if remaining == 0:
        limit = int(api.last_response.getheader('x-rate-limit-limit'))
        reset = int(api.last_response.getheader('x-rate-limit-reset'))
        # Convert the epoch timestamp to a local datetime
        reset = datetime.fromtimestamp(reset)
        # Let the user know we have reached the rate limit
        print("0 of {} requests remaining until {}.".format(limit, reset))

        if wait:
            # Determine the delay and sleep
            delay = (reset - datetime.now()).total_seconds() + buffer
            print("Sleeping for {}s...".format(delay))
            sleep(delay)
            # We have waited for the rate limit reset. OK to proceed.
            return True
        else:
            # We have reached the rate limit. The user needs to handle the rate limit manually.
            return False

    # We have not reached the rate limit
    return True
Till Hoffmann
  • Thanks for this answer. Pretty helpful for dealing with another API and I wanted to respect the rate limit :) – Jimi Oke Jan 01 '17 at 19:46
  • 3
    Note that in the latest tweepy version the `getheader()` function was replaced by a `headers` dict, so `api.last_response.getheader('x-rate-limit-remaining')` needs to be replaced with `api.last_response.headers['x-rate-limit-remaining']` – xro7 Oct 23 '17 at 09:26
  • I would put this `delay = abs(reset - datetime.datetime.now()).total_seconds() + buffer`, since for some reason I had a negative value as `delay` value – salvob Jun 11 '18 at 10:36
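
Regarding the negative delay mentioned in the comment above: one plausible cause is that the reset time has already passed by the time the delay is computed. In that case clamping at zero is arguably safer than `abs()`, which would turn a small overshoot into an extra wait. A minimal sketch, working directly with the epoch value from the `x-rate-limit-reset` header; `seconds_until_reset` is a hypothetical helper and the numbers are made up:

```python
import time


def seconds_until_reset(reset_epoch, now=None, buffer=0.1):
    """Return how long to sleep until `reset_epoch`, never less than `buffer`."""
    if now is None:
        now = time.time()  # current time as epoch seconds
    # Clamp at zero so a reset time in the past never yields a negative delay.
    return max(0.0, reset_epoch - now) + buffer


pending = seconds_until_reset(1000, now=940, buffer=0.5)   # reset is 60 s away
overdue = seconds_until_reset(1000, now=1010, buffer=0.5)  # reset already passed
```

Comparing epoch seconds directly also sidesteps any local-time versus UTC confusion when converting to `datetime`.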
7
import tweepy
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# Notifies the user when the rate limit is hit and waits by itself; no manual sleep needed.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
Malik Faiq
  • 1
    Hi! Please follow up with some explanation as to how this code is a solution to the problem. Since this is an old question with quite a few answers, please elaborate on how this is a different solution to others posted. Thanks! -From Review. – d_kennetz Apr 23 '19 at 13:55
  • 1
    I just simply add the pythonic way to initiate tweepy API handling ratelimit. – Malik Faiq Apr 24 '19 at 08:56
0

I suggest using the new API v2 with the Client object and the flag wait_on_rate_limit=True; v1 will be deprecated soon.

client = tweepy.Client(consumer_key=auth.consumer_key,
                       consumer_secret=auth.consumer_secret,
                       access_token=auth.access_token,
                       access_token_secret=auth.access_token_secret,
                       bearer_token=twitter_bearer_token,
                       wait_on_rate_limit=True)

Waiting for the rate limit is then handled automatically.

badr