
I need to obtain all the followers of a Twitter account that has approximately 125K followers, so I ran this code:

import tweepy
auth = tweepy.OAuth2AppHandler(api_key, api_secret)
api = tweepy.API(auth)
tweepy.Cursor(api.get_followers,screen_name=sN,count=100).items(125000)

Credentials are under a Development App on an Elevated Developer Account.

And I got this error:

TooManyRequests: 429 Too Many Requests 88 - Rate limit exceeded

Is there a paginator I can use to request fewer items per call and still obtain all 125,000 followers? How can I complement this code with Cursor pages?

Thanks!

On 04/22/2023 I ran this:

# Only the consumer (API) key and secret are passed here; no access token
auth = tweepy.OAuth1UserHandler(
    trikini.api_key, trikini.api_secret
)

api = tweepy.API(auth, wait_on_rate_limit=True)

first_net = []
for status in tweepy.Cursor(api.get_followers, screen_name=sN,
                            count=200).items():
    print(status.id)
    first_net.append(status.id)  # or [status.id, status.screen_name]

And got this error: Unauthorized: 401 Unauthorized Not authorized.

Then I tried this:

import tweepy

auth = tweepy.OAuth1UserHandler(
    consumer_key, consumer_secret,
    access_token, access_token_secret
)

api = tweepy.API(auth, wait_on_rate_limit=True)

first_net = []
for status in tweepy.Cursor(api.get_followers, screen_name=sN,
                            count=200).items(125000):
    print(status.screen_name)
    first_net.append([status.id, status.screen_name])

# Write the collected followers once, after the loop finishes
with open(r'filename.txt', 'w') as fp:
    for item in first_net:
        fp.write("%s\n" % item)

The code finished running, but I only got 252 IDs, and the user masked as sN had 112,565 followers. What might have happened?

anitasp
  • Great, I updated my question with what I am trying. I am not getting more than 252 IDs, and that user has 112 thousand followers. Could you help me figure out what I might change to get all the follower IDs? – anitasp Apr 22 '23 at 20:30
  • I added a new section within my answer, based on your last question update. LMK if that helps – Life is complex Apr 22 '23 at 21:22
  • Still getting just 252 ids – anitasp Apr 22 '23 at 21:42
  • I get around 1500 users before I hit the rate limit threshold. But it seems that there might be a bug in `tweepy` that doesn't maintain the socket when the rate limit threshold is reset. It looks to be in the `request` function in the main API module of `tweepy`. I'm doing some more testing to determine how to prevent this bug from throwing an exception. – Life is complex Apr 23 '23 at 15:36
  • Thank you! Do you recommend doing this with another library or language? – anitasp Apr 24 '23 at 16:18
  • `tweepy` has an open bug on `wait_on_rate_limit`. I just posted about this in my answer. So using `tweepy` won't work. You should be able to call the API directly and wrap some throttling code around the request. This will require some testing to get correct. – Life is complex Apr 24 '23 at 16:35
  • There are several Python packages on this [page](https://developer.twitter.com/en/docs/twitter-api/tools-and-libraries/v2) that can query Twitter. I'm not sure if any of them have throttling code added. – Life is complex Apr 25 '23 at 12:31
  • I'm testing one of these packages, which has throttling code added. I opened a bug report on the package, because the pagination_token is missing, and it is needed to query the next batch of users. – Life is complex Apr 25 '23 at 14:22
  • I found out how to resolve the issue in the new package. So far everything works, but I need to do more testing. I was able to get 50,000 users with zero problems. One issue that either I or the package owner needs to fix is the lack of feedback when the rate limit is hit; I had to edit their code to get this feedback. – Life is complex Apr 26 '23 at 14:11
  • This package works, [python-twitter](https://github.com/sns-sdks/python-twitter). I have a pull request open to resolve something that needs to be improved. I was able to pull the 185K followers for a twitter account that I follow. Good Luck. – Life is complex May 01 '23 at 16:12

1 Answer


The error TooManyRequests: 429 Too Many Requests 88 - Rate limit exceeded is thrown because you exceeded the standard rate limit.

Check out the Twitter API rate limits; they apply to tweepy as well, because tweepy is a wrapper around that API.

Standard rate limits:

The maximum number of requests that are allowed is based on a time interval, some specified period or window of time. The most common request limit interval is fifteen minutes. If an endpoint has a rate limit of 900 requests/15-minutes, then up to 900 requests over any 15-minute interval is allowed.
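To see why 125K followers trips the limit, here is a rough back-of-the-envelope sketch. It assumes the v1.1 limits as I understand them: 15 requests per 15-minute window for both GET followers/list (up to 200 users per request, which is what api.get_followers calls) and GET followers/ids (up to 5,000 IDs per request). Please check the current rate-limit table before relying on these numbers; the function name is my own.

```python
import math


def estimate_windows(followers, page_size, requests_per_window=15):
    """Return (requests needed, rate-limit windows needed) to page through
    `followers` accounts when each request returns up to `page_size` of them.

    Assumes a limit of `requests_per_window` requests per 15-minute window.
    """
    requests_needed = math.ceil(followers / page_size)
    windows_needed = math.ceil(requests_needed / requests_per_window)
    return requests_needed, windows_needed


for label, page_size in [("followers/list", 200), ("followers/ids", 5000)]:
    reqs, windows = estimate_windows(125_000, page_size)
    # The first window starts immediately, so the script sleeps through
    # roughly (windows - 1) full 15-minute windows.
    hours = (windows - 1) * 15 / 60
    print(f"{label}: {reqs} requests, {windows} windows, ~{hours:.2f} h of waiting")
```

Under these assumptions, fetching full user objects needs 625 requests spread over 42 windows (roughly 10 hours of waiting), while fetching only IDs needs 25 requests over 2 windows.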

There is a parameter, wait_on_rate_limit, that you can use to mitigate the error. It puts your query session to sleep once you hit the rate limit threshold, and it is designed to resume the session once the rate limit window has reset.
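Conceptually, waiting on the rate limit comes down to reading the rate-limit headers Twitter returns with each response (x-rate-limit-remaining, and x-rate-limit-reset, which is a Unix timestamp) and sleeping until the reset time. The sketch below is not tweepy's implementation; the function names and the one-second buffer are my own assumptions.

```python
import time


def seconds_until_reset(headers, now=None, buffer_seconds=1):
    """Return how long to sleep before the rate-limit window resets.

    `headers` is a dict-like of HTTP response headers; the reset time is
    expected in `x-rate-limit-reset` as Unix epoch seconds.
    """
    if now is None:
        now = time.time()
    reset_at = int(headers.get("x-rate-limit-reset", 0))
    # Never return a negative sleep; add a small buffer for clock skew.
    return max(0.0, reset_at - now) + buffer_seconds


def should_wait(headers):
    """True when the current window has no requests left."""
    return int(headers.get("x-rate-limit-remaining", 1)) == 0
```

With a `requests.Response` object `resp` (whose headers are case-insensitive), the check would look like `if should_wait(resp.headers): time.sleep(seconds_until_reset(resp.headers))`.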

Here is how it is used; the snippet below is from the tweepy code base.

# Setting wait_on_rate_limit to True when initializing API will initialize
# an instance, called api here, that will automatically wait, using time.sleep,
# for the appropriate amount of time when a rate limit is encountered
api = tweepy.API(auth, wait_on_rate_limit=True)

Here is another example, taken from the tweepy.API documentation:

import tweepy


consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""

auth = tweepy.OAuth1UserHandler(
    consumer_key, consumer_secret, access_token, access_token_secret
)

# Setting wait_on_rate_limit to True when initializing API will initialize an
# instance, called api here, that will automatically wait, using time.sleep,
# for the appropriate amount of time when a rate limit is encountered
api = tweepy.API(auth, wait_on_rate_limit=True)

# This will search for Tweets with the query "Twitter", returning up to the
# maximum of 100 Tweets per request to the Twitter API

# Once the rate limit is reached, it will automatically wait / sleep before
# continuing

for tweet in tweepy.Cursor(api.search_tweets, "Twitter", count=100).items():
    print(tweet.id)
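tweepy.Cursor hides the v1.1 cursoring protocol: you start with cursor -1, each response carries a next_cursor value, and a next_cursor of 0 means there are no more pages. `.items()` flattens the results, while `.pages()` yields one list per request, which is handy for saving progress after every page. Below is a rough sketch of that loop; `fetch_page` is a hypothetical stand-in for one API call, not a tweepy function.

```python
def iterate_cursor(fetch_page, max_pages=None):
    """Yield one page of results at a time from a v1.1-style cursored endpoint.

    `fetch_page(cursor)` is a hypothetical callable that performs one request
    and returns (items, next_cursor).
    """
    cursor = -1  # v1.1 convention: -1 requests the first page
    pages = 0
    while cursor != 0:  # next_cursor == 0 means the last page was returned
        items, cursor = fetch_page(cursor)
        yield items
        pages += 1
        if max_pages is not None and pages >= max_pages:
            break
```

With tweepy itself, the page-by-page form would be something like `for page in tweepy.Cursor(api.get_follower_ids, screen_name=sN, count=5000).pages(): ...`, where each page is a list of up to 5,000 IDs.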

UPDATED 04.24.2023

After doing more research into this question, I found that tweepy has a bug in the code base that doesn't maintain the state of a session when using the parameter wait_on_rate_limit with either Twitter's API v1.1 or v2.0.

In both API v1.1 and API v2.0, the bug is in the `request` function in this code. In API v2.0 the bug is linked to requests.sessions.

There is an open tweepy issue on this bug.
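One way around relying on wait_on_rate_limit is to wrap each request in your own throttling code. The sketch below retries a call after a fixed sleep whenever a given exception is raised. The helper name, the 15-minute default wait, and the retry count are assumptions; with tweepy 4 you could pass `tweepy.errors.TooManyRequests` as `retry_on`.

```python
import time


def call_with_retry(func, retry_on, wait_seconds=15 * 60, max_retries=5,
                    sleep=time.sleep):
    """Call func(); if it raises an exception listed in `retry_on`, sleep for
    `wait_seconds` and try again, up to `max_retries` extra attempts."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # give up and surface the last error
            sleep(wait_seconds)
```

For example, `call_with_retry(lambda: api.get_follower_ids(screen_name=sN, count=5000), tweepy.errors.TooManyRequests)` would fetch one page and wait out a 429 before retrying. The `sleep` parameter exists so the waiting can be replaced in tests.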

For me, both code examples below retrieved thousands of users before the rate limit threshold was triggered.

Here is the code that I used for API v1.1:

import tweepy
import requests

auth = tweepy.OAuth1UserHandler(
        consumer_key, consumer_secret,
        access_token, access_token_secret
)


api = tweepy.API(auth, wait_on_rate_limit=True)

user = api.get_user(screen_name="target_user_screen_name")
followers_count = user.followers_count
# Note: if any of these exceptions is raised, the loop ends; the followers
# printed so far are kept, but the iteration does not resume.
try:
    for query_response in tweepy.Cursor(api.get_followers,
                                        user_id=user.id,
                                        screen_name=user.screen_name,
                                        count=200).items(followers_count):
        print(query_response.screen_name)
        print(query_response.id)

except requests.exceptions.ReadTimeout:
    pass
except requests.exceptions.Timeout:
    pass
except tweepy.errors.TweepyException:
    pass

Here is the code that I used for API v2.0:

import requests
import tweepy


def create_session(token):
    # wait_on_rate_limit=True asks tweepy to sleep when the rate limit is hit
    return tweepy.Client(bearer_token=token, wait_on_rate_limit=True)


def query_user_followers(user_id, next_token, tweepy_client):
    # An empty or None token requests the first page; max_results accepts
    # 1-1000 for this endpoint, so use the maximum to save requests
    return tweepy_client.get_users_followers(id=user_id,
                                             max_results=1000,
                                             user_fields=['id', 'name', 'username'],
                                             pagination_token=next_token or None)


tweepy_data = []
tweepy_session = create_session(bearer_token)
next_token = None
while True:
    try:
        query = query_user_followers('target_user_id', next_token, tweepy_session)
    except (requests.exceptions.ReadTimeout, requests.exceptions.Timeout):
        continue  # retry the same page
    except tweepy.errors.TweepyException:
        continue
    tweepy_data.append(query)
    next_token = query.meta.get('next_token')
    if next_token is None:
        break  # no next_token means this was the last page
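API v2 paginates differently from v1.1: instead of a numeric cursor, each response's meta may contain a next_token, and that key is simply absent on the last page, so a loop has to check for it or it never terminates cleanly. The generic sketch below shows that stopping rule; `fetch_page` is a hypothetical stand-in for one call such as `client.get_users_followers`. tweepy 4 also ships `tweepy.Paginator` for Client methods, e.g. `tweepy.Paginator(client.get_users_followers, user_id, max_results=1000)`, which handles the token for you.

```python
def iterate_token_pages(fetch_page, max_pages=None):
    """Yield one page of results at a time from a v2-style endpoint.

    `fetch_page(token)` is a hypothetical callable that performs one request
    and returns (data, meta); None is passed as the token for the first page.
    """
    token = None
    pages = 0
    while True:
        data, meta = fetch_page(token)
        yield data or []  # a page may come back with no data
        pages += 1
        token = meta.get("next_token")
        # v2 omits next_token on the last page, so stop there
        if token is None or (max_pages is not None and pages >= max_pages):
            break
```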

Here is some useful information on handling disconnections with the Twitter API.
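A common pattern for reconnecting after failures is exponential backoff: wait, double the wait after each failed attempt, and cap it. The starting delay, factor, and cap below (5 s, doubling, 320 s) are assumptions for illustration; use whatever the linked guidance recommends for the specific error you are handling.

```python
def backoff_delays(initial=5.0, factor=2.0, cap=320.0, attempts=7):
    """Yield successive wait times in seconds for exponential backoff."""
    delay = initial
    for _ in range(attempts):
        yield min(delay, cap)  # never wait longer than the cap
        delay *= factor
```

Each value would be passed to `time.sleep()` before the next attempt, stopping as soon as a request succeeds.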

Life is complex