I have the following code:

import tweepy

# `auth`, `query` and `csvWriter` are assumed to be set up earlier (not shown here).
api = tweepy.API(auth, wait_on_rate_limit=True)
i = 0  # running count of tweets processed so far
for tweet in tweepy.Cursor(api.search,
                           tweet_mode="extended",
                           q=query + " exclude:retweets").items(11000):
    # Space-separated hashtags, e.g. "#a #b" (empty string when the tweet has none)
    hashtags = " ".join("#" + hashtag['text'] for hashtag in tweet.entities.get('hashtags', []))
    print(i)

    if tweet.place:
        tweet_place = tweet.place.full_name + ', ' + tweet.place.country_code
    else:
        tweet_place = "Not Geo-tagged"
    i += 1

    csvWriter.writerow([tweet.id, tweet.full_text.encode('utf-8'), tweet.created_at, tweet.lang, tweet.retweet_count, tweet.favorite_count, tweet_place, tweet.user.id, tweet.user.screen_name, tweet.user.followers_count, tweet.user.friends_count, tweet.user.created_at, tweet.user.favourites_count, tweet.user.statuses_count, tweet.user.lang, tweet.user.verified, tweet.user.location])

I am trying to fetch 11000 tweets matching a specific search query, but after some time it throws the following error:

Traceback (most recent call last):
  .............
ConnectionResetError: [Errno 54] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  .............
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  .............
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  .............
tweepy.error.TweepError: Failed to send request: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))

Earlier it happened at around 2500 tweets, but after I changed the query it started happening at around 5000 tweets. Any idea what could be wrong and how I can fix it?

Eagle

1 Answer

It's most likely because you have exceeded the number of tweets you are allowed to pull at once or per 15-minute window.

Check here for more information.
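
For a rough sense of scale, here is the arithmetic behind that limit. This is only a sketch: it assumes the standard v1.1 search limit of 180 requests per 15-minute window (user auth), a default page size of 15 tweets, and a maximum page size of 100. None of these numbers come from the question, so check them against the current Twitter documentation. `tweets_per_window` is just an illustrative helper.

```python
def tweets_per_window(requests_per_window, tweets_per_request):
    """Upper bound on tweets retrievable in one rate-limit window."""
    return requests_per_window * tweets_per_request


# Assumed limits (not taken from the question): 180 requests per 15 minutes,
# default page size 15, maximum page size 100.
default_pages = tweets_per_window(180, 15)
full_pages = tweets_per_window(180, 100)
print(default_pages, full_pages)
```

Under those assumptions the default page size caps a window at 2700 tweets, which is in the same ballpark as the pause reported at around 2500. Passing `count=100` to the cursor would raise the ceiling per window, though it would not by itself explain a connection reset.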

  • Yes, but I think it paused for that time and then resumed. At around 2500 tweets it paused for quite some time and then resumed, maybe because of `wait_on_rate_limit=True`. It is still strange that this error appeared only at around 2500 and 5000 tweets. – Eagle Dec 18 '20 at 02:49
  • Try fetching something like 500 tweets multiple times instead of grabbing all 10k tweets at once. – Moe AbuShaqra Dec 18 '20 at 02:55
  • But that way I might get repeated tweets. Or is there a way you'd suggest to get unique tweets in each cycle of 500? – Eagle Dec 18 '20 at 03:12
  • What is your expectation of the Twitter API here? You can make repeated requests (within a rate limit), each returning a page of up to 100 Tweets. Why do you think that you can request 11000 Tweets in a single piece of code? – Andy Piper Dec 18 '20 at 03:25
  • According to what I know, if I make repeated requests it is highly likely that I get repeated tweets, whereas I wanted 11000 unique tweets. Just getting all the tweets different is my main concern – Eagle Dec 18 '20 at 03:29
  • You could take the id of the last tweet you got in every iteration and start the next one from that id. I know it works with a user timeline, but I don't know about Cursor. – Moe AbuShaqra Dec 18 '20 at 03:43
  • @MoeAbuShaqra Could you elaborate a bit more? My search query is something like `#hashtag exclude:retweets`, so I'm not sure how this can be done. – Eagle Dec 18 '20 at 03:49
  • You could do something similar to https://gist.github.com/yanofsky/5436496, but note that it goes through the tweets in someone's timeline, so they are ordered. – Moe AbuShaqra Dec 18 '20 at 03:55
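
The "start from the last id" idea in the comments can be sketched roughly as below. This is not tested against the live API. `collect_unique` and `make_fake_fetch` are hypothetical helpers written for this illustration, and the fake fetcher only stands in for the real search call so the sketch runs offline. The loop pages backwards by id, retries a failed page with exponential backoff instead of aborting the whole run, and keys results by id so a repeated page cannot produce duplicates.

```python
import time
from operator import attrgetter
from types import SimpleNamespace


def collect_unique(fetch_page, total, retry_on=(ConnectionError,),
                   max_retries=5, backoff=1.0, key=attrgetter("id"),
                   sleep=time.sleep):
    """Collect up to `total` unique items, paging backwards by id.

    `fetch_page(max_id)` must return a list of items with ids <= max_id
    (or the newest items when max_id is None), and [] when nothing is left.
    """
    seen = {}       # id -> item; dicts preserve insertion order
    max_id = None   # None means "start from the newest"
    while len(seen) < total:
        for attempt in range(max_retries):
            try:
                page = fetch_page(max_id)
                break
            except retry_on:
                sleep(backoff * 2 ** attempt)  # exponential backoff, then retry
        else:
            raise RuntimeError("giving up after repeated connection errors")
        if not page:
            break  # no older results available
        for item in page:
            seen.setdefault(key(item), item)
        # Next request asks only for items strictly older than this page.
        max_id = min(key(item) for item in page) - 1
    return list(seen.values())[:total]


def make_fake_fetch(ids, page_size=100, fail_on_calls=()):
    """Offline stand-in for the API: serves ids newest first, failing on chosen calls."""
    ordered = sorted(ids, reverse=True)
    calls = 0

    def fetch(max_id):
        nonlocal calls
        calls += 1
        if calls in fail_on_calls:
            raise ConnectionResetError(54, "Connection reset by peer")
        eligible = [i for i in ordered if max_id is None or i <= max_id]
        return [SimpleNamespace(id=i) for i in eligible[:page_size]]

    return fetch


# The second call fails once; the loop retries and still ends up duplicate-free.
fetch = make_fake_fetch(range(1, 1001), fail_on_calls=(2,))
tweets = collect_unique(fetch, total=250, sleep=lambda seconds: None)
print(len(tweets), len({t.id for t in tweets}))
```

With tweepy 3.x, `fetch_page` could presumably wrap something like `api.search(q=query + " exclude:retweets", count=100, tweet_mode="extended", max_id=max_id)`, with `retry_on=(tweepy.TweepError,)`, since that is the exception shown in the traceback. Treat both as assumptions to check against the tweepy version you have installed.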