Twitter crawling with Python

I am using time.sleep, but I still get an error.

tweepy.error.TweepError: Twitter error response: status code = 429

What should I do?

import tweepy
import time
import os

search_term = 'word1'
search_term2= 'word2'
search_term3='word3'

lat = "37.6"
lon = "127.0"
radius = "100km"


API_key = "44"
API_secret = "33"
Access_token = "22"
Access_token_secret = "11"

location = "%s,%s,%s" % (lat, lon, radius)

auth = tweepy.OAuthHandler(API_key, API_secret)
auth.set_access_token(Access_token, Access_token_secret)

api = tweepy.API(auth)

c=tweepy.Cursor(api.search,
                q="{}+OR+{}+OR+{}".format(search_term, search_term2, search_term3),
                rpp=1000,
                geocode=location,
                include_entities=True)

data = {}
i = 1
for tweet in c.items():
    data['text'] = tweet.text
    print(i, ":", data)
    i += 1
time.sleep(300)

I have an additional question.

The following code stores the output in a text file.

Does this code also need time.sleep?

wfile = open(os.getcwd()+"/workk2.txt", mode='w')   
data = {}   
i = 0       

for tweet in c.items():
    data['text'] = tweet.text   
    wfile.write(data['text']+'\n')  
    i += 1
time.sleep(300)

wfile.close()

1 Answer

A 429 status code means that you have sent too many requests. My guess is that because you are using for tweet in c.items(): with no limit on the number of results, the cursor keeps requesting further pages until the Twitter API rate-limits you (that's what the error response is).

If you don't need an unlimited number of tweets, you could set a maximum, e.g. c.items(200).

Your time.sleep is outside the loop, so it runs only once, after the loop has finished, and doesn't create a pause between requests. You would want
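As a rough sketch of the cap described above, the file-writing loop from the question can be wrapped in a small helper. The name save_texts and its parameters are made up for illustration; it only assumes that each item has a .text attribute, as tweepy status objects do. itertools.islice is used here so the cap works on any iterator, which has the same effect as passing the number to c.items().

```python
from itertools import islice


def save_texts(tweets, path, limit=200):
    """Write the text of at most `limit` tweets to `path`, one per line.

    `tweets` is any iterable of objects with a `.text` attribute
    (hypothetical helper, not part of tweepy).
    """
    written = 0
    # The with-block closes the file even if the API raises mid-loop.
    with open(path, mode="w", encoding="utf-8") as wfile:
        for tweet in islice(tweets, limit):
            wfile.write(tweet.text + "\n")
            written += 1
    return written
```

With the cursor from the question this would be called as save_texts(c.items(), "workk2.txt", limit=200).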

for tweet in c.items():
    data['text'] = tweet.text   
    wfile.write(data['text']+'\n')  
    i += 1
    time.sleep(5)

That's a 5-second pause between tweets, which might solve your rate-limiting issues. See also http://docs.tweepy.org/en/v3.5.0/cursor_tutorial.html#
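An alternative to a fixed pause is to sleep only when the API actually reports a rate limit, which is the pattern the linked cursor tutorial describes. The sketch below is a generic version of that idea, not tweepy's own code: limit_handled, rate_limit_error, wait_seconds and sleep are illustrative names, and the 15-minute default wait is an assumption. Which exception to pass depends on your tweepy version (in 3.x it is likely tweepy.RateLimitError or tweepy.TweepError).

```python
import time


def limit_handled(iterator, rate_limit_error, wait_seconds=15 * 60, sleep=time.sleep):
    """Yield items from `iterator`, pausing and retrying on a rate-limit error.

    `iterator` must tolerate next() being called again after it raised
    (a class-based iterator, not a generator).
    """
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            return  # underlying iterator is exhausted
        except rate_limit_error:
            sleep(wait_seconds)  # wait for the limit to reset, then retry
            continue
        yield item
```

Usage would look like for tweet in limit_handled(c.items(200), tweepy.TweepError):. In tweepy 3.x, creating the client with tweepy.API(auth, wait_on_rate_limit=True) should give similar behaviour without a wrapper.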

jmk
It's more that you have an hourly limit of 150 (unauthenticated) or 350 (authenticated) requests, rather than a limit of 1 per 5 seconds. That's why using a limit argument on `items()` might be the better approach. See the [Twitter API](https://support.twitter.com/articles/160385) and [this question](http://stackoverflow.com/questions/21308762/avoid-twitter-api-limitation-with-tweepy) – jmk Apr 17 '17 at 12:04
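To make the arithmetic in that comment concrete, here is a small sketch that turns a per-window request budget into the minimum average spacing between requests. The function name min_interval_seconds is hypothetical, and the 150 and 350 figures are simply the ones quoted in the comment; check the current limits for the endpoint you call.

```python
def min_interval_seconds(max_requests, window_seconds):
    """Average spacing between requests that stays within the budget."""
    if max_requests <= 0:
        raise ValueError("max_requests must be positive")
    return window_seconds / max_requests


# Figures quoted in the comment: requests allowed per hour.
unauthenticated = min_interval_seconds(150, 3600)  # 24 seconds apart
authenticated = min_interval_seconds(350, 3600)    # roughly 10.3 seconds apart
```

Note that the budget counts API requests, not tweets. A single search request returns a whole page of tweets, so any pause belongs between pages rather than between individual items.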