
I'm trying to download Twitter data for the Chicago area, focusing specifically on crime-related tweets. The tweets also need to be geotagged with coordinates. I'd like a large sample for analysis, but the REST API is rate limited, which restricts me to a fairly low number of tweets. I've been trying to work around this based on a similar question, "Avoid twitter api limitation with Tweepy", but so far I'm not having much luck. Could anyone help me with this? I'm new to all of this, so any help would be really appreciated. Ideally I'd also like the results in a pandas DataFrame. I've been using the following tutorial as the basis for my code: http://www.karambelkar.info/2015/01/how-to-use-twitters-search-rest-api-most-effectively./ My code is below:

import os
import sys

import jsonpickle
import tweepy

# Application-only auth; fill in your own consumer key and secret.
auth = tweepy.AppAuthHandler('', '')
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
if not api:
    print("Can't Authenticate")
    sys.exit(-1)



searchQuery = 'shooting OR stabbing OR violence OR assault OR attack OR homicide OR punched OR mugging OR murder'
geocode= "41.8781,-87.6298,15km"


maxTweets = 1000000
tweetsPerQry = 100
fName = 'tweets.txt'
sinceId = None
max_id = -1  # no upper bound yet; the "L" long-integer suffix is Python 2 only
tweetCount = 0
print ("Downloading max {0} tweets".format(maxTweets))
with open (fName, 'w') as f:
  while tweetCount < maxTweets:
    try:
        if (max_id <= 0):
            if(not sinceId):
                new_tweets = api.search(q=searchQuery, geocode=geocode, count=tweetsPerQry)
            else:
                new_tweets = api.search(q=searchQuery, geocode=geocode, count=tweetsPerQry, since_id=sinceId)
        else:
            if (not sinceId):
                new_tweets = api.search(q=searchQuery, geocode=geocode, count=tweetsPerQry, max_id=str(max_id-1))
            else:
                new_tweets = api.search(q=searchQuery, geocode=geocode, count=tweetsPerQry, max_id=str(max_id-1), since_id=sinceId)
        if not new_tweets:
            print ("No more tweets found")
            break
        for tweet in new_tweets:
            f.write(jsonpickle.encode(tweet._json, unpicklable=False)+'\n')
        tweetCount += len(new_tweets)
        print("Downloaded {0} tweets".format(tweetCount))
        max_id = new_tweets[-1].id
    except tweepy.TweepError as e:
        print("some error : " + str(e))
        break
print ("Downloaded {0} tweets, Saved to {1}".format(tweetCount, fName))
  • What precisely does *"not having much luck"* mean, in this case? Errors? Unexpected behaviour? Please give a [mcve] (and try not to share your API tokens in the future). – jonrsharpe May 08 '16 at 10:17
  • Thanks for responding so quickly! Sorry, that was an oversight on my part. Thank you for removing this. In terms of not much luck, it just seems to hang as though it is processing, but when I check my text file there is nothing in it, whereas I'd expect it to have at least some data in it after running for a while. – Michael Montgomery May 08 '16 at 10:51
  • I don't get any error messages just to clarify – Michael Montgomery May 08 '16 at 10:57
  • In addition this may help, I've been using the following tutorial as a basis for this code: – Michael Montgomery May 08 '16 at 11:14
  • http://www.karambelkar.info/2015/01/how-to-use-twitters-search-rest-api-most-effectively./ – Michael Montgomery May 08 '16 at 11:15
  • Please [edit] the question to include all relevant information. Have you considered running it on a smaller number (1?) to test the fundamentals? – jonrsharpe May 08 '16 at 11:16
  • Downloading max 1000000 tweets --------------------------------------------------------------------------- TypeError Traceback (most recent call last) in () 34 break 35 for tweet in new_tweets: ---> 36 f.write(jsonpickle.encode(tweet._json, unpickable=False)+'\n') 37 tweetCount += len(new_tweets) 38 print("Downloaded {0} tweets".format(tweetCount)) TypeError: encode() got an unexpected keyword argument 'unpickable' – Michael Montgomery May 08 '16 at 12:01
  • As far as I can tell, the only difference between my code and the code in the tutorial example is that I am searching on multiple keywords and for tweets around a location in Chicago. Could this be why I am getting this error message? Thank you for your help, it's much appreciated! – Michael Montgomery May 08 '16 at 12:03
  • This now works as I noticed the error for the unpicklable argument. Thanks for help – Michael Montgomery May 08 '16 at 16:55

1 Answer


After running into the same problem, I wrote a function that shows when API rate limits are approaching. Using tweepy, it prints the number of API requests made and the number of permitted requests remaining for each endpoint that has been used. You can add your own code to delay/sleep/wait either before or after the limits are reached, or use tweepy's wait_on_rate_limit option (more details HERE).

Example output:

Twitter API: 3 requests used, 177 remaining, for API queries to /search/tweets

Twitter API: 3 requests used, 177 remaining, for API queries to /application/rate_limit_status

import tweepy

# auth is assumed to be an already configured tweepy auth handler.
api = tweepy.API(auth)


#Twitter's words on API limits https://support.twitter.com/articles/15364

#### Define twitter rate determining loop
def twitter_rates():
    # The response maps each resource family to its endpoints,
    # and each endpoint to a dict holding 'limit' and 'remaining'.
    stats = api.rate_limit_status()
    for family, endpoints in stats['resources'].items():
        if not isinstance(endpoints, dict):
            continue
        for endpoint, info in endpoints.items():
            if not isinstance(info, dict):
                continue
            limit = info['limit']
            remaining = info['remaining']
            used = limit - remaining
            # Only report endpoints that have actually been queried.
            if used != 0:
                print("Twitter API:", used, "requests used,", remaining, "remaining, for API queries to", endpoint)


twitter_rates()
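
As a rough sketch of the "add your own code to delay/sleep/wait" idea: the helper below takes one endpoint entry from the rate_limit_status() response and returns how long to sleep before the next request. The name seconds_until_reset and the min_remaining parameter are made up for this illustration and are not part of tweepy. It assumes each entry carries a 'reset' field (a Unix timestamp) alongside 'limit' and 'remaining'.

```python
import time


def seconds_until_reset(entry, now=None, min_remaining=1):
    """Return how many seconds to sleep before making the next request.

    entry is one endpoint dict from rate_limit_status(), for example
    {'limit': 180, 'remaining': 0, 'reset': 1462700000}.
    """
    if now is None:
        now = time.time()
    # Enough requests left in this window: no need to wait.
    if entry['remaining'] >= min_remaining:
        return 0
    # 'reset' is assumed to be a Unix timestamp (seconds since the epoch).
    return max(0, entry['reset'] - now)
```

Inside a download loop this could be used as time.sleep(seconds_until_reset(stats['resources']['search']['/search/tweets'])), where stats is the dictionary returned by api.rate_limit_status().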

Also note that setting wait_on_rate_limit "will stop the exceptions. Tweepy will sleep for however long is needed for the rate limit to replenish" (Aaron Hill, Jul 2014). HERE is a Stack Overflow page with more comments on this.
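
Since the question also asks for geotagged tweets in a pandas DataFrame, here is a minimal sketch of one way to get there from a file with one JSON-encoded tweet per line, such as the tweets.txt written by the question's code. The function name geotagged_rows and the choice of columns are assumptions made for illustration. It relies on the tweet's 'coordinates' field being either null or a GeoJSON point with [longitude, latitude].

```python
import json


def geotagged_rows(lines):
    """Turn JSON-encoded tweets (one per line) into flat dicts.

    Tweets without exact coordinates are skipped.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        point = tweet.get('coordinates')
        if not point:
            continue  # not geotagged with an exact point
        lon, lat = point['coordinates']  # GeoJSON order: longitude, latitude
        rows.append({
            'id': tweet.get('id'),
            'created_at': tweet.get('created_at'),
            'text': tweet.get('text'),
            'lon': lon,
            'lat': lat,
        })
    return rows
```

With the file open as f, rows = geotagged_rows(f) gives a list of dicts, and pandas.DataFrame(rows) then builds the DataFrame with one column per key.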

Antoine