I'm trying to download Twitter data for the Chicago area, focusing specifically on crime-related tweets. The tweets also need to be geotagged with coordinates. I'd like to collect a good amount for analysis, but the REST API's rate limits keep the number I can retrieve fairly low.

I've been trying to build a workaround based on a similar question, "Avoid twitter api limitation with Tweepy", but so far I'm not having much luck. Could anyone help me with this? I'm new to all of this, so any help would be really appreciated. Ideally I'd also like the results in a pandas DataFrame.

I've been using the following tutorial as the basis for my code: http://www.karambelkar.info/2015/01/how-to-use-twitters-search-rest-api-most-effectively./

I've copied my code below:
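```python
# Written against tweepy 3.x: in tweepy 4.x api.search became api.search_tweets,
# TweepError became TweepyException, and wait_on_rate_limit_notify was removed.
import sys

import jsonpickle
import tweepy

# Application-only auth; fill in your consumer key and consumer secret.
auth = tweepy.AppAuthHandler('', '')
# wait_on_rate_limit makes tweepy sleep until the rate-limit window resets.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

if not api:
    print("Can't Authenticate")
    sys.exit(-1)

searchQuery = ('shooting OR stabbing OR violence OR assault OR attack OR '
               'homicide OR punched OR mugging OR murder')
geocode = "41.8781,-87.6298,15km"  # latitude,longitude,radius around central Chicago
maxTweets = 1000000   # upper bound on the number of tweets to collect
tweetsPerQry = 100    # tweets requested per search call
fName = 'tweets.txt'  # output: one JSON-encoded tweet per line

sinceId = None  # only fetch tweets newer than this ID (None = no lower bound)
max_id = -1     # only fetch tweets older than this ID; -1 = start from the newest
tweetCount = 0

print("Downloading max {0} tweets".format(maxTweets))
with open(fName, 'w') as f:
    while tweetCount < maxTweets:
        try:
            if max_id <= 0:
                if not sinceId:
                    new_tweets = api.search(q=searchQuery, geocode=geocode,
                                            count=tweetsPerQry)
                else:
                    new_tweets = api.search(q=searchQuery, geocode=geocode,
                                            count=tweetsPerQry, since_id=sinceId)
            else:
                if not sinceId:
                    new_tweets = api.search(q=searchQuery, geocode=geocode,
                                            count=tweetsPerQry,
                                            max_id=str(max_id - 1))
                else:
                    new_tweets = api.search(q=searchQuery, geocode=geocode,
                                            count=tweetsPerQry,
                                            max_id=str(max_id - 1),
                                            since_id=sinceId)
            if not new_tweets:
                print("No more tweets found")
                break
            for tweet in new_tweets:
                f.write(jsonpickle.encode(tweet._json, unpicklable=False) + '\n')
            tweetCount += len(new_tweets)
            print("Downloaded {0} tweets".format(tweetCount))
            # Results come newest first, so the last one is the oldest; the next
            # request asks only for tweets older than it.
            max_id = new_tweets[-1].id
        except tweepy.TweepError as e:
            print("some error : " + str(e))
            break

print("Downloaded {0} tweets, Saved to {1}".format(tweetCount, fName))
```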
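The part of the tutorial's approach that gets past the 100-tweets-per-request cap is `max_id` paging: results come back newest first, so after each page the loop asks only for tweets with IDs below the oldest one already seen, walking backwards without re-downloading anything. This minimal sketch isolates that logic so it can be tried without credentials. `paginate` and `make_fake_search` are hypothetical helpers, and the fake search only imitates the newest-first, inclusive-`max_id` behaviour assumed for `api.search`.

```python
def paginate(search, max_tweets, per_page=100):
    """Collect up to max_tweets results by paging backwards with max_id.

    `search` is any callable taking (count, max_id) and returning a list of
    tweet-like dicts with an "id" key, newest (largest ID) first.
    """
    collected = []
    max_id = -1  # sentinel: no upper bound on the first request
    while len(collected) < max_tweets:
        if max_id <= 0:
            page = search(count=per_page, max_id=None)
        else:
            # max_id is inclusive, so subtract 1 to skip the tweet we already have
            page = search(count=per_page, max_id=max_id - 1)
        if not page:
            break  # nothing older is available
        collected.extend(page)
        max_id = page[-1]["id"]  # oldest tweet on this page
    return collected[:max_tweets]


def make_fake_search(ids):
    """Hypothetical stand-in for api.search over a fixed set of tweet IDs."""
    ordered = sorted(ids, reverse=True)  # newest first, like the real endpoint

    def search(count, max_id=None):
        eligible = [i for i in ordered if max_id is None or i <= max_id]
        return [{"id": i} for i in eligible[:count]]

    return search
```

For example, `paginate(make_fake_search(range(1, 251)), max_tweets=1000)` makes three non-empty requests and returns all 250 IDs once each, in descending order. Paging only works around the per-request cap; it does not lift the rate limit, which `wait_on_rate_limit=True` handles by sleeping until the window resets. The standard search endpoint also only indexes roughly the past week of tweets, so a very large `maxTweets` will usually end with "No more tweets found" rather than reaching the limit.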
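Because each line of `tweets.txt` is a plain JSON object (`unpicklable=False`), the file can be turned into a pandas DataFrame afterwards. The `geocode` parameter does not guarantee exact coordinates: it can also match tweets from users whose profile location is in the area, so this sketch keeps only tweets whose `coordinates` field is set. It assumes the v1.1 tweet JSON layout, where `coordinates` is either null or a GeoJSON point ordered `[longitude, latitude]`. `tweets_to_dataframe` is a hypothetical helper name, and the selected columns are just an example.

```python
import json

import pandas as pd

COLUMNS = ["id", "created_at", "text", "longitude", "latitude"]


def tweets_to_dataframe(lines):
    """Build a DataFrame from JSON-lines tweets, keeping only exact geotags."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        coords = tweet.get("coordinates")
        if not coords:
            continue  # no precise point attached to this tweet
        # GeoJSON order is [longitude, latitude]
        lon, lat = coords["coordinates"]
        rows.append({
            "id": tweet["id"],
            "created_at": tweet.get("created_at"),
            "text": tweet.get("text"),
            "longitude": lon,
            "latitude": lat,
        })
    # Passing columns keeps a consistent schema even when no rows survive
    return pd.DataFrame(rows, columns=COLUMNS)
```

Usage: `with open('tweets.txt') as f: df = tweets_to_dataframe(f)`. If tweets with exact points turn out to be too sparse, the `place` field (a named place with a bounding box) is a coarser alternative worth inspecting.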