I'm trying to pull data from Twitter covering a month or so for a project. There are fewer than 10,000 tweets with this hashtag over that period, but I only seem to get tweets from the current day. I got 68 yesterday and 80 today; in both cases every tweet was timestamped with the current day.

api = tweepy.API(auth)
igsjc_tweets = api.search(q="#igsjc", since='2014-12-31', count=100000)

ipdb> len(igsjc_tweets)
80

I know for certain there should be more than 80 tweets. I've heard that Twitter rate-limits to 1500 tweets at a time, but does it also rate-limit to a certain day? Note that I've also tried the Cursor approach with

igsjc_tweets = tweepy.Cursor(api.search, q="#igsjc", since='2015-12-31', count=10000)

This also only gets me 80 tweets. Any tips or suggestions on how to get the full data would be appreciated.

goodcow

2 Answers

Here's the official tweepy tutorial on Cursor. Note that you need to iterate through the Cursor, as shown below. There is also a cap on how many items you can usefully request through .items(), so it's probably a good idea to pull month by month (or something similar) and to sleep between calls. HTH!

igsjc_tweets_jan = [tweet for tweet in tweepy.Cursor(
                    api.search, q="#igsjc", since='2016-01-01', until='2016-01-31').items(1000)] 
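
To make the month-by-month idea concrete, here is a minimal sketch that only generates the date windows; it does not call Twitter. The helper name `month_windows` is made up for illustration, and the commented-out usage assumes an authenticated `api` object like the one in the question. As far as I know `until` is exclusive (it matches tweets created before that date), which is why each window ends on the first day of the next month.

```python
from datetime import date


def month_windows(start, end):
    """Yield (since, until) ISO date strings, one pair per calendar month in [start, end)."""
    current = start
    while current < end:
        # First day of the month after `current`.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        window_end = min(next_month, end)
        yield current.isoformat(), window_end.isoformat()
        current = window_end


for since, until in month_windows(date(2016, 1, 1), date(2016, 3, 1)):
    print(since, until)
    # Hypothetical usage, assuming `api` is an authenticated tweepy.API:
    #   tweets = list(tweepy.Cursor(api.search, q="#igsjc",
    #                               since=since, until=until).items(1000))
    #   time.sleep(5)  # pause between windows to stay clear of rate limits
```

Splitting the request this way keeps each Cursor run small, so a failure or rate-limit pause only costs one window rather than the whole pull.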
Kevin
  • I just read that the search API only returns tweets up to about a week old. Is there any way around this? – goodcow Feb 17 '16 at 16:18
  • I think if you read user timelines you can get tweets older than one week. [Here](http://stackoverflow.com/questions/24214189/how-can-i-get-tweets-older-than-a-week-using-tweepy-or-other-python-libraries) is a link to another similar SO question. The most helpful answer for you will be the one that links to 'GetOldTweets' repo [here](https://github.com/Jefferson-Henrique/GetOldTweets-python) – Kevin Feb 18 '16 at 12:26

First, tweepy cannot retrieve very old data through the search API. The standard search endpoint only indexes roughly the past week of tweets, so a query that reaches back a month or two will not return the older ones.
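
As a rough illustration of that limit, the sketch below clamps a requested start date to the oldest date the search index is likely to cover. The 7-day figure is an assumption based on the "about a week" mentioned in the comments above, and `clamp_since` is a made-up helper name, not part of tweepy.

```python
from datetime import date, timedelta

# Assumption: the standard search index reaches back about 7 days.
SEARCH_WINDOW_DAYS = 7


def clamp_since(since, today=None):
    """Return the later of `since` and the oldest date assumed to be searchable."""
    today = today or date.today()
    oldest = today - timedelta(days=SEARCH_WINDOW_DAYS)
    return max(since, oldest)


requested = date(2015, 12, 31)
effective = clamp_since(requested, today=date(2016, 2, 17))
print(requested, "->", effective)
```

If the effective date is later than the one you asked for, the missing days have to come from somewhere other than the search API.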

Anyway, you can use this piece of code to get tweets. I run it to collect tweets from the last few days and it works for me.

Notice that you can refine the query and add geocode information; I left an example commented out for you.

flag = True
last_id = None
while flag:
    flag = False
    # Rate-limit handling (e.g. wait_on_rate_limit) is configured on tweepy.API(...), not per search call.
    for status in tweepy.Cursor(api.search,
                                # q='geocode:"37.781157,-122.398720,1mi" since:' + since + ' until:' + until + ' include:retweets',
                                q="#igsjc",
                                since='2015-12-31',
                                max_id=last_id,
                                result_type='recent',
                                include_entities=True).items(300):
        tweet = status._json  # raw JSON dict for this status
        print(tweet)

        flag = True  # there may still be more data to collect
        last_id = status.id - 1  # max_id is inclusive, so step just below the last tweet seen
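
The paging idea behind that loop can be tried without network access. The sketch below is not tweepy code: `fetch_page` is a hypothetical callable standing in for one search request, and `FAKE_TWEETS` is invented data. It shows the one detail that is easy to get wrong: `max_id` is inclusive, so the next request should ask for IDs strictly below the oldest one already seen.

```python
def collect_all(fetch_page, page_size=100):
    """Page backwards with max_id until a page comes back empty.

    `fetch_page(max_id, count)` is a hypothetical callable returning a list of
    dicts with an integer "id", newest first, limited to ids <= max_id
    (or unrestricted when max_id is None).
    """
    collected = []
    max_id = None
    while True:
        page = fetch_page(max_id, page_size)
        if not page:
            break
        collected.extend(page)
        # max_id is inclusive, so move just below the oldest id in this page.
        max_id = min(item["id"] for item in page) - 1
    return collected


# Stand-in data source so the sketch runs offline.
FAKE_TWEETS = [{"id": i} for i in range(250, 0, -1)]


def fake_fetch(max_id, count):
    matching = [t for t in FAKE_TWEETS if max_id is None or t["id"] <= max_id]
    return matching[:count]


tweets = collect_all(fake_fetch)
print(len(tweets))
```

Without the `- 1`, every request would return the last tweet of the previous page again, so the loop would collect duplicates and never see an empty page.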

Good luck

Samer Aamar