
I am trying to download tweets from several months back, within a specific date range. I can only download tweets from the past week, nothing older.

Code:

```python
import sys

import pandas as pd
import tweepy

ckey = 'key'
csecret = 'key'
atoken = 'key'
asecret = 'key'

def toDataFrame(tweets):
    """Flatten a list of tweepy Status objects into one row per tweet."""
    DataSet = pd.DataFrame()

    DataSet['tweetID'] = [tweet.id for tweet in tweets]
    # Keep the text as str; to_csv below writes it out as UTF-8
    DataSet['tweetText'] = [tweet.text for tweet in tweets]
    DataSet['tweetRetweetCt'] = [tweet.retweet_count for tweet in tweets]
    DataSet['tweetFavoriteCt'] = [tweet.favorite_count for tweet in tweets]
    DataSet['tweetSource'] = [tweet.source for tweet in tweets]
    DataSet['tweetCreated'] = [tweet.created_at for tweet in tweets]
    DataSet['userID'] = [tweet.user.id for tweet in tweets]
    DataSet['userScreen'] = [tweet.user.screen_name for tweet in tweets]
    DataSet['userName'] = [tweet.user.name for tweet in tweets]
    DataSet['userCreateDt'] = [tweet.user.created_at for tweet in tweets]
    DataSet['userDesc'] = [tweet.user.description for tweet in tweets]
    DataSet['userFollowerCt'] = [tweet.user.followers_count for tweet in tweets]
    DataSet['userFriendsCt'] = [tweet.user.friends_count for tweet in tweets]
    DataSet['userLocation'] = [tweet.user.location for tweet in tweets]
    DataSet['userTimezone'] = [tweet.user.time_zone for tweet in tweets]
    DataSet['Coordinates'] = [tweet.coordinates for tweet in tweets]
    DataSet['GeoEnabled'] = [tweet.user.geo_enabled for tweet in tweets]
    DataSet['Language'] = [tweet.user.lang for tweet in tweets]
    # Most tweets have no place attached; record 'null' for those
    DataSet['TweetPlace'] = [tweet.place.full_name if tweet.place else 'null'
                             for tweet in tweets]
    return DataSet

# User-context OAuth needs both the consumer pair and the access-token pair
auth = tweepy.OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
#auth = tweepy.AppAuthHandler('key', 'key')

api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
# An API object is always truthy, so ask Twitter whether the keys are valid
if not api.verify_credentials():
    print("Can't Authenticate")
    sys.exit(-1)

# This recent range works, but I am not able to download Dec 1st to Dec 7th
cursor = tweepy.Cursor(api.search,
                       q='#chennairains OR #chennaihelp OR #chennaifloods',
                       since='2015-12-20', until='2015-12-21',
                       lang='en', count=100)
results = list(cursor.items())

DataSet = toDataFrame(results)
DataSet.to_csv('output.csv', index=False, encoding='utf-8')
```

The program downloads data from within the past week without any problem, but it cannot download anything older than a week. I tried the suggestions in a few related posts here, but most of them are unanswered.

Sitz Blogz
  • related: [How can I get tweets older than a week (using tweepy or other python libraries)](http://stackoverflow.com/q/24214189/4279) – jfs Dec 25 '15 at 19:23
  • @J.F.Sebastian Thank you so much. I already tried this, but it doesn't work. – Sitz Blogz Dec 25 '15 at 19:46
  • [The most upvoted answer](http://stackoverflow.com/a/24246840/4279) from the link says that it cannot work (via the Search API). Unless something has changed since then, `tweepy.Cursor(api.search, ...)` won't work for finding old tweets. – jfs Dec 25 '15 at 19:53
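A minimal sketch of the constraint these comments describe, assuming the standard Search API only indexes roughly the last 7 days (the window the asker observed; the exact length is an assumption here). The hypothetical helper `searchable_range` clips a requested `since`/`until` range to that window and returns `None` when nothing is left, which is what happens to a Dec 1st to Dec 7th request made around Dec 23, 2015. It does not call Twitter; it only illustrates why older ranges come back empty.

```python
from datetime import date, timedelta

# Assumption: the standard Search API only covers about the last 7 days.
SEARCH_WINDOW_DAYS = 7

def searchable_range(since, until, today):
    """Clip the half-open range [since, until) to what search can still see.

    Returns a (since, until) tuple, or None if the whole range is too old.
    Hypothetical helper for illustration only.
    """
    oldest = today - timedelta(days=SEARCH_WINDOW_DAYS)
    start = max(since, oldest)
    # Nothing can exist after the end of today
    end = min(until, today + timedelta(days=1))
    if start >= end:
        return None
    return start, end

today = date(2015, 12, 23)
print(searchable_range(date(2015, 12, 1), date(2015, 12, 8), today))    # None
print(searchable_range(date(2015, 12, 20), date(2015, 12, 21), today))
```

The second call returns the range unchanged because it already lies inside the window, which matches the asker's report that the Dec 20th to Dec 21st query works.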

1 Answer


Twitter limits the amount of data returned from its REST API, and Tweepy's API class uses Twitter's REST API.

From https://dev.twitter.com/overview/general/things-every-developer-should-know:

> There are pagination limits
>
> Rest API Limit: Clients may access a theoretical maximum of 3,200 statuses via the page and count parameters for the user_timeline REST API methods. Other timeline methods have a theoretical maximum of 800 statuses. Requests for more than the limit will result in a reply with a status code of 200 and an empty result in the format requested. Twitter still maintains a database of all the tweets sent by a user. However, to ensure performance of the site, this artificial limit is temporarily in place.
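To make the quoted caps concrete: the number of page requests that can return data is the cap divided by the page size, rounded up, and past that point the API answers 200 OK with an empty body rather than an error. A small sketch, assuming 200 statuses per `user_timeline` request (the per-request maximum for that endpoint at the time); `reachable_pages`, `collect_all`, and `fake_fetch` are hypothetical names, and `fake_fetch` merely simulates the capped endpoint.

```python
import math

USER_TIMELINE_CAP = 3200   # quoted cap for user_timeline
OTHER_TIMELINE_CAP = 800   # quoted cap for other timeline methods
PER_PAGE = 200             # assumption: max statuses per user_timeline request

def reachable_pages(cap, per_page):
    """Page requests that can return statuses before the cap is reached."""
    return math.ceil(cap / per_page)

def collect_all(fetch_page):
    """Call fetch_page(1), fetch_page(2), ... until a page comes back empty.

    Beyond the cap the reply is 200 OK with an empty result, so an empty
    page (not an exception) is the stop signal.
    """
    statuses, page = [], 1
    while True:
        batch = fetch_page(page)
        if not batch:
            return statuses
        statuses.extend(batch)
        page += 1

def fake_fetch(page):
    """Simulated endpoint: serves numbered statuses until the cap is used up."""
    start = (page - 1) * PER_PAGE
    return list(range(start, min(start + PER_PAGE, USER_TIMELINE_CAP)))

print(reachable_pages(USER_TIMELINE_CAP, PER_PAGE))   # 16
print(reachable_pages(OTHER_TIMELINE_CAP, PER_PAGE))  # 4
print(len(collect_all(fake_fetch)))                    # 3200
```

However many pages a client requests, it never sees more than the cap, which is why paging further back cannot reach older tweets.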

If you are trying to get a longer lookback, paid services like Gnip and DataSift can provide this data.

wpercy
Geoff Wright
  • Thank you for the response. The tweet limit is not the issue here. I want tweets from a particular date, for example just 1st Nov; a single day would do, but it needs to be a past date. – Sitz Blogz Dec 23 '15 at 06:27
  • @SitzBlogz: then you should ask how to download a tweet from a day ago, not a month. – jfs Dec 25 '15 at 15:46
  • @J.F.Sebastian One particular date that is older, say from November or October. Tweets from one day ago I already get with the current code. – Sitz Blogz Dec 25 '15 at 18:46
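For the single-older-day case in these comments, one workaround that stays inside the limits quoted in the answer is to page through a specific account's timeline (newest first) and keep only tweets from the target day, stopping as soon as tweets get older than it. This is a sketch under assumptions: it only finds tweets from accounts you name, only if that day is within their most recent 3,200 tweets, and it is not a substitute for a historical keyword search. `tweets_on_day` is a hypothetical helper; the `tweepy.Cursor(api.user_timeline, ...)` call in the comment assumes tweepy 3.x, and the stand-in data is invented for illustration.

```python
from datetime import date, datetime
from itertools import takewhile
from types import SimpleNamespace

def tweets_on_day(timeline, day):
    """Return the tweets whose created_at falls on `day`.

    `timeline` is any iterable of objects with a `created_at` datetime,
    ordered newest first. Iteration stops at the first tweet older than
    `day`, so a lazy cursor is not paged any further than needed.
    The day boundary follows created_at's timezone (UTC for Twitter).
    """
    not_too_old = takewhile(lambda t: t.created_at.date() >= day, timeline)
    return [t for t in not_too_old if t.created_at.date() == day]

# Stand-in data; with tweepy 3.x the timeline could instead be (assumption):
#   tweepy.Cursor(api.user_timeline, screen_name="some_account", count=200).items()
fake_timeline = [
    SimpleNamespace(id=3, created_at=datetime(2015, 11, 2, 9, 0)),
    SimpleNamespace(id=2, created_at=datetime(2015, 11, 1, 18, 30)),
    SimpleNamespace(id=1, created_at=datetime(2015, 10, 31, 23, 59)),
]
print([t.id for t in tweets_on_day(fake_timeline, date(2015, 11, 1))])  # [2]
```

Using `takewhile` rather than a plain filter is the design choice that matters with a real cursor: a plain filter would keep requesting pages all the way down to the 3,200-tweet cap even after the target day has passed.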