44

I have been trying to figure this out, and it is really frustrating. I'm trying to collect a large number of tweets with a certain hashtag using Tweepy, but the search doesn't go back more than one week. I need tweets from at least two years ago, covering a period of a couple of months. Is this even possible, and if so, how?

For reference, here is my code:

import tweepy
import csv

consumer_key = '####'
consumer_secret = '####'
access_token = '####'
access_token_secret = '####'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Open/Create a file to append data
csvFile = open('tweets.csv', 'a')
#Use csv Writer
csvWriter = csv.writer(csvFile)


for tweet in tweepy.Cursor(api.search,q="#ps4",count=100,\
                           lang="en",\
                           since_id=2014-06-12).items():
    print tweet.created_at, tweet.text
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])
MustiHakan
  • 7
    Note that `since_id=2014-06-12` is equivalent to `since_id=1996`, because 2014 minus 6 minus 12 equals 1996. – Robᵩ Jun 14 '14 at 02:31
  • Possible duplicate of [Getting historical data from Twitter](https://stackoverflow.com/questions/1662151/getting-historical-data-from-twitter) – Nemo Feb 18 '18 at 18:21
  • You can use the REST APIs to get tweets older than a week. For more details, see the Twitter API reference: https://dev.twitter.com/rest/reference/get/statuses/user_timeline – Mohammad Sadiq Mar 07 '16 at 05:43

8 Answers

22

You cannot use the Twitter Search API to collect tweets from two years ago. Per the docs:

Also note that the search results at twitter.com may return historical results while the Search API usually only serves tweets from the past week. - Twitter documentation.

If you need old tweets, you can get them from individual users' timelines, because that collection is limited by number of tweets rather than by time (so in many cases you can go back months or years). A third-party service that archives tweets, such as Topsy, may also be useful in your case (Topsy had shut down as of July 2016, but other services exist).
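
As a rough illustration of collecting by count rather than by date, the sketch below pages backwards through one account's timeline with `max_id`. It assumes `api` is an authenticated `tweepy.API` object like the one in the question. `fetch_user_timeline` and its parameters are hypothetical names, and the 3,200-tweet default reflects the ceiling documented for the `statuses/user_timeline` endpoint at the time, so treat that number as an assumption.

```python
def fetch_user_timeline(api, screen_name, page_size=200, limit=3200):
    """Page backwards through one user's timeline using max_id.

    `api` is assumed to expose tweepy's `user_timeline(screen_name=...,
    count=..., max_id=...)` method; nothing here filters by date.
    """
    collected = []
    max_id = None
    while len(collected) < limit:
        kwargs = {"screen_name": screen_name, "count": page_size}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = api.user_timeline(**kwargs)
        if not page:
            break  # no older tweets are available
        collected.extend(page)
        # Ask next for tweets strictly older than the oldest one seen so far.
        max_id = page[-1].id - 1
    return collected[:limit]
```

You would then filter the returned tweets yourself, for example by checking each tweet's `created_at` and whether the hashtag appears in its text.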

Luigi
22

As you have noticed, the Twitter API has some limitations. I have implemented code that gets around them by using the same strategy as Twitter running in a browser. Take a look; with it you can get the oldest tweets: https://github.com/Jefferson-Henrique/GetOldTweets-python

  • Is this still working? When I try with the --since and --until options, it gives me 0 tweets. – Luke Barker May 10 '16 at 09:03
  • 2
    Did not work for me on vanilla Ubuntu 12.04, so I had to install pyquery and lxml first. If anyone runs into the same problem, run apt-get install python-pip; pip install pyquery; apt-get install python-lxml and then the script will work. ;) – Rehmat Jun 01 '16 at 07:59
  • For anyone still struggling, you need to have these 2 libs installed as well: sudo apt-get install libxslt-dev libxml2-dev – Pinkesh Badjatiya Jul 10 '17 at 19:55
  • I noticed that it doesn't retrieve Retweets – Daniel Zhang Oct 16 '21 at 17:47
8

I found some code that helps retrieve older tweets: https://github.com/Jefferson-Henrique/GetOldTweets-python

To get old tweets, run the following command in the directory where the repository was extracted.

python Exporter.py --querysearch 'keyword' --since 2016-01-10 --until 2016-01-15 --maxtweets 1000

It returned a file, 'output_got.csv', with 1000 tweets from the above days matching the keyword.

You need to install the 'pyquery' module for this to work.

PS: You can modify the 'Exporter.py' file to get more tweet attributes, as required.
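
If you need to cover a couple of months, one option is to split the range into shorter windows and run the command once per window. The sketch below only builds the command strings, using the flags shown above. `date_windows` and `exporter_command` are hypothetical helper names, and whether `--until` is treated as inclusive or exclusive is something I have not verified, so check the boundaries of each window against the output.

```python
import shlex
from datetime import date, timedelta


def date_windows(start, end, days):
    """Yield (since, until) date pairs covering start..end in steps of `days`."""
    step = timedelta(days=days)
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        yield cursor, upper
        cursor = upper


def exporter_command(keyword, since, until, max_tweets):
    """Build one Exporter.py command line with shell-safe quoting."""
    args = [
        "python", "Exporter.py",
        "--querysearch", keyword,
        "--since", since.isoformat(),
        "--until", until.isoformat(),
        "--maxtweets", str(max_tweets),
    ]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Print one command per two-day window; run them however you prefer.
    for since, until in date_windows(date(2016, 1, 10), date(2016, 1, 15), days=2):
        print(exporter_command("#ps4", since, until, 1000))
```

Each run writes its own output, so rename or merge the resulting files between runs if the script overwrites 'output_got.csv'.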

Shivangi Gupta
  • I tried to search tweets over a one-month period in 2015 with maxtweets 10000, but it only returned around 200. It seems that the older the date, the less data it can get. – Shaohua Huang Mar 09 '17 at 14:52
5

2018 update: Twitter has Premium search APIs that can return results from the beginning of time (2006):

https://developer.twitter.com/en/docs/tweets/search/overview/premium#ProductPackages

Search Tweets: 30-day endpoint → provides Tweets from the previous 30 days.

Search Tweets: Full-archive endpoint → provides complete and instant access to Tweets dating all the way back to the first Tweet in March 2006.

With an example Python client: https://github.com/twitterdev/search-tweets-python
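
For orientation, here is a minimal sketch of what a full-archive request looks like, based on my reading of the premium documentation: a JSON body with `query`, `fromDate`, `toDate` (in `YYYYMMDDhhmm` form) and `maxResults`, posted to an endpoint that ends in your dev environment label. `fullarchive_request` is a hypothetical helper, `"dev"` is a placeholder label, and no request is actually sent; authentication (a bearer token) is left out.

```python
import json
from datetime import datetime


def fullarchive_request(query, start, end, env_label, max_results=100):
    """Return (url, json_body) for a premium full-archive search request.

    `env_label` is the dev environment label from your Twitter developer
    account; the value used in the example below is only a placeholder.
    """
    url = "https://api.twitter.com/1.1/tweets/search/fullarchive/{}.json".format(env_label)
    body = {
        "query": query,
        "fromDate": start.strftime("%Y%m%d%H%M"),
        "toDate": end.strftime("%Y%m%d%H%M"),
        "maxResults": max_results,
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    url, body = fullarchive_request(
        "#ps4", datetime(2014, 6, 12), datetime(2014, 8, 12), env_label="dev"
    )
    print(url)
    print(body)
```

The linked example client wraps this kind of request and handles pagination for you, so it is probably the better starting point for real use.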

Geordie
2

I know this is a very old question, but some folks might still be facing the same issue. After some digging, I found out that Tweepy's search only returns data for the past 7 days, which sometimes pushes people toward buying a third-party service. I used the Python library GetOldTweets3 and it worked fine for me. The library is really easy to use. Its only limitation is that you can't search for more than one hashtag in one execution, but it works fine for searching multiple accounts at the same time.

AbhishekSinghNegi
  • 1
    It didn't work for me. An error occured during an HTTP request: HTTP Error 404: Not Found running the examples in the website you linked. Python 3.7, Ubuntu 20.04 – Katu Feb 12 '21 at 23:32
1

Use the `since` and `until` arguments to adjust your timeframe. You are currently using `since_id`, which is meant to correspond to tweet ID values (not dates):

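for tweet in tweepy.Cursor(api.search,
                           q="test",
                           since="2014-01-01",
                           until="2014-02-01",
                           lang="en").items():
    # Same row-writing step as in the question's loop
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])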
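
To see why the question's `since_id=2014-06-12` never behaved like a date, the short sketch below evaluates the same expression and then checks date strings before they are passed on. `check_date_window` is a hypothetical helper, not part of Tweepy.

```python
from datetime import datetime

# Unquoted, the "date" is plain integer subtraction. It is written here with
# Python 3 literals, since a leading zero as in `06` is only legal in Python 2.
since_id_as_written = 2014 - 6 - 12
print(since_id_as_written)  # 1996


def check_date_window(since, until):
    """Hypothetical helper: validate YYYY-MM-DD strings and their order."""
    start = datetime.strptime(since, "%Y-%m-%d")
    end = datetime.strptime(until, "%Y-%m-%d")
    if start >= end:
        raise ValueError("since must be earlier than until")
    return since, until


print(check_date_window("2014-01-01", "2014-02-01"))
```

Passing the dates as quoted strings avoids the arithmetic, although it does not lift the one-week limit of the search itself.
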
RohitJ
  • 3
    Yeah, I tried that, but it is not possible: if the dates are older than a week, the output is just empty. So I guess I have to use a method other than search, but I can't find any that works. – MustiHakan Jun 14 '14 at 21:24
1

As others have noted, the Twitter API has the date limitation, but the actual advanced search as implemented on twitter.com does not. So the solution is to use Python's wrapper for Selenium or PhantomJS to iterate through the twitter.com endpoint. Here's an implementation using Selenium that someone has posted on GitHub: https://github.com/bpb27/twitter_scraping/
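
As a small sketch of the starting point for such a scraper, the function below builds a twitter.com search URL using the `since:` and `until:` advanced-search operators. `advanced_search_url` is a hypothetical helper; loading and scrolling the page in a browser driver is left to the linked repository.

```python
from urllib.parse import urlencode


def advanced_search_url(hashtag, since, until):
    """Build a twitter.com search URL limited to a YYYY-MM-DD date window."""
    query = "{} since:{} until:{}".format(hashtag, since, until)
    return "https://twitter.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(advanced_search_url("#ps4", "2014-06-12", "2014-08-12"))
```

Pasting the printed URL into a browser is a quick way to check what a date window returns before automating anything.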

dancow
1

I can't believe nobody has mentioned this, but this Git repository completely solved my problem. I wasn't able to get other solutions, such as GOT or the Twitter Premium API, to work.

Try this, definitely useful:

https://betterprogramming.pub/how-to-scrape-tweets-with-snscrape-90124ed006af

https://github.com/MartinBeckUT/TwitterScraper/tree/master/snscrape/cli-with-python

emrerkaslan