
The Twitter search API returns at most 100 tweets per "page" of search results. The search_metadata in each response includes max_id and since_id values that can be passed as parameters to get earlier/later tweets.

The Twython 3.1.2 documentation suggests that this pattern is the "old way" to search:

results = twitter.search(q="xbox",count=423,max_id=421482533256044543)
for tweet in results['statuses']:
    ... do something

and that this is the "new way":

results = twitter.cursor(twitter.search, q='xbox', count=375)
for tweet in results:
    ... do something

When I do the latter, it appears to iterate endlessly over the same search results. I'm trying to write them to a CSV file, but the output ends up full of duplicates.

What is the proper way to search for a large number of tweets with Twython and iterate through the set of unique results?

Edit: Another issue here is that when I try to iterate with the generator (for tweet in results:), it loops repeatedly without stopping. Ah, this is a bug: https://github.com/ryanmcgrath/twython/issues/300
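To make the duplication concrete, a loop like the following (a rough sketch; the credentials are placeholders and the query, count and cap are only examples) keeps seeing tweet ids it has already collected:

from twython import Twython

# APP_KEY etc. are placeholders for your own credentials
twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

seen_ids = set()
duplicates = 0
for tweet in twitter.cursor(twitter.search, q='xbox', count=100):
    if tweet['id'] in seen_ids:
        duplicates += 1  # the cursor serves up the same statuses again
    seen_ids.add(tweet['id'])
    if len(seen_ids) + duplicates >= 500:  # hard cap, since the loop never stops on its own
        break
print(len(seen_ids), duplicates)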

Clay

4 Answers


I had the same problem. It seems that you should just loop through a user's timeline in batches using the max_id parameter. Batches should be at most 100 per Terence's answer (although for user_timeline the maximum count is actually 200), and you set max_id to the last id in the previous batch of returned tweets, minus one (because max_id is inclusive). Here's the code:

'''
Get all tweets from a given user.
Batch size of 200 is the max for user_timeline.
'''
from twython import Twython, TwythonError

tweets = []
# Requires authentication as of Twitter API v1.1
twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)  # put your Twitter keys here
user_timeline = []
try:
    user_timeline = twitter.get_user_timeline(screen_name='eugenebann', count=200)
except TwythonError as e:
    print(e)
print(len(user_timeline))
for tweet in user_timeline:
    # Add whatever you want from the tweet, here we just add the text
    tweets.append(tweet['text'])
# Count could be less than 200, see:
# https://dev.twitter.com/discussions/7513
while len(user_timeline) != 0:
    try:
        # max_id is inclusive, so subtract one from the last id seen
        user_timeline = twitter.get_user_timeline(screen_name='eugenebann', count=200,
                                                  max_id=user_timeline[-1]['id'] - 1)
    except TwythonError as e:
        print(e)
        break
    print(len(user_timeline))
    for tweet in user_timeline:
        # Add whatever you want from the tweet, here we just add the text
        tweets.append(tweet['text'])
# Number of tweets the user has made
print(len(tweets))
Eugene
    One of the issues is that the new `cursor` approach, recommended in the documentation, loops infinitely through results and never uses `max_id` to proceed through searches. – Clay Jan 14 '14 at 00:15
  • @Clay, and almost four years later this still seems to be true. Gah. – alttag Oct 16 '17 at 22:50

As per the official Twitter API documentation:

count (optional): The number of tweets to return per page, up to a maximum of 100.
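So to collect more than 100 results you have to page through them yourself, at most 100 per request, moving max_id back as you go. A rough sketch (the credentials are placeholders, and the query, target count and stopping condition are only examples):

from twython import Twython

# APP_KEY etc. are placeholders for your own credentials
twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

statuses = []
max_id = None
while len(statuses) < 425:  # however many you want
    params = {'q': 'xbox', 'count': 100}
    if max_id is not None:
        params['max_id'] = max_id
    page = twitter.search(**params)['statuses']
    if not page:
        break  # no more results
    statuses.extend(page)
    max_id = page[-1]['id'] - 1  # max_id is inclusive, so step past the last one
print(len(statuses))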

Terence Eden
  • I'm aware of that. How should I return, for instance, 425 search results using Twython? My understanding was that the `.cursor` format in Twython would iterate through the pages of search results using `max_id` to avoid duplicates, but that doesn't appear to happen. – Clay Jan 10 '14 at 12:11

You need to make repeated calls to the search method. However, there is no guarantee that these will be the next N results, and if tweets are coming in quickly it may miss some.

If you want all the tweets in a time frame you can use the streaming API: https://dev.twitter.com/docs/streaming-apis and combine this with the oauth2 module.

How can I consume tweets from Twitter's streaming api and store them in mongodb

python-twitter streaming api support/example

Disclaimer: I have not actually tried this.
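That said, Twython itself ships a TwythonStreamer class, so a sketch of the streaming approach (untested, same disclaimer as above; the credentials are placeholders and the track term is just an example) might look like this:

from twython import TwythonStreamer

class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        if 'text' in data:
            print(data['text'])  # or write it to your CSV / database

    def on_error(self, status_code, data):
        print(status_code)
        self.disconnect()  # stop streaming on errors

# APP_KEY etc. are placeholders for your own credentials
stream = MyStreamer(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
stream.statuses.filter(track='xbox')  # collect matching tweets as they arrive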

Paul
  • Actually, according to Twitter's [working with timelines](https://dev.twitter.com/docs/working-with-timelines) document, you can control what you receive with a search. The advantage to a search, of course, is that you instantly can return a large number of tweets for a low volume search term (that might take several hours to collect through the stream). I was just curious how to use Twython to do it. – Clay Jan 10 '14 at 13:34

As a solution to the problem of getting more than 100 tweets for a search query with Twython, here is a link showing how it can be done using the "old way":

Twython search API with next_results
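The idea there, as I understand it, is to pull max_id out of search_metadata['next_results'] and feed it into the next call. A rough sketch (the credentials are placeholders and the query is just an example):

from urllib.parse import parse_qs  # urlparse.parse_qs on Python 2
from twython import Twython

# APP_KEY etc. are placeholders for your own credentials
twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

statuses = []
results = twitter.search(q='xbox', count=100)
while True:
    statuses.extend(results['statuses'])
    next_results = results['search_metadata'].get('next_results')
    if not next_results:
        break  # no further pages
    # next_results looks like '?max_id=...&q=...&count=100&include_entities=1'
    max_id = parse_qs(next_results.lstrip('?'))['max_id'][0]
    results = twitter.search(q='xbox', count=100, max_id=max_id)
print(len(statuses))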

kundan