
Please forgive me if this is a gross repeat of a question previously answered elsewhere, but I am lost on how to use the tweepy API search function. Is there any documentation available on how to search for tweets using the api.search() function?

Is there any way I can control features such as number of tweets returned, results type etc.?

The results seem to max out at 100 for some reason.

The code snippet I use is as follows:

searched_tweets = self.api.search(q=query,rpp=100,count=1000)

user3075934
5 Answers


I originally worked out a solution based on Yuva Raj's suggestion to use additional parameters in GET search/tweets - the max_id parameter in conjunction with the id of the last tweet returned in each iteration of a loop that also checks for the occurrence of a TweepError.

However, I discovered there is a far simpler way to solve the problem using a tweepy.Cursor (see tweepy Cursor tutorial for more on using Cursor).

The following code fetches the most recent 1000 mentions of 'python'.

import tweepy
# assuming twitter_authentication.py contains each of the 4 oauth elements (1 per line)
from twitter_authentication import API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET

auth = tweepy.OAuthHandler(API_KEY, API_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

api = tweepy.API(auth)

query = 'python'
max_tweets = 1000
searched_tweets = [status for status in tweepy.Cursor(api.search, q=query).items(max_tweets)]

Update: in response to Andre Petre's comment about potential memory consumption issues with tweepy.Cursor, I'll include my original solution: replace the single-statement list comprehension used above to compute searched_tweets with the following:

searched_tweets = []
last_id = -1
while len(searched_tweets) < max_tweets:
    count = max_tweets - len(searched_tweets)
    try:
        new_tweets = api.search(q=query, count=count, max_id=str(last_id - 1))
        if not new_tweets:
            break
        searched_tweets.extend(new_tweets)
        last_id = new_tweets[-1].id
    except tweepy.TweepError as e:
        # depending on TweepError.code, one may want to retry or wait
        # to keep things simple, we will give up on an error
        break
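The comment in the `except` branch above glosses over retrying. As a sketch of what a retry might look like (this helper is hypothetical, not part of tweepy; `search_fn` stands in for any zero-argument wrapper around an `api.search(...)` call, and the backoff schedule is an arbitrary choice):

```python
import time

def search_with_retry(search_fn, retries=3, base_wait=1.0, sleep=time.sleep):
    # search_fn is any zero-argument callable wrapping e.g. api.search(...).
    # On an exception, wait (linearly longer each attempt) and retry,
    # re-raising only after the final attempt fails.
    for attempt in range(retries):
        try:
            return search_fn()
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(base_wait * (attempt + 1))
```

In the loop above you would call `search_with_retry(lambda: api.search(q=query, count=count, max_id=str(last_id - 1)))` instead of calling `api.search` directly. Real code should inspect the TweepError code (e.g. rate-limit errors) rather than retrying on every exception.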
gumption

There's a problem in your code. Based on Twitter Documentation for GET search/tweets,

The number of tweets to return per page, up to a maximum of 100. Defaults to 15. This was   
formerly the "rpp" parameter in the old Search API.

Your code should be:

CONSUMER_KEY = '....'
CONSUMER_SECRET = '....'
ACCESS_KEY = '....'
ACCESS_SECRET = '....'

auth = tweepy.auth.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)
search_results = api.search(q="hello", count=100)

for tweet in search_results:
    print(tweet.text)  # do whatever you need with each tweet here
Yuva Raj
  • Wait. What if I wish to collect, say, 5000 tweets? – user3075934 Mar 19 '14 at 22:39
  • 3
    You can get upto 1000 tweets in a single call by changing the `count` value. Once you made a call, and if you trying to get another 1000 tweets by this same, you will get only same 1000 tweets. so, to get 1001 - 2000, you should use `since_id` & `max_id` parameters. FYI, Twitter only serves tweets from the past week. Not 2 weeks back or months! – Yuva Raj Mar 20 '14 at 07:21
  • Regardless of the number I give the count variable, the tweets max out at 100, which I guess was my initial point. Any thoughts? – user3075934 Mar 22 '14 at 05:39
  • 4
    this is way better than any other suggestions, I use aws free tier and am restricted on memory. If you inspect with `watch cat /proc/meminfo` what happens with the `Cursor` you'll see that `MemFree` is strictly going down, no ups&downs. So after a half an hour my process got killed. My point is, in order to have it efficient, use a `while loop` and a `max_id`. – Andrei-Niculae Petre Jun 12 '14 at 15:17
  • @AndreiPetre I had not considered memory consumption issues. However, a longer form solution using a `while` loop should also check for errors. I've expanded my answer to include a potential solution using a `while` loop (and minimal error checking). – gumption Jun 13 '14 at 16:29

The other answers are old and the API has changed a lot.

The easy way is with a Cursor (see the Cursor tutorial). `pages()` returns a list of elements per page (you can limit how many pages it returns; `.pages(5)` returns only 5 pages):

for page in tweepy.Cursor(api.search, q='python', count=100, tweet_mode='extended').pages():
    # each page is a list of statuses
    process_page(page)

Here q is the query, count is how many tweets to fetch per request (100 is the maximum per request), and tweet_mode='extended' returns the full text (without it, the text is truncated to 140 characters). More info here. Retweets are still truncated, as confirmed by jaycech3n.
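Because retweets stay truncated even in extended mode, a common workaround is a small helper that falls back through the available fields. This is a sketch assuming the usual tweepy Status attributes (`full_text`, `text`, and `retweeted_status` on retweets), not an official tweepy API:

```python
def get_full_text(status):
    # Retweets keep their untruncated text on the embedded retweeted_status
    if hasattr(status, "retweeted_status"):
        status = status.retweeted_status
    # With tweet_mode='extended' the text lives in full_text; older/compat
    # responses only have text
    return getattr(status, "full_text", None) or status.text
```

Calling `get_full_text(status)` on each status from the Cursor then yields the untruncated text whether or not the status is a retweet.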

If you don't want to use tweepy.Cursor, you need to pass max_id to fetch the next chunk.

last_id = None
result = True
while result:
    result = api.search(q='python', count=100, tweet_mode='extended', max_id=last_id)
    if not result:
        break
    process_result(result)
    # subtract one so the last tweet isn't returned again in the next chunk
    last_id = result[-1]._json['id'] - 1
Lucas

I am working on extracting Twitter data around a location (here, around India) for all tweets that include a specific keyword or a list of keywords.

import tweepy
import credentials    ## all my Twitter API credentials are in this file; it should be in the same directory as this script

## set up the API connection
auth = tweepy.OAuthHandler(credentials.consumer_key,
                           credentials.consumer_secret)
auth.set_access_token(credentials.access_token,
                      credentials.access_secret)

api = tweepy.API(auth, wait_on_rate_limit=True)    # wait_on_rate_limit=True makes tweepy wait when it hits a rate limit instead of raising an error

search_words = ["#covid19", "2020", "lockdown"]

date_since = "2020-05-21"

tweets = tweepy.Cursor(api.search, q=" OR ".join(search_words),
                       geocode="20.5937,78.9629,3000km",
                       lang="en", since=date_since).items(10)
## the geocode here is for India; the format is geocode="latitude,longitude,radius"
## the radius must end in "mi" (miles) or "km" (kilometers)


for tweet in tweets:
    print("created_at: {}\nuser: {}\ntweet text: {}\ngeo_location: {}".
            format(tweet.created_at, tweet.user.screen_name, tweet.text, tweet.user.location))
    print("\n")
## tweet.user.location gives the general location of the user, not the location of the tweet itself; as it turns out, most users do not share the exact location of a tweet
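The geocode string is easy to get wrong (a missing unit suffix makes the request fail), so a tiny helper, purely illustrative and not part of tweepy, makes the format explicit:

```python
def geocode_param(latitude, longitude, radius, unit="km"):
    # Twitter's search geocode is "latitude,longitude,radius", where the
    # radius must carry a "km" or "mi" suffix
    if unit not in ("km", "mi"):
        raise ValueError("unit must be 'km' or 'mi'")
    return "{},{},{}{}".format(latitude, longitude, radius, unit)
```

With this, the hard-coded string above could be written as `geocode=geocode_param(20.5937, 78.9629, 3000)`.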

Results:

created_at: 2020-05-28 16:48:23
user: XXXXXXXXX
tweet text: RT @Eatala_Rajender: Media Bulletin on status of positive cases #COVID19 in Telangana. (Dated. 28.05.2020)
# TelanganaFightsCorona 
# StayHom…
geo_location: Hyderabad, India
hansrajswapnil

You can search for tweets containing a specific string as shown below:

tweets = api.search('Artificial Intelligence', count=200)
Michel_T.
  • The Tweepy documentation (http://docs.tweepy.org/en/latest/api.html) mentions that only up to 100 tweets will be returned. As of current Tweepy (≤3.8.0), specifying count > 100 doesn't help. – David C. May 05 '20 at 05:50