Python: How to search tweets and store in database?

Question

I've got a nice Python script that currently prints out the past 200 tweets from a given username.

However, I'd like to modify it so that instead it will collect the past 200 tweets that include a certain hashtag (from any username) and then I'd like to store those results in a database.

Can anyone provide a suggestion on how to modify the code below?

import sys
import operator
import requests
import json
import twitter

twitter_consumer_key = 'XXXX'
twitter_consumer_secret = 'XXXX'
twitter_access_token = 'XXXX'
twitter_access_secret = 'XXXX'

twitter_api = twitter.Api(consumer_key=twitter_consumer_key, consumer_secret=twitter_consumer_secret, access_token_key=twitter_access_token, access_token_secret=twitter_access_secret)

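handle = 'XXXX'  # screen name of the account to read (placeholder)

statuses = twitter_api.GetUserTimeline(screen_name=handle, count=200, include_rts=False)

# print only the English-language tweets
for status in statuses:
  if status.lang == 'en':
    print(status)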

Possible duplicate of [Twitter API - Display all tweets with a certain hashtag?](http://stackoverflow.com/questions/2714471/twitter-api-display-all-tweets-with-a-certain-hashtag) — Xander Luciano, Aug 30 '16 at 17:53
[It does not appear to be possible](https://twittercommunity.com/t/get-user-timeline-tag-filtering/17508) to search by hashtag with the [GetUserTimeline](https://dev.twitter.com/rest/reference/get/statuses/user_timeline) function. As per Xander's suggestion, perhaps the [GetSearch](https://pythonism.wordpress.com/2013/10/12/using-the-twitter-api-with-python-twitter/) method would be helpful. Otherwise, you could download batches of 200 tweets at a time, and filter them yourself (and I think that Twitter limits you to downloading the user's last 3200 tweets or so). — Boa, Aug 30 '16 at 18:02
As for storing in a DB, unless you're working within some framework that provides a DB abstraction layer (e.g. Django, web2py, etc.), check out http://www.sqlalchemy.org/. — Boa, Aug 30 '16 at 18:04

Young · Answer 1 · 2016-08-30T18:10:15.390

I'm not familiar with the twitter package, but here is a suggestion you can build on. Depending on how you want to save each tweet, replace the print call with your own storage code. Note that this only filters the 200 tweets already fetched from one user; it does not fetch 200 tweets that contain a given hashtag.

import sys
import operator
import requests
import json
import twitter

twitter_consumer_key = 'XXXX'
twitter_consumer_secret = 'XXXX'
twitter_access_token = 'XXXX'
twitter_access_secret = 'XXXX'

twitter_api = twitter.Api(consumer_key=twitter_consumer_key, consumer_secret=twitter_consumer_secret, access_token_key=twitter_access_token, access_token_secret=twitter_access_secret)

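handle = 'XXXX'  # screen name of the account to read (placeholder)

statuses = twitter_api.GetUserTimeline(screen_name=handle, count=200, include_rts=False)

tag_list = ["Xmas", "Summer"]
for status in statuses:
  if status.lang == 'en':
    # python-twitter exposes parsed hashtags as status.hashtags (may be None)
    for hashtag in (status.hashtags or []):
      if hashtag.text in tag_list:
        print(status)
        break  # print each tweet once, even if several tags match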

Thanks for the suggestion, but I really need to scan for hashtags from all users (rather than filter a single user's tweets). I can't find any documentation on this "twitter" library I've been using thus far, so I might switch over to something else that has a more useful method. — Matt Brown, Aug 31 '16 at 03:34
@MattBrown Ah, you just want a simple search function. Just noticed on the Twitter official site: "The Twitter Search API searches against a sampling of recent Tweets published in the past 7 days." If you want to match for completeness you can consider using a Streaming API instead. — Young, Sep 06 '16 at 12:36
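
For the search-based approach discussed in these comments, a minimal sketch might look like the following. It assumes python-twitter's `Api.GetSearch` method with its `term`, `count`, `max_id`, `lang`, and `result_type` parameters, and that the search endpoint returns at most 100 tweets per request, so 200 tweets take at least two pages. The helper name `collect_hashtag_tweets` is hypothetical, and the results remain limited to whatever window of recent tweets the Search API covers.

```python
def collect_hashtag_tweets(api, hashtag, total=200, lang='en'):
    """Page backwards through search results until `total` tweets are collected.

    `api` is assumed to be a twitter.Api instance (python-twitter).
    """
    collected = []
    max_id = None
    while len(collected) < total:
        # Request at most 100 tweets per call, and no more than still needed.
        batch = api.GetSearch(term=hashtag,
                              count=min(100, total - len(collected)),
                              max_id=max_id,
                              lang=lang,
                              result_type='recent')
        if not batch:
            break  # no older matching tweets are available
        collected.extend(batch)
        # Next page: only tweets strictly older than the oldest one seen so far.
        max_id = min(status.id for status in batch) - 1
    return collected[:total]
```

With the `twitter_api` object from the question, a call such as `collect_hashtag_tweets(twitter_api, '#Xmas')` would return up to 200 status objects from any user.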

score 0 · Answer 2 · answered Aug 31 '16 at 11:18

I am attaching Java code that prints the past 100 tweets containing the hashtag '#engineeringproblems' (from any user). You need to add the Twitter API library 'twitter4J' to your project.

Library download link: http://twitter4j.org/en/index.html#download

Java source code:

public static void main(String[] args) {

    ConfigurationBuilder cb = new ConfigurationBuilder();
    cb.setDebugEnabled(true)
     .setOAuthConsumerKey("xxxx")
     .setOAuthConsumerSecret("xxxx")
     .setOAuthAccessToken("xxxx")
     .setOAuthAccessTokenSecret("xxxx");

    Twitter twitter = new TwitterFactory(cb.build()).getInstance();
    Query query = new Query("#engineeringproblems ");
    int numberOfTweets = 100;
    long lastID = Long.MAX_VALUE;
    ArrayList<Status> tweets = new ArrayList<Status>();

    while (tweets.size() < numberOfTweets) {
        if (numberOfTweets - tweets.size() > 100) {
            query.setCount(100);
        } else {
            query.setCount(numberOfTweets - tweets.size());
        }
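        try {
            QueryResult result = twitter.search(query);
            if (result.getTweets().isEmpty()) {
                break; // no older matching tweets, so stop instead of looping forever
            }
            tweets.addAll(result.getTweets());
            System.out.println("Gathered " + tweets.size() + " tweets" + "\n");
            // track the smallest (oldest) tweet ID seen so far
            for (Status t : tweets) {
                if (t.getId() < lastID) {
                    lastID = t.getId();
                }
            }
        } catch (TwitterException te) {
            System.out.println("Couldn't connect: " + te);
            break; // give up rather than retry the same failing request endlessly
        }
        // next page: only tweets older than the oldest one already gathered
        query.setMaxId(lastID - 1);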
    }
    for (int i = 0; i < tweets.size(); i++) {
        Status t = (Status) tweets.get(i);


        String user = t.getUser().getScreenName();
        String msg = t.getText();

        System.out.println(i + " USER: " + user + " wrote: " + msg + "\n");
    }
}

score 0 · Answer 3 · answered Sep 01 '16 at 02:43

Sorry, but I've really been looking for a Python solution, and I believe I've finally found one and tested it successfully. The code is below. I'm still looking for a way to modify the script to insert each tweet into a SQL database, but hopefully I can find that elsewhere.

pip install TwitterSearch

from TwitterSearch import *
try:
    tso = TwitterSearchOrder() # create a TwitterSearchOrder object
    tso.set_keywords(['Guttenberg', 'Doktorarbeit']) # let's define all words we would like to have a look for
    tso.set_language('de') # we want to see German tweets only
    tso.set_include_entities(False) # and don't give us all those entity information

    # it's about time to create a TwitterSearch object with our secret tokens
    ts = TwitterSearch(
        consumer_key = 'aaabbb',
        consumer_secret = 'cccddd',
        access_token = '111222',
        access_token_secret = '333444'
    )

    # this is where the fun actually starts :)
    for tweet in ts.search_tweets_iterable(tso):
        print( '@%s tweeted: %s' % ( tweet['user']['screen_name'], tweet['text'] ) )

except TwitterSearchException as e: # take care of all those ugly errors if there are some
    print(e)
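
For the remaining database step, one possible sketch uses Python's built-in `sqlite3` module. It assumes each tweet is a dict shaped like the ones printed above, with `id`, `text`, and `user.screen_name` keys; the table layout, the file name `tweets.db`, and the helper names `open_tweet_db` and `store_tweets` are illustrative choices, not part of TwitterSearch.

```python
import sqlite3


def open_tweet_db(path='tweets.db'):
    """Open (or create) a SQLite database with a simple tweets table."""
    conn = sqlite3.connect(path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS tweets ('
        ' id INTEGER PRIMARY KEY,'
        ' screen_name TEXT NOT NULL,'
        ' text TEXT NOT NULL)'
    )
    return conn


def store_tweets(conn, tweets):
    """Insert tweet dicts; tweets whose id is already stored are skipped."""
    rows = [(t['id'], t['user']['screen_name'], t['text']) for t in tweets]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            'INSERT OR IGNORE INTO tweets (id, screen_name, text) '
            'VALUES (?, ?, ?)',
            rows)
    return len(rows)
```

In the script above, the print loop could then be replaced with something like `store_tweets(open_tweet_db(), ts.search_tweets_iterable(tso))`. Using the tweet ID as the primary key together with `INSERT OR IGNORE` means re-running the search should not create duplicate rows.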
