I recently tried GetOldTweets3 (https://pypi.org/project/GetOldTweets3/) to download tweets that contain the word "iPhone". The code below fetches all the tweets and then writes them to a CSV file.

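import csv
import time

import GetOldTweets3 as got

# NOTE: `url` (the output directory path) is assumed to be defined elsewhere.
def get_tweets(keyword, start_date, end_date, max_tweets):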

    start_time = time.time()

    tweetCriteria = got.manager.TweetCriteria().setQuerySearch(keyword).setSince(start_date).setUntil(end_date).setMaxTweets(max_tweets).setLang("en")

    # The list of tweet objects is stored in the "tweets" variable
    tweets = got.manager.TweetManager.getTweets(tweetCriteria)

    elapsed_time = time.time() - start_time
    print(elapsed_time)

    with open(url + "/twitter_scrape_" + start_date + ".csv", "w", encoding = "utf-8") as csvfile:
        fieldnames = ["Date", "Username", "Tweet", "No. of Retweets"]
        writer = csv.DictWriter(csvfile, fieldnames = fieldnames, lineterminator = "\n")

        writer.writeheader()

        for tweet in tweets:
            writer.writerow({"Date": tweet.date,
                             "Username": str(tweet.username), 
                             "Tweet": str(tweet.text),
                             "No. of Retweets": str(tweet.retweets)})

    print("Data is stored in: " + url)


get_tweets("iPhone", "2013-09-10", "2013-09-11", 10000)

However, these are the times (in seconds) it takes to download the tweets:

10 tweets: ~2 seconds | 1,000 tweets: ~126 seconds | 10,000 tweets: ~1400 seconds

I'm trying to download a day's worth of tweets relating to the keyword. Two questions: 1) Is there a way to check how far along the download is? The code above just runs and gives no indication of its status. 2) Is there a faster way to get the data, aside from using GetOldTweets3?

Thanks in advance for the help!

1 Answer

  1. To track progress, print a status bar and a percentage from Python.
  2. For speed, try multi-threading with ThreadPoolExecutor (https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example). Profile the code first to see which part takes the most time. My guess is that it is the URL-opening part, in which case multi-threading should help.
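
Both suggestions can be combined in a rough sketch like the one below. It is not tested against GetOldTweets3 itself: fetch_chunk is a hypothetical stand-in for one download call (for example, one date range or one keyword per call), and print_progress and fetch_all are names made up for illustration. The idea is to split the job into independent chunks, run them in a thread pool, and redraw a status bar each time a chunk finishes.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_chunk(chunk_id):
    """Hypothetical stand-in for one download call (e.g. one date range)."""
    time.sleep(0.01)  # simulates the network wait of a real request
    return [f"tweet-{chunk_id}-{i}" for i in range(3)]


def print_progress(done, total, width=30):
    """Redraw a one-line status bar with a percentage on the same line."""
    filled = int(width * done / total)
    bar = "#" * filled + "-" * (width - filled)
    sys.stdout.write(f"\r[{bar}] {100 * done / total:.0f}%")
    sys.stdout.flush()


def fetch_all(chunk_ids, max_workers=4):
    """Run fetch_chunk over chunk_ids in a thread pool, reporting progress."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fetch_chunk, c) for c in chunk_ids]
        # as_completed yields each future as soon as its chunk is done,
        # so the bar advances in completion order, not submission order.
        for done, future in enumerate(as_completed(futures), start=1):
            results.extend(future.result())
            print_progress(done, len(futures))
    sys.stdout.write("\n")
    return results


if __name__ == "__main__":
    tweets = fetch_all(range(8))
    print(len(tweets), "items fetched")
```

Threads only help if the time really is spent waiting on the network, which is why profiling first is worth it. Note also that the results arrive in completion order, so sort them afterwards if the order matters.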
Fu Hanxi