
I'm fairly new to R; I use it for a course on network analysis at my university. As part of a research project, I want to analyse tweets by Donald Trump and Hillary Clinton. I successfully granted RStudio access to my Twitter account, but every time I try to download tweets, I get a fairly meager selection: 1,100 tweets at best and just 800-900 at worst. I don't understand this, as I don't get any error message either. Am I missing something? I thought the limit for downloading tweets was 3,200?

This is my code:

```r
# load the twitteR package and the necessary tool for login
library(twitteR)
library(ROAuth)
# load login data
api_key <- "blah"
api_secret <- "blah"
access_token <- "blah"
access_token_secret <- "blah"
# login
setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)
# retrieve tweets by Donald Trump; the maximum number is 3200
tweetsTrump <- userTimeline("realDonaldTrump", n = 3200)
# convert those tweets to a data frame
Trump.df <- twListToDF(tweetsTrump)
```

I am eternally grateful for every useful tip!

  • As mentioned in numerous similar questions: _"The Twitter Search API searches against a sampling of recent Tweets published in the past 7 days."_ It's not an archive of all tweets that everyone can query as he or she wishes... – lukeA Oct 05 '16 at 08:17
  • @lukeA, I'd add that as an answer. I can't seem to find any examples in R to mark this as a duplicate. – sebastian-c Oct 05 '16 at 08:28
  • @sebastian-c Hm, I found e.g. http://stackoverflow.com/questions/22052811/twitter-searchtwitter-only-returns-a-small-set-of-tweets via https://www.google.com/search?q=site%3Astackoverflow.com+"searchTwitter" – lukeA Oct 05 '16 at 08:43
  • `searchTwitter` works differently from `userTimeline`, and going back more than 7 days is doable with `userTimeline`, so both the question and the answer are different in this case. – calder-ty Dec 08 '16 at 14:33
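To make that distinction concrete, here is a minimal sketch of the two twitteR calls side by side, wrapped in hypothetical helpers (`recent_from_search` and `full_from_timeline` are illustrative names, not twitteR functions). Nothing contacts Twitter until a helper is called, and both assume `setup_twitter_oauth()` has already run. The `from:` query prefix is ordinary Twitter search syntax.

```r
# Search API via searchTwitter(): a relevance-ranked sample of roughly the
# last 7 days, so asking for 3,200 tweets from one account usually yields fewer.
recent_from_search <- function(user, n = 3200) {
  twitteR::searchTwitter(paste0("from:", user), n = n)
}

# statuses/user_timeline via userTimeline(): walks back through one account's
# own history, past the 7-day window, capped at its most recent 3,200 tweets.
full_from_timeline <- function(user, n = 3200) {
  twitteR::userTimeline(user, n = n)
}
```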

1 Answer


Have a look at the Twitter API documentation, where it says:

> The Twitter Search API searches against a sampling of recent Tweets published in the past 7 days.
>
> Before getting involved, it’s important to know that the Search API is focused on relevance and not completeness. This means that some Tweets and users may be missing from search results. If you want to match for completeness you should consider using a Streaming API instead.

Thus, the results from the Search API are limited by design. If you want more, use the Streaming API or a service like Gnip.

lukeA
  • That's the Search API; what he's talking about is slightly different: getting tweets from a specific user, which can go back more than 7 days. – calder-ty Dec 08 '16 at 14:22
  • Good point. Then we are [here](https://dev.twitter.com/rest/reference/get/statuses/user_timeline), API-wise. I don't know why the API returns fewer than 3,200 tweets, but it seems you can get the 3,200 by paginating through the history using the `maxID` parameter. – lukeA Dec 08 '16 at 14:32
  • Yeah... When I've done this before, I only cared about the last 100 tweets, so I didn't bother with pagination. But it could be that, as the [API docs](https://dev.twitter.com/rest/reference/get/statuses/user_timeline) say, retweets and replies are excluded only after the API has used your `count` parameter to pull all the tweets it can. [twitteR](https://cran.r-project.org/web/packages/twitteR/twitteR.pdf) has `includeRts` (include retweets) set to `FALSE` by default. – calder-ty Dec 08 '16 at 14:47
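Combining the two comments above, a minimal sketch of the `maxID` pagination idea. `page_timeline` is a hypothetical helper, not part of twitteR, and the page size of 200 is an assumption based on the per-request maximum of the user_timeline endpoint. To keep it testable offline, the actual request is injected as `fetch_page(n, max_id)`, which must return a list of statuses, newest first, each with an `id` field.

```r
# Hypothetical helper: collect up to `total` tweets by repeatedly requesting
# the next-older page, using the oldest ID seen so far as the new max_id.
# fetch_page(n, max_id) returns a list of statuses (newest first), each with
# an `id`; max_id = NULL means "start from the newest tweet".
page_timeline <- function(fetch_page, total = 3200, page_size = 200) {
  tweets <- list()
  max_id <- NULL
  while (length(tweets) < total) {
    page <- fetch_page(page_size, max_id)
    # max_id is inclusive, so the boundary tweet comes back again: drop it
    if (!is.null(max_id)) {
      page <- Filter(function(s) s$id != max_id, page)
    }
    if (length(page) == 0) break  # nothing older is left
    tweets <- c(tweets, page)
    max_id <- page[[length(page)]]$id  # oldest tweet on this page
  }
  head(tweets, total)
}
```

With twitteR loaded and authenticated, `fetch_page` could be `function(n, max_id) userTimeline("realDonaldTrump", n = n, maxID = max_id, includeRts = TRUE)`. twitteR may already page internally when `n` exceeds a single request, in which case simply passing `includeRts = TRUE` (and filtering on the `isRetweet` column of `twListToDF()` afterwards) may be enough; the helper just makes the `maxID` approach explicit.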