
I am attempting to download all of the followers and their information (location, date of creation, etc.) from the Haaretz Twitter feed (@haaretzcom) using the twitteR package in R. The feed has over 90,000 followers, and I was able to download the full list of follower IDs without any problem using the code below.

require(twitteR)
require(ROAuth)
#Loading the Twitter OAuthorization
load("~/Dropbox/Twitter/my_oauth")

#Confirming the OAuth
registerTwitterOAuth(my_oauth)

# opening list to download
haaretz_followers<-getUser("haaretzcom")$getFollowerIDs(retryOnRateLimit=9999999)

However, when I try to extract their information using the lookupUsers function, I run into the rate limit. The retryOnRateLimit trick does not seem to work here.
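Before looping, it can help to check how much of the current rate-limit window is left. A minimal sketch, assuming the OAuth credentials above are already registered (twitteR exposes this through getCurRateLimitInfo(); the exact column names may differ by package version):

    require(twitteR)

    # Inspect the remaining quota before deciding how long to pause
    # between calls; users/lookup is the endpoint lookupUsers() hits.
    rate <- getCurRateLimitInfo()
    rate[grep("users/lookup", rate$resource), ]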

 #Extracting user information for each of Haaretz followers
 haaretz_followers_info<-lookupUsers(haaretz_followers)

 haaretz_followers_full<-twListToDF(haaretz_followers_info)

 #Export data to csv
 write.table(haaretz_followers_full, file = "haaretz_twitter_followers.csv",  sep=",")

I believe I need to write a for loop that subsamples the list of followers (haaretz_followers) to avoid the rate limit. In this loop, I need to include some kind of rest/pause, as in Keep downloading tweets within the limits using twitteR package. The twitteR package is a bit opaque on how to go about this, and I am a bit of a novice at writing for loops in R. Finally, I know that how you write your loops in R greatly affects the run time. Any help you could give would be much appreciated!
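One way to structure such a loop is to look up the IDs in batches rather than one at a time. This is only a sketch: it assumes lookupUsers() accepts up to 100 IDs per call (the users/lookup cap) and that the pause length will need tuning against your own rate limit:

    # Split the follower IDs into batches of 100 and pause between
    # batches instead of between individual users.
    batch_size <- 100
    batches <- split(haaretz_followers,
                     ceiling(seq_along(haaretz_followers) / batch_size))

    all_info <- vector("list", length(batches))
    for (i in seq_along(batches)) {
      all_info[[i]] <- lookupUsers(batches[[i]])
      Sys.sleep(60)  # adjust to stay inside the 15-minute window
    }

    # Flatten the batches back into one list, then one data frame
    haaretz_followers_info <- do.call(c, all_info)
    haaretz_followers_full <- twListToDF(haaretz_followers_info)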

Thomas
  • Can you adapt this loop for your purposes? It's for getting the number of followers of a twitter user and includes a pause between each iteration: http://stackoverflow.com/a/10838639/1036500 – Ben Apr 30 '13 at 15:31

1 Answer


Something like this will likely get the job done:

for (follower in haaretz_followers){
  Sys.sleep(5)
  # Look up only the current follower, not the whole list again
  follower_info <- lookupUsers(follower)
  follower_df <- twListToDF(follower_info)

  # Append each follower's row to the csv instead of overwriting it
  write.table(follower_df, file = "haaretz_twitter_followers.csv",
              sep = ",", append = TRUE, row.names = FALSE,
              col.names = !file.exists("haaretz_twitter_followers.csv"))
}

Here you're sleeping for 5 seconds between each call. I don't know what your rate limit is -- you may need more or less to comply with Twitter's policies.

You're correct that the way you structure loops in R will affect performance, but in this case, you're intentionally inserting a pause which will be orders of magnitude longer than any wasted CPU time from a poorly-designed loop, so you don't really need to worry about that.
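That said, if you do want the idiomatic pattern for accumulating results in a loop, pre-allocate a list, fill it in, and bind once at the end rather than growing a data frame inside the loop. A sketch under the same assumptions as above (one user per call, pause tuned to your limit):

    # Pre-allocate one slot per follower, fill in the loop,
    # and combine into a single data frame afterwards.
    results <- vector("list", length(haaretz_followers))
    for (i in seq_along(haaretz_followers)) {
      Sys.sleep(5)
      results[[i]] <- twListToDF(lookupUsers(haaretz_followers[i]))
    }
    haaretz_followers_full <- do.call(rbind, results)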

Jeff Allen
  • Thanks for this! Yeah it seemed to work. Yet I keep getting the following error: 'Error: Malformed response from server, was not JSON. RMate stopped at line 31 The most likely cause of this error is Twitter returning a character which can't be properly parsed by R. Generally the only remedy is to wait long enough for the offending character to disappear from searches (e.g. if using searchTwitter()). Calls: lookupUsers -> lapply -> FUN -> -> twFromJSON Execution halted' I think this is an issue with certain foreign characters...any thoughts? – Thomas May 02 '13 at 19:17
  • @JeffAllen why would you loop through every `follower` but then call `lookupUsers` on the entire list each time? Aren't you just looking up the same thing on every iteration? – Shiva Prakash Jan 07 '16 at 06:48