
I am working on a project where I need to estimate the reach of some social events. I want to know how many people were exposed to comments about a festival in Denmark called Tinderbox. To do this, I fetch the statuses on Twitter that contain the word "tinderbox" and are written in Danish. Then I want to extract the number of followers for the screen names behind those statuses. The first part of my code is:

library("twitteR")
setup_twitter_oauth(consumer_key,consumer_secret,access_token,access_secret)
#get data
TB <- searchTwitter("tinderbox", lang="da", n=10000)
#put into a dataframe
df <- do.call("rbind", lapply(TB, as.data.frame))

My idea is to get followersCount directly from the Twitter data, as in the example below, which comes from the Stack Overflow question "fetching large number of followers and followees in R". But I don't know how to adapt it to my purpose.

library(twitteR)
user <- getUser("krestenb")
followers <- user$getFollowers()
b <- twListToDF(followers)
f_count <- as.data.frame(b$followersCount)
u_id <- as.data.frame(b$id)
u_sname <- as.data.frame(b$screenName)
u_name <- as.data.frame(b$name)
final_df <- cbind(u_id,u_name,u_sname,f_count)
sort_fc <- final_df[order(-f_count),]
colnames(sort_fc) <- c('id','name','s_name','fol_count')

My problem is that I cannot simply pass a vector of user names, extracted from df$screenName, to followers <- user$getFollowers().

So my thought was that I may need a loop over all the different screen names, but I do not know how to write it.

I hope I have painted a picture of what I want to get and how I think I can get there.

Help is much appreciated, as the festival takes place this weekend.

Sander Ehmsen
  • I don't have OAuth configured on my local machine, so I can't use your code. But I can tell you that the typical way to "avoid" using loops in R is to use one of the `apply` functions. You can define a vector of users, and then iterate over it using `apply()`. – Tim Biegeleisen Jun 23 '15 at 05:04
  • Thanks for your quick reply. I am still a novice in R, having only recently begun using it for things other than statistical models. I would therefore appreciate it if you could add some sample code showing how to use the apply function with the code given above. – Sander Ehmsen Jun 23 '15 at 06:24

1 Answer

Here is some sample code based on what you had in your original problem which will aggregate Twitter results for a set of users:

# create a data frame with 4 columns and no rows initially
df_result <- data.frame(t(rep(NA, 4)))
names(df_result) <- c('id', 'name', 's_name', 'fol_count')
df_result <- df_result[0:0,]

# you can replace this vector with whatever set of Twitter users you want
users <- c("krestenb", "tjb25587")                    # tjb25587 (me) has no followers

# iterate over the vector of users and aggregate each user's results
sapply(users, function(x) {
                  user <- getUser(x)
                  followers <- user$getFollowers()
                  if (length(followers) > 0) {        # ignore users with no followers
                      b <- twListToDF(followers)
                      # build the result columns in one step, with readable names
                      final_df <- data.frame(id = b$id, name = b$name,
                                             s_name = b$screenName,
                                             fol_count = b$followersCount,
                                             stringsAsFactors = FALSE)
                      # sort this user's followers by their own follower count, descending
                      sort_fc <- final_df[order(-final_df$fol_count), ]
                      df_result <<- rbind(df_result, sort_fc)
                  }
              })

Important points

I used the global assignment operator <<- when doing the rbind on the df_result data frame so that it will "stick" outside the loop. As I mentioned in my original answer, you can use the sapply function to iterate over a vector of users. Inside the loop, the results are aggregated.
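
The global assignment can also be avoided by having each iteration return its own data frame and binding them all once at the end. The following is a minimal sketch of that pattern, not the code I tested: `fetch_followers` and `collect_followers` are hypothetical names, and `fetch_followers` returns canned data so the example runs without OAuth. In real use it would presumably wrap `getUser(x)$getFollowers()` and `twListToDF()`.

```r
# Hypothetical stand-in for the Twitter call, so the sketch runs offline.
# It returns a data frame of followers, or NULL for a user with none.
fetch_followers <- function(screen_name) {
  canned <- list(
    alice = data.frame(id = c("1", "2"),
                       name = c("Ann", "Bo"),
                       screenName = c("ann", "bo"),
                       followersCount = c(10, 250),
                       stringsAsFactors = FALSE),
    bob = NULL  # a user with no followers
  )
  canned[[screen_name]]
}

collect_followers <- function(users, fetch = fetch_followers) {
  # one data frame (or NULL) per user, returned rather than assigned globally
  per_user <- lapply(users, function(x) {
    b <- fetch(x)
    if (is.null(b) || nrow(b) == 0) return(NULL)  # skip users with no followers
    out <- data.frame(id = b$id, name = b$name, s_name = b$screenName,
                      fol_count = b$followersCount, stringsAsFactors = FALSE)
    out[order(-out$fol_count), ]
  })
  per_user <- Filter(Negate(is.null), per_user)
  if (length(per_user) == 0) {
    # nobody had followers: return an empty frame with the same columns
    return(data.frame(id = character(), name = character(),
                      s_name = character(), fol_count = numeric(),
                      stringsAsFactors = FALSE))
  }
  result <- do.call(rbind, per_user)
  rownames(result) <- NULL
  result
}

res <- collect_followers(c("alice", "bob"))
```

Because the function returns its result, it can be called repeatedly on different user vectors without leftover state from an earlier run.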

I tested with a vector containing both Twitter users who have followers and users who have none, and it worked.
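
To run the loop over the accounts that actually tweeted, the `users` vector can be built from the `screenName` column of the search results in the question. The sketch below shows only that step and a simple reach total. The rows and the follower counts are made up so that it runs without OAuth, and `tweets`, `fol_count` and `reach` are names chosen for illustration.

```r
# Made-up stand-in for the data frame built from searchTwitter() in the question
tweets <- data.frame(
  screenName = c("anna", "bent", "anna", "carl"),
  text = c("tinderbox!", "tinderbox i weekenden", "mere tinderbox", "tinderbox"),
  stringsAsFactors = FALSE
)

# One entry per account, however many times it tweeted
users <- unique(tweets$screenName)

# Made-up follower counts per account; with live data these would come
# from the followersCount field of each user object returned by getUser()
fol_count <- c(anna = 120, bent = 0, carl = 45)

# Total followers across the unique accounts
reach <- sum(fol_count[users])
```

A total like this is an upper bound on reach rather than an exact figure, since the same person can follow several of the accounts and not every follower sees every tweet.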

Tim Biegeleisen
  • Thanks for your reply. I am sure you are right about the approach, but I can't figure out the following error. If I run `twitter.users <- c("krestenb")` (the random guy krestenb is chosen because he doesn't have too many followers) followed by `hest <- sapply(twitter.users, function(x) { user <- getUser(x) })`, I get the expected list. But if I add `followers <- user$getFollowers()` inside the function, I get: `Error in can_access_other_account(user_id) :` – Sander Ehmsen Jun 23 '15 at 07:31
  • Plus the following errors: `Error in can_access_other_account(user_id) : attempt to apply non-function`. In addition: `Warning message: In twInterfaceObj$doAPICall("account/verify_credentials", ...) : Rate limit encountered & retry limit reached - returning partial results` – Sander Ehmsen Jun 23 '15 at 07:35
  • Without configuring my machine with OAuth this is the most help I can give unfortunately. – Tim Biegeleisen Jun 23 '15 at 07:35
  • Thanks for your effort. I will try to make it work. If you want, I can send the OAuth credentials to your e-mail. It is a test Twitter account. Anyway, I very much appreciate the time and effort you have already spent :-). – Sander Ehmsen Jun 23 '15 at 07:44
  • @SanderEhmsen I updated my answer. The code runs fine on my local R setup. Please mark the answer correct if it helped you. Also let me know if you have any other issues. – Tim Biegeleisen Jun 23 '15 at 09:47
  • Thanks for the code, Tim. It works, but only when everybody has followers. If there are users without followers, it returns an error. I reckon that an if statement somewhere should do the trick, so that if the list is empty, it adds the number 0 to the data frame. Do you have a quick way of writing this? My own, more tedious way would be to extract the vectors separately and then add them to the data frame using SQL. – Sander Ehmsen Jun 23 '15 at 13:32
  • @SanderEhmsen I updated my code again and it is now stable for inputs containing Twitter users with no followers. – Tim Biegeleisen Jun 23 '15 at 15:53