
I am still learning R and am trying to build a database of tweets from political representatives using rtweet. I have a dataframe with the Twitter handles of hundreds of such representatives, the region each one represents, and their political affiliation.

tweets <- data.frame(Person = c("A", "B", "C"),
                     Handle = c("@RepA", "@RepB", "@RepC"),
                     Party  = c("AA", "CO", "BJ"),
                     Region = c("P", "D", "R"))

I want to create a separate dataframe for each person's Twitter timeline that also includes their party and region. So far, I have done it manually using the following code:

repA        <- get_timelines('RepA', n=3200, lang="en")
repA$party  <- "AA"
repA$region <- "P"
repB        <- get_timelines('RepB', n=3200, lang="en")
repB$party  <- "CO"
repB$region <- "D"
repC        <- get_timelines('RepC', n=3200, lang="en")
repC$party  <- "BJ"
repC$region <- "R"

But I have at least 500 entries in the original dataframe and would like to automate this process. There must be a cleaner way to do this?

SK5123
  • Does this answer your question? [How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – Limey Jul 04 '22 at 05:21
  • Hi @Limey, I am not sure if it does. I want to create multiple dataframes based on the features of the dataframe `tweets` instead of joining two existing ones. – SK5123 Jul 04 '22 at 06:09
  • This calls for get/assign functions, but those are best avoided. Why do you need a separate object for each user? Couldn't you call 'get_timelines' on all users as a vector and then merge the results with 'tweets' to add party and region? – Ivana Jul 04 '22 at 06:30
  • `get_timelines('RepA', n=3200, lang="en") %>% left_join(tweets, by=c("xxx" = "Handle"))` will do the job for "repA", where `xxx` is the name of the column containing the Twitter handle in the data frame returned by `get_timelines()`. – Limey Jul 04 '22 at 06:39
  • To automate the entire process, put your unique Twitter handles in a list. Then `lapply(myTwitterHandles, function(h) get_timelines(h, n=3200, lang="en") %>% left_join(tweets, by=c("xxx" = "Handle")))` will do the job for all representatives. The code is incomplete and untested because your code is not reproducible. @Ivana's comment is very relevant: your workflow is more difficult than it needs to be because your data structures are not optimal. – Limey Jul 04 '22 at 06:41
  • Thank you both for your suggestions. My original idea to have different dataframes from the start was to achieve two purposes: a) using 'get_timelines' on the entire vector would take a really long time; and b) to analyze tweets at individual levels. – SK5123 Jul 04 '22 at 07:54
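
The merge idea from the comments can be sketched offline as follows. This is only an illustration under assumptions: `all_tl` is a made-up stand-in for the combined output of `get_timelines()`, and the column name `screen_name` is assumed rather than confirmed, so check the actual column names in your rtweet version.

```r
# Lookup table from the question.
tweets <- data.frame(Person = c("A", "B", "C"),
                     Handle = c("@RepA", "@RepB", "@RepC"),
                     Party  = c("AA", "CO", "BJ"),
                     Region = c("P", "D", "R"))

# Stand-in for the combined timelines; the column name `screen_name`
# and the contents are assumptions made for illustration only.
all_tl <- data.frame(screen_name = c("RepA", "RepA", "RepB", "RepC"),
                     text = c("tweet 1", "tweet 2", "tweet 3", "tweet 4"))

# Strip the leading "@" so the join keys match.
tweets$screen_name <- sub("^@", "", tweets$Handle)

# Left join: keep every tweet and attach the author's party and region.
merged <- merge(all_tl,
                tweets[, c("screen_name", "Party", "Region")],
                by = "screen_name", all.x = TRUE)
```

Note that the handles in `tweets` carry an "@" prefix, so they have to be stripped (or the prefix added on the other side) before any join can match.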

1 Answer

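library(rtweet)

# Drop the leading "@" from each handle
users <- sub("^@", "", tweets$Handle)

timelines <- lapply(users, function(u) {
  tl <- get_timelines(u, n = 3200, lang = "en")
  tl$user <- u
  # Look up this user's party and region in the original data frame
  row <- tweets$Handle == paste0("@", u)
  tl$party <- tweets$Party[row]
  tl$region <- tweets$Region[row]
  tl  # return the data frame, not the value of the last assignment
})

# Stack the per-user data frames into one table
data.table::rbindlist(timelines)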
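
If per-person data frames are still wanted for individual analysis, one option is to name the list elements instead of creating hundreds of separate objects. The sketch below runs offline: `fetch_timeline()` is a hypothetical stand-in for `get_timelines()` and returns made-up rows.

```r
tweets <- data.frame(Person = c("A", "B", "C"),
                     Handle = c("@RepA", "@RepB", "@RepC"),
                     Party  = c("AA", "CO", "BJ"),
                     Region = c("P", "D", "R"))

# Hypothetical stand-in for rtweet::get_timelines(), so the sketch
# runs without Twitter API access.
fetch_timeline <- function(user) {
  data.frame(text = paste(user, "tweet", 1:2))
}

users <- sub("^@", "", tweets$Handle)

# Iterate over row indices so party and region come from the same row.
timelines <- lapply(seq_along(users), function(i) {
  tl <- fetch_timeline(users[i])
  tl$user   <- users[i]
  tl$party  <- tweets$Party[i]
  tl$region <- tweets$Region[i]
  tl
})
names(timelines) <- users

# One representative's tweets, by name:
rep_b <- timelines$RepB

# Or everything stacked into a single data frame:
combined <- do.call(rbind, timelines)
```

With a named list, `timelines$RepB` or `timelines[["RepB"]]` gives the same per-person access as a separate `repB` object, and the whole collection can still be looped over or combined in one call.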
Ivana