
I have a list of 200 Twitter usernames (username_list), and I want a data frame counting how many times each screen_name has retweeted each of them. The original data (heavily simplified) looks like this:

   screen_name   retweet_screen_name
 1 screen_name1  retweet_screen_name1
 2 screen_name1  retweet_screen_name2
 3 screen_name2  retweet_screen_name1
 4 screen_name1  retweet_screen_name1
 5 screen_name3  retweet_screen_name2

The final data frame should look something like the one below. For example, the 2 in the first row means that screen_name1 retweeted retweet_screen_name1 two times.

               retweet_screen_name1     retweet_screen_name2 .......... retweet_screen_name200
screen_name1           2                         1                                etc
screen_name2           1                         0                                etc
screen_name3           0                         1                                etc
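One way to get exactly this shape, sketched in base R under the assumption that all_text is an ordinary data frame with character columns: table() cross-tabulates two vectors, and turning retweet_screen_name into a factor whose levels are username_list gives every target account its own column (zero counts included) while dropping retweets of anyone else. The three-name username_list and the five-row all_text below are hypothetical stand-ins for the real data.

```r
# Hypothetical stand-in for all_text, matching the sample in the question
all_text <- data.frame(
  screen_name = c("screen_name1", "screen_name1", "screen_name2",
                  "screen_name1", "screen_name3"),
  retweet_screen_name = c("retweet_screen_name1", "retweet_screen_name2",
                          "retweet_screen_name1", "retweet_screen_name1",
                          "retweet_screen_name2"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the 200 target accounts
username_list <- c("retweet_screen_name1", "retweet_screen_name2",
                   "retweet_screen_name3")

# Fixing the factor levels keeps a column for every target account, even
# one nobody retweeted; retweets of other accounts become NA, which
# table() leaves out by default
counts <- table(
  all_text$screen_name,
  factor(all_text$retweet_screen_name, levels = username_list)
)

# Convert the contingency table into an ordinary wide data frame
counts_df <- as.data.frame.matrix(counts)
print(counts_df)
```

Any date filtering would have to be applied to all_text before the call to table().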

Here is what I have so far:

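library(dplyr)

## maybe add in a loop .... for (username in username_list) ...
# For every screen_name, count how many of its tweets (2018-2020)
# are retweets of `username`
retweet.counts <- function(username) {
  countCol <- all_text %>%
    select(screen_name, created_at, retweet_screen_name) %>%
    # first four characters of created_at are the year; compare as a number
    mutate(year = as.integer(substr(created_at, 1, 4))) %>%
    filter(year > 2017 & year < 2021) %>%
    group_by(screen_name) %>%
    # `!!username :=` names the column after the value of `username`;
    # a plain `username =` would always create a column called "username"
    summarise(!!username := sum(retweet_screen_name == username, na.rm = TRUE))
  return(countCol)
}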
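If the per-username function is kept, the loop hinted at in the comment could be sketched as follows. It assumes all_text has screen_name, created_at (with the year in its first four characters) and retweet_screen_name columns; the small data set and the retweet_counts() name are hypothetical. The function takes the data as an argument and uses `!!username :=` so that each result column is named after the account instead of literally "username"; the per-account results are then joined on screen_name.

```r
library(dplyr)

# Hypothetical stand-in for all_text, including created_at
all_text <- data.frame(
  screen_name = c("screen_name1", "screen_name1", "screen_name2",
                  "screen_name1", "screen_name3"),
  created_at = c("2018-03-01", "2019-07-12", "2020-01-05",
                 "2020-11-30", "2019-05-20"),
  retweet_screen_name = c("retweet_screen_name1", "retweet_screen_name2",
                          "retweet_screen_name1", "retweet_screen_name1",
                          "retweet_screen_name2"),
  stringsAsFactors = FALSE
)
# Hypothetical stand-in for the 200 target accounts
username_list <- c("retweet_screen_name1", "retweet_screen_name2")

# Hypothetical helper: one count column per call, named after `username`
retweet_counts <- function(data, username) {
  data %>%
    mutate(year = as.integer(substr(created_at, 1, 4))) %>%
    filter(year > 2017, year < 2021) %>%
    group_by(screen_name) %>%
    summarise(!!username := sum(retweet_screen_name == username, na.rm = TRUE))
}

# One two-column data frame (screen_name + count) per target account ...
per_user <- lapply(username_list, function(u) retweet_counts(all_text, u))

# ... joined side by side on screen_name
wide <- Reduce(function(a, b) full_join(a, b, by = "screen_name"), per_user)
print(wide)
```

This repeats the filtering once per account, so with 200 names a single grouped count over all pairs is likely to be faster.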

I also found the following code, which might be helpful:

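library(dplyr)
library(purrr)

# Tabulate every column of all_text separately, then stack the
# one-way frequency tables into a single long tibble
all_text %>%
  map(~table(.x)) %>%
  lapply(as_tibble) %>%
  bind_rows(.id = "var")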
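The snippet above tabulates each column on its own, so it yields one-way frequency tables rather than the screen_name by retweet_screen_name cross-tab. A loop-free alternative, sketched under the assumption that recent versions of dplyr and tidyr are installed, counts every (retweeter, retweeted) pair once and then pivots wide. The data and username_list are again hypothetical stand-ins.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for all_text, shaped like the sample in the question
all_text <- data.frame(
  screen_name = c("screen_name1", "screen_name1", "screen_name2",
                  "screen_name1", "screen_name3"),
  created_at = c("2018-03-01", "2019-07-12", "2020-01-05",
                 "2020-11-30", "2019-05-20"),
  retweet_screen_name = c("retweet_screen_name1", "retweet_screen_name2",
                          "retweet_screen_name1", "retweet_screen_name1",
                          "retweet_screen_name2"),
  stringsAsFactors = FALSE
)
# Hypothetical stand-in for the 200 target accounts
username_list <- c("retweet_screen_name1", "retweet_screen_name2",
                   "retweet_screen_name3")

wide <- all_text %>%
  mutate(year = as.integer(substr(created_at, 1, 4))) %>%
  filter(year > 2017, year < 2021, retweet_screen_name %in% username_list) %>%
  # one row per (retweeter, retweeted account) pair, with its count in n
  count(screen_name, retweet_screen_name) %>%
  # spread the retweeted accounts into columns; absent pairs become 0
  pivot_wider(names_from = retweet_screen_name, values_from = n,
              values_fill = 0L)

# Accounts nobody retweeted get no column from pivot_wider, so add them
for (u in setdiff(username_list, names(wide))) wide[[u]] <- 0L

# Put the columns in the same order as username_list
wide <- select(wide, screen_name, all_of(username_list))
print(wide)
```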

Any help would be appreciated!

mk2080
