I want to do a network analysis of the tweets of some users I am interested in and of the users they mention in their tweets.

I retrieved the tweets (no retweets) from several user timelines using the rtweet package in R and want to see who they mention in their tweets.

There is even a variable with the screen names of the mentioned users, which will serve as the target column of my edge list. When a tweet mentions several users, the observation looks like this: c('luigidimaio', 'giuseppeconteit'). When only one user is mentioned, the observation is just that user's name (e.g. agorarai). I want to split the observations containing several mentioned users into one observation per user, so that an observation holding two mentioned users as a vector becomes two observations, each with one of them.

The code looks like this so far:

# get user timelines of the most active italian parties (excluding retweets)
tmls_nort <- get_timelines(c("Mov5Stelle", "pdnetwork", "LegaSalvini"), 
                      n = 3200, include_rts = FALSE
                      )

# create an edge list
tmls_el = as.data.frame(cbind(Source = tolower(tmls_nort$screen_name), Target = tolower(tmls_nort$mentions_screen_name)))

Here is an extract of my dataframe:

  Source      Target                                     n
  <fct>       <fct>                                  <int>
1 legasalvini circomassimo                               2
2 legasalvini 1giornodapecora                            2
3 legasalvini 24mattino                                  2
4 legasalvini agorarai                                  28
5 legasalvini ariachetira                                2
6 legasalvini "c(\"raiportaaporta\", \"brunovespa\")"    7
Laura
  • Hi Laura, welcome to SO! Could you share the starting data that you have using `dput(head(your_data_here, 20))` (that's going to print the first 20 rows of your data with the structure) in R and posting the output editing your question? (Italian politicians, seems interesting your job:) ). – s__ Nov 22 '19 at 10:44
  • edit the question to add them. I suppose it should be `dput(head(tmls_nort, 20))` (look if there are all the cases you need to see in the first 20 rows). – s__ Nov 22 '19 at 13:03
  • Thanks for getting back to me @s_t! dput was giving me a weird output, so I used head instead, which gave me the following (see the next comment because of the character limit). I deleted some rows in between (also because of the character limit) so that you can see an example of a c('userx', 'usery') observation. – Laura Nov 22 '19 at 13:06
  • Source Target n 1 legasalvini _circomassimo_ 2 2 legasalvini 1giornodapecora 2 3 legasalvini 24mattino 2 4 legasalvini agorarai 28 5 legasalvini ariachetira 26 ... 8 legasalvini "c(\"raiportaaporta\", \"brunovespa\")" 7 – Laura Nov 22 '19 at 13:07
  • Since you're using rtweet, you can use the functions it includes to do this kind of stuff for you: `rtweet::network_data()` and `rtweet::network_graph()` – knapply Nov 24 '19 at 19:46

1 Answer

We can start from this: first you could clean up your columns, tidy up the data and plot your network. The data I used are:

tmls_el 
            Source                                                                    Target  n
1      legasalvini                                                              circomassimo  2
2      legasalvini                                                           1giornodapecora  2
3      legasalvini                                                                 24mattino  2
4      legasalvini                                                                  agorarai 28
5      legasalvini                                                               ariachetira 26
6      legasalvini                                         c("raiportaaporta", "brunovespa")  7
7 movimento5stelle c("test1", "test2", "test3", "test4", "test5", "test6", "test7", "test8") 20

Here is what I've done:

# replace the unneeded characters (the c( prefix, the closing parenthesis and the quotes) with nothing
tmls_el$Target <- gsub("c\\(\"", "", tmls_el$Target)
tmls_el$Target <- gsub("\\)", "", tmls_el$Target)
tmls_el$Target <- gsub("\"", "", tmls_el$Target)

library(stringr)
temp <- data.frame(str_split_fixed(tmls_el$Target, ", ", 8))
# stack the 8 split columns on top of each other, repeating Source and n for each
tmls_el_2 <- data.frame(   
  Source = c(rep(as.character(tmls_el$Source),8))
  , Target = c(as.character(temp$X1),as.character(temp$X2),as.character(temp$X3),
               as.character(temp$X4),as.character(temp$X5),as.character(temp$X6),
               as.character(temp$X7),as.character(temp$X8))
  # keep n numeric so it can be used as an edge width when plotting
  , n =  c(rep(tmls_el$n,8)))

Note: this works with the example you gave. If a row can have more than 8 targets, you have to change the 8 in str_split_fixed() to the maximum number k, append the new columns (X9, ..., Xk) to Target, and repeat Source and n k times. Surely there is a more elegant way, but this works.
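One possible way to avoid hard-coding the number of columns is sketched below in base R. It is only an illustration, not a drop-in replacement: the data frame edges, the result edges_long and the shortened test1 to test3 targets are made up for the example, and the single gsub() pattern assumes the targets look exactly like the strings shown in this answer.

```r
# Toy edge list in the same shape as tmls_el (names and values are illustrative)
edges <- data.frame(
  Source = c("legasalvini", "legasalvini", "movimento5stelle"),
  Target = c("agorarai",
             "c(\"raiportaaporta\", \"brunovespa\")",
             "c(\"test1\", \"test2\", \"test3\")"),
  n = c(28, 7, 20),
  stringsAsFactors = FALSE
)

# Strip the leading c(, the trailing ) and every double quote in one pass
clean <- gsub('^c\\(|\\)$|"', "", edges$Target)

# Split each cleaned string into its individual screen names
targets <- strsplit(clean, ", ", fixed = TRUE)

# Number of targets per original row
k <- lengths(targets)

# Repeat Source and n once per target and flatten the targets
edges_long <- data.frame(
  Source = rep(edges$Source, times = k),
  Target = unlist(targets),
  n = rep(edges$n, times = k),
  stringsAsFactors = FALSE
)

print(edges_long)
```

Because each row is repeated exactly as many times as it has targets, no empty Target values are created and the later filter on Target != '' would not be needed for this variant.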

Here you can create edges and nodes:

library(dplyr)
el <- tmls_el_2 %>% filter(Target !='')
no <- data.frame(name = unique(c(as.character(el$Source),as.character(el$Target))))

Now you can use igraph to plot the results:

library(igraph)
g <- graph_from_data_frame(el, directed=TRUE, vertices=no)
plot(g, edge.width = el$n/2)
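As a small sanity check, the graph can also be inspected before plotting. The sketch below uses a made-up three-row edge list (el_demo and g_demo are illustrative names, not objects from the code above). It relies on graph_from_data_frame() storing the columns after the first two as edge attributes, so the weights can be read back from the graph itself with E().

```r
library(igraph)

# Small stand-in for a cleaned edge list with one target per row (illustrative values)
el_demo <- data.frame(
  Source = c("legasalvini", "legasalvini", "legasalvini"),
  Target = c("agorarai", "raiportaaporta", "brunovespa"),
  n = c(28, 7, 7),
  stringsAsFactors = FALSE
)

# The first two columns define the edges; n becomes an edge attribute
g_demo <- graph_from_data_frame(el_demo, directed = TRUE)

n_nodes <- vcount(g_demo)               # number of accounts in the network
n_edges <- ecount(g_demo)               # number of Source -> Target edges
in_deg  <- degree(g_demo, mode = "in")  # how many sources mention each account

# Use the edge attribute stored in the graph for the line widths
plot(g_demo, edge.width = E(g_demo)$n / 2)
```

Reading the widths from E(g_demo)$n keeps them aligned with the edges in the graph even if the data frame is reordered or filtered later.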


With data:

tmls_el <- data.frame(Source = c("legasalvini","legasalvini","legasalvini","legasalvini","legasalvini","legasalvini","movimento5stelle"),
                      Target = c("circomassimo","1giornodapecora","24mattino","agorarai","ariachetira","c(\"raiportaaporta\", \"brunovespa\")","c(\"test1\", \"test2\", \"test3\", \"test4\", \"test5\", \"test6\", \"test7\", \"test8\")"),
                      n = c(2,2,2,28,26,7,20))
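If the original rtweet result is still available, the string clean-up might be avoidable altogether. The sketch below assumes that mentions_screen_name is a list column holding one character vector per tweet, with NA where nobody is mentioned. That is an assumption about the rtweet output, so check it with str() on your own data. The data frame tweets and the results el_raw and el_counts are hypothetical stand-ins.

```r
# Hypothetical stand-in for two columns of the timeline data
tweets <- data.frame(
  screen_name = c("LegaSalvini", "LegaSalvini", "Mov5Stelle"),
  stringsAsFactors = FALSE
)
# Assumed shape: a list column, one character vector of mentions per tweet
tweets$mentions_screen_name <- list(
  "agorarai",
  c("RaiPortaaPorta", "BrunoVespa"),
  NA_character_
)

# Number of mentions per tweet (an NA entry still has length 1)
k <- lengths(tweets$mentions_screen_name)

# One row per (tweet author, mentioned user) pair
el_raw <- data.frame(
  Source = tolower(rep(tweets$screen_name, times = k)),
  Target = tolower(unlist(tweets$mentions_screen_name)),
  stringsAsFactors = FALSE
)

# Drop tweets without any mention
el_raw <- el_raw[!is.na(el_raw$Target), ]

# Count how often each Source mentions each Target
el_raw$n <- 1
el_counts <- aggregate(n ~ Source + Target, data = el_raw, FUN = sum)

print(el_counts)
```

Working from the list column avoids the c("...") strings entirely, because the mentions are never converted to text in the first place.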
s__
  • Ok, now try to do the same with Target and n to see where the error is, one function at a time. If needed, update your question with the output of dput(tmls_el), which is going to give you "structure(list....)". Paste it by **editing the question**, not in a comment, so that your data is published, and I'll try to make it work. – s__ Nov 22 '19 at 15:40
  • You should edit the question not the comments to put the data: the data should be put in the question, not in comments. Click on "edit" under your question then you can paste the structure(list...). To communicate etc -> comments, to give data, code -> question. Do not worry, all of us have been new here at least once! – s__ Nov 22 '19 at 16:12