I'm having trouble creating a directed graph in R (with the igraph package) from my dataset, a data.table with 10 columns. The task is as follows: I need to build a directed (network) graph in which individual X is connected to individual Y if X invited Y to the platform. Ultimately, I need to find the length of the longest invitation chain in the network and calculate the clustering coefficient.
After filtering, my data.table dt.users consists of the following two columns:
user_id: user identification
inviter_id: id of the user that invited this user to the platform
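With only these two columns, the "longest chain" can be read as the largest number of invitation hops from any user back to someone who was never invited. The sketch below follows the inviter_id pointers in base R. It assumes each user_id appears at most once (one inviter per user); the toy table `invites` and the helper `chain_depth()` are hypothetical and not part of the original data.

```r
# Hypothetical toy invitation table (not the real data):
# 1 invited 2 and 3, 2 invited 4, 4 invited 5
invites <- data.frame(
  inviter_id = c(1L, 1L, 2L, 4L),
  user_id    = c(2L, 3L, 4L, 5L)
)

# Named lookup: parent["4"] is the inviter of user 4.
# Assumes each user_id appears at most once (one inviter per user).
parent <- setNames(invites$inviter_id, invites$user_id)

# Hypothetical helper: count invitation hops from `id` back to a root,
# i.e. a user who never appears in user_id
chain_depth <- function(id) {
  depth <- 0L
  seen  <- character(0)
  key   <- as.character(id)
  while (!is.na(parent[key])) {
    if (key %in% seen) stop("invitation cycle detected at user ", key)
    seen  <- c(seen, key)
    key   <- as.character(parent[key])
    depth <- depth + 1L
  }
  depth
}

depths <- vapply(invites$user_id, chain_depth, integer(1))
max(depths)  # 3 hops for this toy table: 1 -> 2 -> 4 -> 5
```

The chain length here is counted in edges; the number of people in the chain is one more.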
After cleaning the data (removing all NA values), I tried the following, but I'm not sure I'm doing it the right way, because my clustering coefficient comes out as 0 (which seems very unlikely):
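library(data.table)
library(igraph)
all.users <- dt.users[, list(inviter_id, user_id)]  # column 1 = edge source (inviter), column 2 = target (invitee)
g.invites.network <- graph_from_data_frame(all.users, directed = TRUE)  # current name of graph.data.frame()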
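For reference, `graph_from_data_frame()` (the current name of `graph.data.frame()`) uses the first column as the edge source and the second as the target, so `list(inviter_id, user_id)` gives edges pointing from inviter to invitee. A quick check on a hypothetical four-edge toy table (not the real data):

```r
library(igraph)

# Hypothetical toy edge list: inviter first, invitee second
invites <- data.frame(
  inviter_id = c(1L, 1L, 2L, 4L),
  user_id    = c(2L, 3L, 4L, 5L)
)
g <- graph_from_data_frame(invites, directed = TRUE)

# Each row of the edge list is (from, to), i.e. (inviter, invitee)
el <- as_edgelist(g)
print(el)

# Every invited user has exactly one incoming edge; the root (1) has none
in_deg <- degree(g, mode = "in")
print(in_deg)
```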
I've tried switching the direction of the connections, but I still get the same results in terms of diameter and clustering coefficient:
all.users <- dt.users[, list(user_id, inviter_id)]
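Getting identical numbers after swapping the columns is expected for these two metrics: swapping the columns reverses every edge, `transitivity()` ignores edge direction, and every shortest directed path in the reversed graph is just an original one traversed backwards. A small sketch on a hypothetical toy table:

```r
library(igraph)

# Hypothetical toy invitation table
invites <- data.frame(
  inviter_id = c(1L, 1L, 2L, 4L),
  user_id    = c(2L, 3L, 4L, 5L)
)

g_fwd <- graph_from_data_frame(invites[, c("inviter_id", "user_id")], directed = TRUE)
g_rev <- graph_from_data_frame(invites[, c("user_id", "inviter_id")], directed = TRUE)

# Same edges, opposite direction
same_edges_reversed <- all(as_edgelist(g_rev) == as_edgelist(g_fwd)[, 2:1])

# transitivity() treats the graph as undirected; the directed diameter
# is unchanged because reversing a path keeps its length
t_fwd <- transitivity(g_fwd, type = "global")
t_rev <- transitivity(g_rev, type = "global")
d_fwd <- diameter(g_fwd, directed = TRUE)
d_rev <- diameter(g_rev, directed = TRUE)
c(t_fwd, t_rev, d_fwd, d_rev)
```

Column order therefore only matters for direction-sensitive questions (who invited whom, `mode = "in"` vs `mode = "out"`), not for these two numbers.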
My question is: is my directed graph wrong, and if so, what am I doing wrong? I suspect it is wrong because the clustering coefficient is 0; it seems very unlikely to me that no clusters form at all in this network. Also, should I keep dt.users[, list(inviter_id, user_id)] instead of dt.users[, list(user_id, inviter_id)]?
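One property that may explain the 0: the global clustering coefficient (transitivity) is the share of connected triples that close into triangles. If every user has exactly one inviter and nobody is invited in a loop, the undirected version of the invitation graph is a forest, and a forest contains no triangles, so a coefficient of 0 would be the correct result. The sketch below checks this on a hypothetical toy table, then adds one hypothetical non-invitation edge (2 -> 3) to show that a single triangle makes the coefficient positive.

```r
library(igraph)

# Hypothetical toy invitation table: one inviter per user
invites <- data.frame(
  inviter_id = c(1L, 1L, 2L, 4L),
  user_id    = c(2L, 3L, 4L, 5L)
)
g <- graph_from_data_frame(invites, directed = TRUE)

# Each user appears once in user_id, i.e. has at most one inviter
one_inviter_each <- anyDuplicated(invites$user_id) == 0
# An undirected forest has exactly (vertices - components) edges
is_forest_like <- ecount(g) == vcount(g) - components(g)$no

# count_triangles() counts per vertex, so each triangle is counted 3 times
n_triangles <- sum(count_triangles(g)) / 3
t_tree <- transitivity(g, type = "global")

# Hypothetical extra edge 2 -> 3 (e.g. a follow, not an invitation) closes triangle 1-2-3
extra <- rbind(invites, data.frame(inviter_id = 2L, user_id = 3L))
g_extra <- graph_from_data_frame(extra, directed = TRUE)
t_extra <- transitivity(g_extra, type = "global")

c(triangles = n_triangles, tree = t_tree, with_extra_edge = t_extra)
```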
Sample data (40 rows):
dt.users <- data.table::data.table(
inviter_id = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 23L, 22L, 31L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 63L, 4L, 4L, 4L),
user_id = c(17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 32L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 58L, 59L, 60L, 64L, 71L, 75L, 76L, 78L)
)
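The same checks can be run on this sample. The sketch below rebuilds it as a plain data.frame with the same values (so only igraph is needed) and assumes the longest chain is measured in edges along the invitation direction. When the graph is a forest, each reachable pair of users is joined by a single directed path, so the directed diameter equals the longest invitation chain.

```r
library(igraph)

# Same values as dt.users above, as a base data.frame
users <- data.frame(
  inviter_id = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 23L, 22L, 31L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 63L, 4L, 4L, 4L),
  user_id = c(17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 32L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 58L, 59L, 60L, 64L, 71L, 75L, 76L, 78L)
)

# Edges point from inviter to invitee
g <- graph_from_data_frame(users[, c("inviter_id", "user_id")], directed = TRUE)

# One inviter per user and edges = vertices - components: a forest
forest_ok <- anyDuplicated(users$user_id) == 0 &&
  ecount(g) == vcount(g) - components(g)$no

# Clustering coefficient (transitivity() ignores edge direction)
cc <- transitivity(g, type = "global")

# Longest invitation chain, in edges, following the invitation direction
longest <- diameter(g, directed = TRUE)
chain <- get_diameter(g, directed = TRUE)  # one longest chain as a vertex sequence

print(c(forest = forest_ok, clustering = cc, longest_chain = longest))
print(chain)
```

In this sample the longest chains have two edges (for example 4 -> 22 -> 26), and no three users are mutually connected, so a coefficient of 0 appears consistent with the data rather than a sign of a wrong edge direction.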
Any help would be greatly appreciated!