I'm doing a simple task: iterating over all vertices and computing a new attribute for each one from the attributes of its neighbors. I searched SO, and so far I know of at least two ways to do it:
- use as_adj_list to create an adjacency list and then iterate over each element;
- use sapply to iterate over each vertex directly.
However, both methods take too long at the scale of my data (300k vertices and 8 million edges). Is there a faster way to loop over vertices? Thanks!
As a benchmark, say I have the following sample data:
library(igraph)
set.seed(42) # call set.seed() rather than assigning to it, so the sample is reproducible
g <- sample_gnp(10000, 0.1)
V(g)$name <- seq_len(gorder(g)) # add a name attribute for data.table merge
V(g)$attr <- rnorm(gorder(g))
V(g)$mean <- 0 # "mean" is the attribute I want to compute
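To make the target concrete, here is a small hand-checkable sketch of what "mean" should contain. The graph tiny and its attribute values are made up for illustration and are not part of my real data:

```r
library(igraph)

# Hypothetical toy graph: an undirected path 1 - 2 - 3
tiny <- make_graph(c(1, 2, 2, 3), directed = FALSE)
V(tiny)$attr <- c(10, 20, 60)

# For each vertex, average the attr values of its neighbors
tiny_mean <- sapply(V(tiny), function(v) mean(neighbors(tiny, v)$attr))

# Vertex 1 has neighbor {2}       -> 20
# Vertex 2 has neighbors {1, 3}   -> (10 + 60) / 2 = 35
# Vertex 3 has neighbor {2}       -> 20
print(tiny_mean)
```

Any faster method should reproduce these values on the toy graph before being trusted on the large one.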
The code for method 1:
al <- as_adj_list(g)
attr <- V(g)$attr
V(g)$mean <- sapply(al, function(x) mean(attr[x]))
# took 28s
# most of the time is spent on creating the adj list
The code for method 2:
compute_mean <- function(v){
  mean(neighbors(g, v)$attr)
}
V(g)$mean <- sapply(V(g), compute_mean) # took 33s
I BELIEVE that igraph-R SHOULD NOT be so slow at iterating over vertices. Otherwise, analyzing large graphs with millions of vertices would be impossible, and I think that is a common task for R users!
Update
Following @MichaelChirico's comment, I came up with a third method: import the graph structure into a data.table and do the calculation with data.table's by syntax, as follows:
library(data.table)
library(magrittr) # provides the %>% pipe
gdt.v <- as_data_frame(g, what = "vertices") %>% setDT() # output the vertices
gdt.e <- as_data_frame(g, what = "edges") %>% setDT() # output the edges
gdt <- gdt.e[gdt.v, on = c(to = "name"), nomatch = 0] # merge vertices and edges data.table
mean <- gdt[, .(mean = mean(attr)), keyby = from][, mean]
V(g)$mean <- mean
# took only 0.74s !!
The data.table way is MUCH faster. However, its result is NOT exactly identical to that of the first two methods. Besides, I'm very disappointed to see that I have to rely on another package to do such a simple task, which I thought should be the strength of igraph-R. Hope I'm wrong!
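A likely cause of the mismatch, assuming the graph is undirected as sample_gnp produces by default: the edge table lists each undirected edge only once, so grouping by from averages over only part of each vertex's neighborhood, and vertices that never appear in the from column are dropped. The sketch below is one way to check this, not a definitive fix. It uses a smaller graph (200 vertices, my own choice so the loop finishes quickly), and the names el, both, res and ref are introduced here for illustration:

```r
library(igraph)
library(data.table)

set.seed(42)
g <- sample_gnp(200, 0.1)        # smaller than the benchmark graph, for a quick check
V(g)$attr <- rnorm(gorder(g))
attr_vec <- V(g)$attr

# Reference result: the per-vertex loop from method 2
ref <- unname(sapply(V(g), function(v) mean(neighbors(g, v)$attr)))

# Edge list as numeric vertex ids; each undirected edge appears once
el <- as_edgelist(g, names = FALSE)

# Add the reversed copy of every edge so each vertex sees all of its neighbors
both <- data.table(from = c(el[, 1], el[, 2]),
                   to   = c(el[, 2], el[, 1]))
both[, nbr_attr := attr_vec[to]]

# Grouped mean of neighbor attributes, one row per vertex that has neighbors
means <- both[, .(mean = mean(nbr_attr)), keyby = from]

# Write back by vertex id; isolated vertices keep NaN, like mean(numeric(0))
res <- rep(NaN, gorder(g))
res[means$from] <- means$mean

print(all.equal(res, ref))
```

If this comparison agrees on the small graph, the same symmetrized edge table should apply to the full-size one, since the grouping step does not depend on graph size.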