5

I have a large igraph object with almost 1M nodes and 1.5M edges. After researching for a while, I could not find a procedure to sum a node's neighbors' attributes; in this case, the attribute is binary. At the moment, the best solution I have found is the following:

V(g)$sum = sapply( ego(g,1,V(g),mode = 'all',mindist = 1), function(v) sum(V(g)[v]$attr) )

However, after 12 hours it's still crunching.

Any suggestions?

UPDATE 1: Let's consider the following graph

library(igraph)
G <- graph.formula(1-+2,1-+3,2-+4,2-+5,3-+6,5-+7,7-+8,8-+9,9+-7, 9-+10,
               6-+9,1-+5,3-+9,10-+11,11-+12,11-+5,12-+4,4-+10,10-+4,11-+10)
V(G)$attr = c(1,1,0,0,1,0,1,0,1,0,1,0)
plot(G, vertex.label.color = "white",  edge.width=E(G)$weight, layout = layout.circle(G))


and the desired outcome should be this...

 sapply( ego(G,1,V(G),mode = 'all',mindist = 1), function(v) sum(V(G)[v]$attr) )
 [1] 2 2 2 1 4 1 2 2 1 2 1 1

@Tamás, I tried to access the neighbors function without using a loop, but instead of the outcome described above I got this...

sapply(neighbors(G,V(G)),function (v) sum(V(G)[v]$attr))
2 3 5 
1 0 1 
Cristobal

3 Answers

1

I am also working with large networks and I'm having some problems with the time it takes igraph to do "simple" stuff, like calculating betweenness and closeness. In your case, however, I think you can work around this issue outside the network framework.

First, convert your network into a data.frame and use the data.table library, which is very fast when working with large data sets, to calculate the sum of the attributes.

library(igraph)
library(magrittr)
library(data.table)

# simple network
  g<- graph.formula(1-+2,1-+3,2-+4,2-+5,3-+6,5-+7,7-+8,8-+9,9+-7, 9-+10,
                     6-+9,1-+5,3-+9,10-+11,11-+12,11-+5,12-+4,4-+10,10-+4,11-+10)

  V(g)$attr = c(1,1,0,0,1,0,1,0,1,0,1,0)


# convert the network to data.table
  dt <- as_long_data_frame(g) %>% setDT()

# Calculate the sum of neighbors' attributes by origin (from) and add it by
# reference, so the rows stay in edge order. This is really fast in data.table.
# Note: grouping by `from` only counts neighbors reached by outgoing edges.
  dt[, attr_sum := sum(to_attr), by = from]

# get the sum into the network object (one value per edge, in edge order)
  E(g)$attr_sum <- dt$attr_sum
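
The code above sums over outgoing edges and stores the result on the edges. Below is a minimal sketch of the same data.table idea at the vertex level, assuming you want the question's mode = 'all' behaviour, where each neighbor is counted once whichever way the edge points. The names `pairs`, `nb`, and `res` are only illustrative, and this has not been timed on a 1M-node graph.

```r
library(igraph)
library(data.table)

g <- graph_from_literal(1-+2,1-+3,2-+4,2-+5,3-+6,5-+7,7-+8,8-+9,9+-7, 9-+10,
                        6-+9,1-+5,3-+9,10-+11,11-+12,11-+5,12-+4,4-+10,10-+4,11-+10)
V(g)$attr <- c(1,1,0,0,1,0,1,0,1,0,1,0)

# Edge list as numeric vertex ids (one row per directed edge)
el <- as_edgelist(g, names = FALSE)

# List every edge in both directions, then drop duplicate (vertex, neighbor)
# pairs so that mutual edges such as 4 <-> 10 are counted only once
pairs <- unique(data.table(v  = c(el[, 1], el[, 2]),
                           nb = c(el[, 2], el[, 1])))

# Look up each neighbor's attribute and sum it per vertex
pairs[, nb_attr := V(g)$attr[nb]]
sums <- pairs[, .(attr_sum = sum(nb_attr)), by = v]

# Vertices without any neighbor keep a sum of 0
res <- numeric(vcount(g))
res[sums$v] <- sums$attr_sum
V(g)$attr_sum <- res
```

On the small example graph this should reproduce the desired outcome from the question.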
rafa.pereira
0

The bottleneck is almost surely the ego() function. Try using neighbors() instead; it is specialized to get the first-order neighbors only so it is faster - and you don't need to construct V(g) in every iteration either.

Tamás
  • I used this command: sapply(V(G), function(v) sum(neighbors(G, v, mode = 'all')$attr)), but there was no significant improvement – Cristobal Aug 31 '16 at 13:02
  • Try pre-fetching the attribute for all vertices into a variable (e.g., `attr <- V(g)$attr`) and then subsetting that variable (i.e. `attr[neighbors(G, v, mode="all")]`). Not sure if this helps, though. – Tamás Sep 02 '16 at 08:42
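
The pre-fetching suggestion in the comment above could look roughly like the sketch below on the question's example graph. This is only an illustration: `attr` and `res` are placeholder names, and the `unique()` call is an added assumption to guard against a neighbor being listed twice when edges run in both directions. It still calls `neighbors()` once per vertex, so it may not help much on a very large graph.

```r
library(igraph)

G <- graph_from_literal(1-+2,1-+3,2-+4,2-+5,3-+6,5-+7,7-+8,8-+9,9+-7, 9-+10,
                        6-+9,1-+5,3-+9,10-+11,11-+12,11-+5,12-+4,4-+10,10-+4,11-+10)
V(G)$attr <- c(1,1,0,0,1,0,1,0,1,0,1,0)

# Fetch the attribute once instead of querying V(G) inside the loop
attr <- V(G)$attr

# For each vertex id, take its distinct neighbors and sum the plain vector
res <- vapply(seq_len(vcount(G)), function(v) {
  nb <- unique(as.integer(neighbors(G, v, mode = "all")))
  sum(attr[nb])
}, numeric(1))

res
```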
0

As noted by @Tamás, the bottleneck lies in the ego function (neighbors will create a similar bottleneck). For adjacent nodes (i.e., neighbors of order 1), this bottleneck can be avoided by pulling the adjacency matrix using get.adjacency and then multiplying the matrix by the attribute vector using %*%:

library(igraph)    
set.seed(42)
g <- erdos.renyi.game(1000000, 1500000, type = "gnm")
V(g)$att <- as.logical(rbinom(vcount(g), 1, 0.5))

system.time({
   ma  <- get.adjacency(g)
   att <- V(g)$att
   res <- as.numeric(ma %*% att)
})
#  user  system elapsed 
# 0.642   0.138   0.786
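
Note that the benchmark graph above is undirected. For a directed graph such as the one in the question, the adjacency matrix only has entries for outgoing edges, so a sketch that mimics mode = 'all' might symmetrize it first. This assumes the Matrix package (which igraph uses for sparse matrices) and uses `as_adjacency_matrix()`, the newer name for `get.adjacency()`; `A`, `A_all`, and `res` are illustrative names.

```r
library(igraph)
library(Matrix)

G <- graph_from_literal(1-+2,1-+3,2-+4,2-+5,3-+6,5-+7,7-+8,8-+9,9+-7, 9-+10,
                        6-+9,1-+5,3-+9,10-+11,11-+12,11-+5,12-+4,4-+10,10-+4,11-+10)
V(G)$attr <- c(1,1,0,0,1,0,1,0,1,0,1,0)

# Sparse directed adjacency matrix: A[i, j] = 1 if there is an edge i -> j
A <- as_adjacency_matrix(G, sparse = TRUE)

# Add the transpose to include incoming edges; sign() turns the 2s produced
# by mutual edges back into 1s so each neighbor is counted once
A_all <- sign(A + t(A))

# Row i of the product is the sum of the attribute over i's neighbors
res <- as.numeric(A_all %*% V(G)$attr)
res
```

On the small example graph this should match the desired outcome given in the question.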
George Wood