
I am trying to find an efficient way of simulating random meetings in a graph in R using igraph.

I managed to do it with the code below, in which I assume edges appear with some probability (prob.meet) and add them to a pre-existing empty graph of the same size.

However, for large graphs this is not efficient, and I repeat this process over and over.

  • The first layer of inefficiency is the random selection of edges using the rbinom() function (about 25% of the runtime)
  • The second layer of inefficiency is adding edges to a pre-existing empty graph using the add_edges() function (about 75% of the runtime)

Any suggestion on how to improve efficiency?

Here is what I tried:

  1. First, I create a random graph:
library(igraph)

nodes = 5
g1 <- barabasi.game(nodes)
EL1 <- get.edgelist(g1, names=FALSE)
  2. Second, I assume the edges pop up with probability "prob.meet" and add the surviving edges to an empty graph. I do this to force g_meet to have the same number of vertices as g1.
prob.meet = 0.5

EL_meet <- matrix(EL1[(as.logical(rbinom(nrow(EL1), 1, prob.meet))),],
                  nrow=2,byrow = TRUE
                  )

g_meet <- make_empty_graph(n = nodes) %>%
            add_edges(EL_meet)
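A note on the matrix() call above: it relies on R flattening matrices column by column. EL1[...] is flattened as (all tails, then all heads), and nrow = 2, byrow = TRUE puts the tails in row 1 and the heads in row 2, so add_edges() (which again reads column by column) still receives the original (tail, head) pairs. A toy check with a made-up edge list:

```r
EL1 <- cbind(c(1, 2, 3), c(4, 5, 6))         # edges 1->4, 2->5, 3->6
keep <- c(TRUE, FALSE, TRUE)                 # pretend rbinom() kept edges 1 and 3
matrix(EL1[keep, ], nrow = 2, byrow = TRUE)
#      [,1] [,2]
# [1,]    1    3
# [2,]    4    6   # columns are still the pairs 1->4 and 3->6
```

One caveat: if exactly one edge survives, EL1[keep, ] drops to a plain vector; EL1[keep, , drop = FALSE] avoids that edge case.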
  • How large is large? A couple hundred nodes, a couple thousand or perhaps millions? – Tim Jun 02 '21 at 16:04
  • @Tim, for the time being, a couple thousand. Anything that works for something around 100,000 nodes should do it. – Marcos RF Jun 02 '21 at 16:15
  • What happens after you create `g_meet` in your workflow? What is a "random meeting", is it an edge between two nodes? Or is it a path between two nodes? Or something else? – emilliman5 Jun 02 '21 at 17:53
  • @emilliman5, nothing specific. It is all normal. The thing is: I have to repeat this process multiple times and any difference in terms of millisecond counts. A random meeting is an edge between two nodes. – Marcos RF Jun 02 '21 at 18:07
  • If a "random meeting" is an edge between two nodes then why not just compute random meetings mathematically? – emilliman5 Jun 02 '21 at 18:52

2 Answers


From what I can find, there are only marginal speed gains to be made.

It has been reported that using runif() is faster than using rbinom() in your particular use case. Running it on my system seems to confirm this:

prob.meet <- 0.5
system.time({rbinom(1000000, 1, prob.meet)})
#  user  system elapsed 
#  0.10    0.00    0.11
system.time({runif(1000000) < prob.meet})
#  user  system elapsed 
#  0.05    0.00    0.04

However, as you can see, we are talking about just a couple hundredths of a second of improvement when doing a million random draws.
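For a less noisy comparison than a single system.time() call, the microbenchmark package (assumed to be installed) repeats each expression many times; a minimal sketch:

```r
library(microbenchmark)

prob.meet <- 0.5
n <- 1e6
microbenchmark(
  rbinom = rbinom(n, 1, prob.meet),   # draw 0/1 directly
  runif  = runif(n) < prob.meet,      # draw uniforms, then threshold
  times  = 50
)
```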

An alternative to adding the edges to a new empty graph would be deleting the non-meeting edges. That would look something like this:

library(igraph)

delete_non_meeting_edges <- function(g, prob.meet = 0.5) {
  g <- set_edge_attr(g, "meet", E(g), runif(gsize(g)) < prob.meet)
  delete_edges(g, E(g)[!meet])
}

## Usage
g <- barabasi.game(1000000)
delete_non_meeting_edges(g)

However, the above is not really faster. This answer suggests that mutations of igraph objects are inherently slow due to the underlying data structures. The linked answer suggests vectorisation as a way to speed up mutations, but both the example you provided and the code above already use it.

So I am afraid that if you are using igraph, there is not much more speed to be gained.
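If the repeated simulation itself is the bottleneck, one workaround (a sketch of my own, not part of the question's code) is to avoid mutating igraph objects altogether: extract the edge list once, subset rows of the plain matrix on each repetition, and only materialize a graph when one is actually needed:

```r
library(igraph)

nodes <- 100000
g1 <- barabasi.game(nodes)
EL1 <- as_edgelist(g1, names = FALSE)   # extract once, reuse every repetition
prob.meet <- 0.5

sample_meetings <- function() {
  # plain matrix subsetting, no igraph mutation involved
  EL1[runif(nrow(EL1)) < prob.meet, , drop = FALSE]
}

# Materialize an igraph object only when one is really required,
# padding with isolated vertices so it matches g1's size:
g_meet <- graph_from_edgelist(sample_meetings(), directed = TRUE)
g_meet <- add_vertices(g_meet, nodes - vcount(g_meet))
```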

Tim

You can use delete_edges like below

g1 %>%
  delete_edges(which(runif(ecount(.)) > prob.meet))

where

  • runif(ecount(.)) > prob.meet yields a random logical vector marking the edges to remove.
  • ecount returns the number of edges in graph g1.
  • which returns the edge ids that should be removed.
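Since the process is repeated many times, ecount(g1) can be hoisted out of the loop; a hedged sketch of repeated use (the replicate count and seed are arbitrary):

```r
library(igraph)

set.seed(42)
g1 <- barabasi.game(10000)
prob.meet <- 0.5
m <- ecount(g1)   # edge count does not change across repetitions

g_meet_list <- replicate(
  100,
  delete_edges(g1, which(runif(m) > prob.meet)),
  simplify = FALSE
)
```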

Benchmarking

f_OP <- function() {
  EL1 <- get.edgelist(g1, names = FALSE)
  EL_meet <- matrix(EL1[(as.logical(rbinom(nrow(EL1), 1, prob.meet))), ],
    nrow = 2, byrow = TRUE
  )
  make_empty_graph(n = nodes) %>%
    add_edges(EL_meet)
}

f_Tim <- function() {
  delete_non_meeting_edges <- function(g, prob.meet) {
    g <- set_edge_attr(g, "meet", E(g), runif(gsize(g)) < prob.meet)
    delete_edges(g, E(g)[!meet])
  }
  delete_non_meeting_edges(g1, prob.meet)
}

f_TIC <- function() {
  g1 %>%
    delete_edges(which(runif(ecount(.)) > prob.meet))
}


library(microbenchmark)

nodes <- 100000
g1 <- barabasi.game(nodes)
prob.meet <- 0.5
microbenchmark(
  f_OP(),
  f_Tim(),
  f_TIC(),
  unit = "relative"
)

and you will see

Unit: relative
    expr      min       lq     mean   median       uq      max neval
  f_OP() 1.584501 1.583768 1.631061 1.618017 1.675887 1.535542   100
 f_Tim() 1.517888 1.520832 1.597230 1.584570 1.679498 1.585434   100
 f_TIC() 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000   100
ThomasIsCoding