Regarding igraph efficiency, see: https://igraph.discourse.group/t/igraph-is-much-slower-than-networkx-when-generating-a-graph/853.
It is much faster to vectorize the code:
set.seed(2023)
library(igraph)
firstThresh <- 20
secondThresh <- 80
system.time({
n <- 6072; m <- 66923
graph <- sample_gnm(n, m)
E(graph)$weight <- round(runif(m) * 100)
E(graph)$type <- "L"
E(graph)$type[which(E(graph)$weight < secondThresh)] <- "M"
E(graph)$type[which(E(graph)$weight < firstThresh)] <- "C"
})
# Benchmark system.time.
# user system elapsed
# 0.02 0.00 0.02
print.igraph(graph, full="auto.print.lines=0")
# IGRAPH d66a81f U-W- 6072 66923 -- Erdos-Renyi (gnm) graph
# + attr: name (g/c), type (g/c), loops (g/l), m (g/n), weight (e/n), typ (e/c)
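The same three-way classification can also be written as a single call to base R's `cut()`. This is only a sketch of an alternative, not the approach used above: the small `weight` vector is made-up illustration data, and `type_cut` and `type_ref` are hypothetical names. With `right = FALSE` the intervals are closed on the left, which matches the strict `<` comparisons.

```r
firstThresh <- 20
secondThresh <- 80

# Made-up weights that hit both thresholds exactly (illustration only).
weight <- c(0, 19, 20, 79, 80, 100)

# One vectorized call: [-Inf, 20) -> "C", [20, 80) -> "M", [80, Inf) -> "L".
type_cut <- as.character(cut(
  weight,
  breaks = c(-Inf, firstThresh, secondThresh, Inf),
  labels = c("C", "M", "L"),
  right = FALSE
))

# Reference: the sequential overwrite used in the answer, on a plain vector.
type_ref <- rep("L", length(weight))
type_ref[weight < secondThresh] <- "M"
type_ref[weight < firstThresh] <- "C"

identical(type_cut, type_ref)
```

The resulting character vector could then be assigned to the edge attribute in one step, for example `E(graph)$type <- type_cut`.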
Surprise
library(microbenchmark)
microbenchmark(
"one" = (E(graph)[weight < secondThresh]$type <- "M"),
"two" = (E(graph)$type[which(E(graph)$weight < secondThresh)] <- "M")
)
# Unit: microseconds
# expr min lq mean median uq max neval
# one 2648.5 2776.75 3607.604 2851.95 4149.95 19895.9 100
# two 847.3 978.10 1364.860 1019.30 1317.25 3171.7 100
I have no explanation for the performance difference.
However, see the related question: "In base R, why is selecting column, then filtering rows faster than vice versa: filter rows, then select column?"
Update - Why is vectorized code faster?
See also: "Why is vectorization faster".
My casual answer:
Consider x <- y. It is much clearer if the code is the same regardless of whether x consists of one value or many.
In my experience, "vectorized" programming is more difficult to implement but less prone to errors. It is either right or completely wrong.
Avoiding for-loops is also faster. The reason is that the time to execute an assignment statement is mainly determined by overhead, as we can see in the following example.
a <- 0; b <- 0
n <- 1E4; va <- runif(n); vb <- runif(n)
library(microbenchmark)
microbenchmark(
"single" = (a <- b),
"multi" = (va <- vb)
)
## Unit: nanoseconds
## expr min lq mean median uq max neval
## single 0 100 87 100 100 500 100
## multi 0 100 112 100 100 1400 100
object.size(va)
## 80048 bytes
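To make the overhead argument concrete, here is a minimal base R sketch that performs the same comparison once element by element in a for-loop and once as a single vectorized expression. The names `x`, `out_loop`, `out_vec`, `loop_time` and `vec_time` are hypothetical, and the timings depend on the machine, so none are quoted here.

```r
set.seed(2023)
n <- 1e5
x <- runif(n)

# Element-wise: n separate comparisons and n separate assignments,
# each paying the interpreter overhead.
loop_time <- system.time({
  out_loop <- logical(n)
  for (i in seq_len(n)) {
    out_loop[i] <- x[i] < 0.5
  }
})

# Vectorized: one comparison and one assignment for the whole vector.
vec_time <- system.time({
  out_vec <- x < 0.5
})

# Both versions compute the same result.
identical(out_loop, out_vec)

loop_time["elapsed"]
vec_time["elapsed"]
```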
But more importantly, in this example:
Within the loop, the graph is copied twice in each iteration, for a total of twice the number of flights.
# i is the loop index: a single edge, as in one iteration of the loop.
tracemem(graph)
E(graph)[i]$type <- "M"
untracemem(graph)
# tracemem[0x00000241f1305358 -> 0x00000241f1329f78]:
# tracemem[0x00000241f1329f78 -> 0x00000241f1329ec8]: i_set_edge_attr E<-
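As a rough sketch of the two styles side by side, the following uses a small random graph (the sizes 50 and 200 and the names `g`, `g_loop`, `g_vec` and `idx` are arbitrary choices for illustration). It updates the edge attribute once per edge in a loop and once with a single vectorized assignment, then checks that both give the same attribute values. Only the loop version modifies the graph once per selected edge.

```r
library(igraph)
set.seed(2023)

# Small illustrative graph; sizes are arbitrary.
g <- sample_gnm(50, 200)
E(g)$weight <- round(runif(ecount(g)) * 100)
E(g)$type <- "L"
idx <- which(E(g)$weight < 80)

# Loop version: one attribute assignment (and graph modification) per edge.
g_loop <- g
for (i in idx) {
  E(g_loop)[i]$type <- "M"
}

# Vectorized version: one attribute assignment for all selected edges.
g_vec <- g
E(g_vec)$type[idx] <- "M"

# Both graphs end up with the same edge attribute.
identical(E(g_loop)$type, E(g_vec)$type)
```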