
TL;DR: It is ten times faster to generate a list of static networks than it is to merge these static networks into a single dynamic network. Why is this so?

Following this answer, I attempt to generate a random dynamic graph using NetworkX and DyNetx.

The issue arises when dealing with mid-scale networks (approximately 1000 nodes and 1000 timestamps): the process runs out of memory. Even on a smaller scale (about 100 nodes and 300 timestamps) it is extremely slow. I believe I've identified the bottleneck, but I'm not sure how to deal with it.

The following is a simple example of code that generates a random temporal network:

import dynetx as dnx
import networkx as nx
import itertools
from random import random

def dynamic_random_graph(n, steps, up_rate, seed=42):
    # Create list of static graphs
    list_of_snapshots = list()
    for t in range(0, steps):
        G_t = nx.Graph()
        edges = itertools.combinations(range(n), 2)
        G_t.add_nodes_from(range(n))

        for e in edges:
            if random() < up_rate:
                G_t.add_edge(*e)

        list_of_snapshots.append(G_t)

    # Merge the static graphs into a dynamic one
    dynamic_graph = dnx.DynGraph()
    for t, graph in enumerate(list_of_snapshots):
        dynamic_graph.add_interactions_from(graph.edges(data=False), t=t)
    
    return dynamic_graph

If we run the following command:

%timeit dynamic_random_graph(300, 100, 0.5) # Memory crashed on larger networks.
>> 1 loop, best of 5: 15.1 s per loop

In contrast, if we run the code without the network merge, we get significantly better results:

%timeit dynamic_random_graph_without_merge(300, 100, 0.5) # Ignore the merge part in the function
>> 1 loop, best of 5: 15.1 s per loop

If we run the function without the merge part, we can work on networks with 1000 nodes without the memory crashing.
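The merge-free variant is not shown above; a minimal sketch, assuming dynamic_random_graph_without_merge is simply the function from before with the merge loop removed:

import itertools
import networkx as nx
from random import random

def dynamic_random_graph_without_merge(n, steps, up_rate, seed=42):
    # Same snapshot generation as above; the DyNetx merge step is skipped.
    list_of_snapshots = list()
    for t in range(0, steps):
        G_t = nx.Graph()
        G_t.add_nodes_from(range(n))
        for e in itertools.combinations(range(n), 2):
            if random() < up_rate:
                G_t.add_edge(*e)
        list_of_snapshots.append(G_t)
    return list_of_snapshots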

So, I'd like to look at the DyNetx source code and try to figure out what's wrong with the add_interactions_from method.

The function is short and simple, but I'm curious why it takes so much time and memory, and how I can improve it. What are your thoughts?


This is the source code:

def add_interactions_from(self, ebunch, t=None, e=None):
        """Add all the interaction in ebunch at time t.
        Parameters
        ----------
        ebunch : container of interaction
            Each interaction given in the container will be added to the
            graph. The interaction must be given as 2-tuples (u,v) or
            3-tuples (u,v,d) where d is a dictionary containing interaction
            data.
        t : appearance snapshot id, mandatory
        e : vanishing snapshot id, optional
        See Also
        --------
        add_edge : add a single interaction
        Examples
        --------
        >>> import dynetx as dn
        >>> G = dn.DynGraph()
        >>> G.add_edges_from([(0,1),(1,2)], t=0)
        """
        # set up attribute dict
        if t is None:
            raise nx.NetworkXError(
                "The t argument must be a specified.")
        # process ebunch
        for ed in ebunch:
            self.add_interaction(ed[0], ed[1], t, e)

I suppose the loop at the end is the source of all problems. Link to the add_interaction implementation.
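A quick way to check that assumption is to profile the merge step in isolation; a minimal sketch using Python's built-in cProfile, where the hypothetical merge_only helper just re-runs the merge loop from above on an existing snapshot list:

import cProfile
import dynetx as dnx

def merge_only(list_of_snapshots):
    # Only the merge phase, so its cost shows up separately from snapshot generation.
    dynamic_graph = dnx.DynGraph()
    for t, graph in enumerate(list_of_snapshots):
        dynamic_graph.add_interactions_from(graph.edges(data=False), t=t)
    return dynamic_graph

# Assuming `snapshots` holds the list produced by the generation loop above:
# cProfile.run('merge_only(snapshots)', sort='cumtime')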

  • AFAIK, networkx is not a very efficient package. I think it was designed for very small graphs. You may be interested in alternative packages like neo4j or snap. – Jérôme Richard May 10 '21 at 22:49

1 Answer


Just a few considerations:

  • it is completely normal that creating the snapshot list without the merging phase is less costly than merging the snapshots into a DynGraph: this is mostly because the temporal information for replicated edges has to be compressed into edge attributes;

  • the random graphs you are generating are dense (50% of all possible edges are present, which is unrealistic in most real contexts), and this requires constant updates of the edge attributes. By reducing the number of edges you'll be able to scale up to bigger networks. Just as an example, for the ER model you are simulating, p = 1/N (where N is the number of nodes in the graph) already suffices to guarantee a supercritical regime (i.e., the emergence of a giant connected component);

  • dynetx is built on top of networkx, which is not particularly scalable (both in terms of memory consumption and execution times): when dealing with dense, heavily edge-attributed graphs, such limitations are more evident than ever;

  • the way you are building the dynamic graph is likely the most time-consuming one available. You are adding interactions for each pair of nodes without leveraging knowledge of their effective duration. If the interaction (u,v) takes place at every step from t to t+k, you can insert that edge just once, specifying its vanishing time, thus reducing the number of graph-manipulation operations (a sketch of this follows right after this list).
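A minimal sketch of that last point. The merge_with_durations helper below is hypothetical (not part of DyNetx), and it assumes that add_interaction's e parameter is the snapshot at which the interaction vanishes (i.e., the edge is present from t through e-1); check the DyNetx docs for the exact semantics:

import dynetx as dnx

def merge_with_durations(list_of_snapshots):
    # Collapse each edge's run of consecutive snapshots into a single
    # add_interaction call with an appearance time t and a vanishing time e.
    run_start = {}   # edge -> snapshot at which the current run started
    run_end = {}     # edge -> last snapshot at which the edge was seen
    dynamic_graph = dnx.DynGraph()

    for t, graph in enumerate(list_of_snapshots):
        for u, v in graph.edges(data=False):
            edge = (min(u, v), max(u, v))
            if edge in run_end and run_end[edge] == t - 1:
                run_end[edge] = t                      # the run continues
            else:
                if edge in run_start:                  # close the previous run
                    dynamic_graph.add_interaction(*edge, t=run_start[edge], e=run_end[edge] + 1)
                run_start[edge] = t
                run_end[edge] = t

    for edge, start in run_start.items():              # flush the still-open runs
        dynamic_graph.add_interaction(*edge, t=start, e=run_end[edge] + 1)

    return dynamic_graph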

Indeed, DyNetx is not designed to handle particularly large graphs; however, we have leveraged it to analyze interaction networks built on top of online social network data several orders of magnitude larger (in terms of nodes) than the reported examples.

As I said before: real networks are sparser than the ones you are simulating. Moreover, (social) interactions usually happen in "bursts". Those two data characteristics often mitigate the library's limitations.
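As an illustration of that sparsity point, a minimal sketch of sparser snapshot generation, assuming ER snapshots with p close to 1/N are acceptable; networkx's fast_gnp_random_graph samples edges directly instead of testing every node pair:

import networkx as nx

n, steps = 1000, 1000
p = 1.0 / n   # roughly the supercritical threshold for the ER model mentioned above

# Sketch only: fast_gnp_random_graph runs in O(n + m) expected time instead of
# iterating over all ~n^2/2 node pairs, so sparse snapshots stay cheap.
list_of_snapshots = [nx.fast_gnp_random_graph(n, p, seed=t) for t in range(steps)]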

Anyhow, we welcome every contribution to the library: if anyone would like to work on its scalability, they will have all our support!