2

If you look at this DAG (directed acyclic graph):

I want to create a dict which maps the distance from the lowest node(s) to all others nodes which is similar to the x position (height) of each node from the bottom in the rendered graph. For that given graph it would be:

distance_nodes_map: {
  0: {'base-zero', 'base-one'}, 
  1: {'low-b', 'low-a', 'low-c'}, 
  3: {'high-x', 'high-z', 'high-y'}, 
  2: {'mid-r', 'mid-q', 'mid-p'}, 
  4: {'super'}
}

I wrote an algorithm which worked for that graph above but then I've tested another graph and it didn't work anymore. I tried some algorithms and functions like shortest path or descendants_at_distance but I don't think they are really helpful as an input to calculate the distances.

My algorithm doesn't work for instance for this graph:

https://gist.github.com/timaschew/3b08a07243fa6f43773014ef5e705c96

Here is gist which contains:

  • a python script which reads a YAML file, the dependency/graph structure and generates a HTML with a rendered mermaid graph (I've removed my algorithm to calculate the distances in a wrong way)
  • both graphs which are shown here, as a YAML file
timaschew
  • 16,254
  • 6
  • 61
  • 78

2 Answers2

1

Untested pseudo-code because my lunch break is nearly over:

You have a multi-root tree, with one chosen principal root.

  1. For each root, create a subgraph consisting of all reachable nodes for that root.

  2. Starting with the principal root (root a), compute the distance/shortest path length to the root for all nodes in the corresponding subgraph A.

  3. Find all subgraphs that share at least one node with the principal subgraph, and select the subgraph (subgraph B) that has the node (node x) with the smallest distance to the principal root.

  4. Compute the distance to the root b for all nodes in subgraph B. Add the distance d(node x, root a). Subtract the distance d(node x, root b).

  5. Create the union of subgraph A and B. Repeat steps 3-5 until no roots remain.

  6. Subtract the maximum distance & reverse the sign such that principal root has the largest distance/order value.

Edit:

My pseudocode works (*). I blame user error. ;-)

#!/usr/bin/env python
"""
https://stackoverflow.com/q/66584661/
"""
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx


def hierarchical_layout(graph):

    longest_path = nx.algorithms.dag.dag_longest_path(graph)
    principal_root = longest_path[0]

    roots = [node for node, degree in list(graph.in_degree) if degree==0]
    subgraphs = {root : create_subgraph(graph, root) for root in roots}

    # Starting with the principal root (root a), compute the
    # longest path length to the root for all nodes in the
    # corresponding subgraph A.
    node_to_level = single_source_longest_dag_path_length(subgraphs[principal_root], principal_root)

    explored = subgraphs[principal_root]
    del subgraphs[principal_root]

    while len(explored) < len(graph):

        # Find all subgraphs that share at least one node with the
        # principal subgraph, and select the subgraph (subgraph B) that
        # has the node (node x) with the smallest distance to the
        # principal root.
        minimum_cost = np.inf
        minimum_cost_node = None
        minimum_cost_root = None
        for root, subgraph in subgraphs.items():
            for node in subgraph.nodes:
                if node in node_to_level:
                    if node_to_level[node] < minimum_cost:
                        minimum_cost = node_to_level[node]
                        minimum_cost_node = node
                        minimum_cost_root = root

        assert minimum_cost_node, "Could not find a connected subgraph."

        # Compute the distance to the root b for all nodes in subgraph
        # B. Add the distance d(node x, root a). Subtract the distance
        # d(node x, root b).
        path_lengths = [len(path) for path in nx.all_simple_paths(subgraphs[minimum_cost_root], minimum_cost_root, minimum_cost_node)]
        offset = np.max(path_lengths) - 1
        for node, distance in single_source_longest_dag_path_length(subgraphs[minimum_cost_root], minimum_cost_root).items():
            if not node in node_to_level:
                node_to_level[node] = distance + minimum_cost - offset

        # Create the union of subgraph A and B.
        explored = nx.compose(explored, subgraphs[minimum_cost_root])
        del subgraphs[minimum_cost_root]

    return node_to_level


def create_subgraph(G, node):
    # https://stackoverflow.com/a/45678930/2912349
    nodes = nx.single_source_shortest_path(G,node).keys()
    return G.subgraph(nodes)


def single_source_longest_dag_path_length(graph, s):
    # from AlaskaJoslin's comment to https://stackoverflow.com/a/60978007/2912349
    dist = dict.fromkeys(graph.nodes, -float('inf'))
    dist[s] = 0
    topo_order = nx.topological_sort(graph)
    for n in topo_order:
        for s in graph.successors(n):
            if dist[s] < dist[n] + 1:
                dist[s] = dist[n] + 1
    return dist


if __name__ == '__main__':

    # edge_list = [
    #     ("n10", "n11"),
    #     ("n11", "n12"),
    #     ("n12", "n13"),
    #     ("n13", "n14"),
    #     ("n20", "n21"),
    #     ("n20", "n14"),
    #     ("n21", "n22"),
    #     ("n22", "n23"),
    #     ("n30", "n23"),
    #     ("n30", "n31"),
    #     ("n31", "n32"),
    # ]

    edge_list = [
        ("low-a", "base-one"),
        ("low-c", "base-zero"),
        ("low-c", "base-one"),
        ("mid-p", "low-b"),
        ("mid-p", "low-c"),
        ("mid-q", "low-a"),
        ("high-x", "mid-p"),
        ("high-y", "mid-p"),
        ("high-y", "base-zero"),
        ("high-z", "mid-q"),
        ("high-z", "mid-r"),
        ("high-z", "base-one"),
        ("super", "high-x"),
    ]

    graph = nx.DiGraph()
    graph.add_edges_from(edge_list)

    node_to_level = hierarchical_layout(graph)

    # reverse output format
    distance_nodes_map = dict()
    max_distance = np.max(list(node_to_level.values()))
    for node, distance in node_to_level.items():
        reversed_distance = max_distance - distance
        if reversed_distance in distance_nodes_map:
            distance_nodes_map[reversed_distance].add(node)
        else:
            distance_nodes_map[reversed_distance] = set([node])

    # print(distance_nodes_map)
    for ii, nodes in sorted(distance_nodes_map.items())[::-1]:
        print(f"{ii} : {nodes}")

Yields:

    # 4 : {'super'}
    # 3 : {'high-x', 'high-y', 'high-z'}
    # 2 : {'mid-p', 'mid-r', 'mid-q'}
    # 1 : {'low-a', 'low-b', 'low-c'}
    # 0 : {'base-one', 'base-zero'}

(*) "subtract distance d(node x, root b)" naturally implied the longest path length between node x and root b. Obviously.

Paul Brodersen
  • 11,221
  • 21
  • 38
  • Thanks for you answer. The idea with subgraph is really good, I've tried to modify my existing algorithm with that and the usage of nx.dag_longest_path_length. But the result for the second graph is ok, but for the first graph it's broken. I'm not sure how to implement your pseudo-code, so I've started a bounty for a real implementation, if you're interested ;) – timaschew Mar 14 '21 at 13:53
  • 1
    @timaschew Updated my answer with working code. – Paul Brodersen Mar 15 '21 at 11:48
  • Thanks, that looks great! There is some missing piece. In my graphs there is indeed a principal root, but that is by coincidence. As written in the bounty description it must work for all DAGs in general. But even for the two graphs there is still a problem: how to find out the principal root? I've tried to pick a random root but that ended up in wrong results. Do we need in general to iterate over all roots and merge the output together or what would be the solution? – timaschew Mar 16 '21 at 21:02
  • @timaschew Pick the root (in-degree=0) with the longest path length to a leaf (out-degree=0). – Paul Brodersen Mar 17 '21 at 09:58
  • @timaschew Added a method to compute the principal root. – Paul Brodersen Mar 17 '21 at 10:12
  • Thanks for the quick update but unfortunately it's not working for the graph if we remove super. Then we have 3 roots and the algorithm produces this wrong output: `{0: {'low-a'}, 1: {'base-one', 'base-zero', 'mid-q', 'mid-r'}, 2: {'low-c', 'high-z', 'low-b'}, 3: {'mid-p'}, 4: {'high-y', 'high-x'}}` – timaschew Mar 17 '21 at 22:42
  • Good catch. It seems all the distances should be longest distances, not shortest. I amended my answer (which now also works for the graph without `super`) – Paul Brodersen Mar 18 '21 at 10:51
1

You are looking for an algorithm that draws a layered graph. There are many different algorithms, and you should choose the one that best fit your needs (for example, have a look at the following paper A Technique for Drawing Directed Graphs by Gansner et al.).

Many of those algorithms are already implemented in Graphviz (a very famous and powerful graph visualization software). Once you have installed it, it's pretty straightforward to compute the result you are looking for (G is your directed acyclic graph built using networkx.DiGraph):

from networkx.drawing.nx_agraph import graphviz_layout

def get_distance_nodes_map(G):
    pos = graphviz_layout(G, prog='dot')
    coor = sorted({y for k, (x, y) in pos.items()})
    kmap = dict(zip(coor, range(len(coor))))
    distance_nodes_map = {level: set() for level in kmap.values()}
    for k, (x, y) in pos.items():
        distance_nodes_map[kmap[y]].add(k)
    return distance_nodes_map

Here are a couple of examples using data that you provided:

>>> from networkx import DiGraph
>>> from pprint import PrettyPrinter
>>> pp = PrettyPrinter()
>>> G1 = DiGraph()
>>> G1.add_edges_from([('super', 'high-x'), ('high-x', 'mid-p'),
...                    ('mid-p', 'low-b'), ('mid-p', 'low-c'),
...                    ('low-c', 'base-zero'), ('low-c', 'base-one'),
...                    ('high-y', 'mid-p'), ('high-y', 'base-zero'),
...                    ('high-z', 'base-one'), ('high-z', 'mid-r'),
...                    ('high-z', 'mid-q'), ('mid-q', 'low-a'),
...                    ('low-a', 'base-one')])
>>> pp.pprint(get_distance_nodes_map(G1))
{0: {'base-one', 'base-zero'},
 1: {'low-a', 'low-b', 'low-c'},
 2: {'mid-p', 'mid-r', 'mid-q'},
 3: {'high-y', 'high-x', 'high-z'},
 4: {'super'}}
>>> G2 = DiGraph()
>>> G2.add_edges_from([('n10', 'n11'), ('n11', 'n12'), ('n12', 'n13'),
...                    ('n13', 'n14'), ('n20', 'n14'), ('n20', 'n21'),
...                    ('n21', 'n22'), ('n22', 'n23'), ('n30', 'n23'),
...                    ('n30', 'n31'), ('n31', 'n32')])
>>> pp.pprint(get_distance_nodes_map(G2))
{0: {'n32'},
 1: {'n31', 'n23'},
 2: {'n30', 'n22'},
 3: {'n21', 'n14'},
 4: {'n13', 'n20'},
 5: {'n12'},
 6: {'n11'},
 7: {'n10'}}
Riccardo Bucco
  • 13,980
  • 4
  • 22
  • 50