Representing graphs (data structure) in Python

Question

How can one neatly represent a graph in Python? (Starting from scratch i.e. no libraries!)
What data structure (e.g. dicts/tuples/dict(tuples)) will be fast but also memory efficient?
One must be able to do various graph operations on it.

As pointed out, the various graph representations might help. How does one go about implementing them in Python?

As for the libraries, this question has quite good answers.

There a lot of libraries out there already: http://graph-tool.skewed.de/performance, https://code.google.com/p/python-graph/, http://networkx.github.io/ — Kassym Dorsel, Oct 24 '13 at 14:33
For implementing a Graph look at the Wikipedia article which lists common implementations and their efficiency in both memory and speed: http://en.wikipedia.org/wiki/Graph_(abstract_data_type)#Representations — Kassym Dorsel, Oct 24 '13 at 14:35
You might try GitHub.com/thePastor/pangaia. It needs a little rewrite to use the standard library's defaultdict (which wasn't out when the code was written). It uses a recursive data structure to make it more elegant than other implementations. — theDoctor, Jan 11 '18 at 23:53
For *directed* graphs, this [essay from python.org](https://www.python.org/doc/essays/graphs/) suggests a `dict` of `list`s. Basically something like `{: [, ...], ...}`. — djvg, Mar 25 '20 at 16:57
You can implement using dictionary as adjacency list with keys as nodes and values as a list of adjacent nodes for each key. — Shahrukh khan, Apr 13 '20 at 08:26

score 203 · Accepted Answer · edited Feb 19 '20 at 15:16

Even though this is a somewhat old question, I thought I'd give a practical answer for anyone stumbling across this.

Let's say you get your input data for your connections as a list of tuples like so:

[('A', 'B'), ('B', 'C'), ('B', 'D'), ('C', 'D'), ('E', 'F'), ('F', 'C')]

The data structure I've found to be most useful and efficient for graphs in Python is a dict of sets. This will be the underlying structure for our Graph class. You also have to know if these connections are arcs (directed, connect one way) or edges (undirected, connect both ways). We'll handle that by adding a directed parameter to the Graph.__init__ method. We'll also add some other helpful methods.

import pprint
from collections import defaultdict


class Graph(object):
    """ Graph data structure, undirected by default. """

    def __init__(self, connections, directed=False):
        self._graph = defaultdict(set)
        self._directed = directed
        self.add_connections(connections)

    def add_connections(self, connections):
        """ Add connections (list of tuple pairs) to graph """

        for node1, node2 in connections:
            self.add(node1, node2)

    def add(self, node1, node2):
        """ Add connection between node1 and node2 """

        self._graph[node1].add(node2)
        if not self._directed:
            self._graph[node2].add(node1)

    def remove(self, node):
        """ Remove all references to node """

        for n, cxns in self._graph.items():  # python3: items(); python2: iteritems()
            try:
                cxns.remove(node)
            except KeyError:
                pass
        try:
            del self._graph[node]
        except KeyError:
            pass

    def is_connected(self, node1, node2):
        """ Is node1 directly connected to node2 """

        return node1 in self._graph and node2 in self._graph[node1]

    def find_path(self, node1, node2, path=[]):
        """ Find any path between node1 and node2 (may not be shortest) """

        path = path + [node1]
        if node1 == node2:
            return path
        if node1 not in self._graph:
            return None
        for node in self._graph[node1]:
            if node not in path:
                new_path = self.find_path(node, node2, path)
                if new_path:
                    return new_path
        return None

    def __str__(self):
        return '{}({})'.format(self.__class__.__name__, dict(self._graph))

I'll leave it as an "exercise for the reader" to create a find_shortest_path and other methods.

Let's see this in action though...

>>> connections = [('A', 'B'), ('B', 'C'), ('B', 'D'),
                   ('C', 'D'), ('E', 'F'), ('F', 'C')]
>>> g = Graph(connections, directed=True)
>>> pretty_print = pprint.PrettyPrinter()
>>> pretty_print.pprint(g._graph)
{'A': {'B'},
 'B': {'D', 'C'},
 'C': {'D'},
 'E': {'F'},
 'F': {'C'}}

>>> g = Graph(connections)  # undirected
>>> pretty_print = pprint.PrettyPrinter()
>>> pretty_print.pprint(g._graph)
{'A': {'B'},
 'B': {'D', 'A', 'C'},
 'C': {'D', 'F', 'B'},
 'D': {'C', 'B'},
 'E': {'F'},
 'F': {'E', 'C'}}

>>> g.add('E', 'D')
>>> pretty_print.pprint(g._graph)
{'A': {'B'},
 'B': {'D', 'A', 'C'},
 'C': {'D', 'F', 'B'},
 'D': {'C', 'E', 'B'},
 'E': {'D', 'F'},
 'F': {'E', 'C'}}

>>> g.remove('A')
>>> pretty_print.pprint(g._graph)
{'B': {'D', 'C'},
 'C': {'D', 'F', 'B'},
 'D': {'C', 'E', 'B'},
 'E': {'D', 'F'},
 'F': {'E', 'C'}}

>>> g.add('G', 'B')
>>> pretty_print.pprint(g._graph)
{'B': {'D', 'G', 'C'},
 'C': {'D', 'F', 'B'},
 'D': {'C', 'E', 'B'},
 'E': {'D', 'F'},
 'F': {'E', 'C'},
 'G': {'B'}}

>>> g.find_path('G', 'E')
['G', 'B', 'D', 'C', 'F', 'E']

Even though this question is very old, I think this is exactly the kind of answer I was expecting at that time. The example really helps explain how one could go about the implementation at the same time keeping it really simple. One can find implementations from different open source libraries, but the explanation wouldn't be at par. Thanks! — shad0w_wa1k3r, Jun 10 '15 at 04:49
what kind of modification is required to add weight to the edges ? — pshirishreddy, Aug 10 '15 at 05:38
@pshirishreddy Interesting question! I hadn't thought about that, but my instinct would be to use the `heapq` lib to heapify lists of tuples instead of sets. For example the graph would be a dict of heaps like: `_graph = {'A': heapify([(0.3, 'D'), (0.5, 'B'), (0.75, 'A'), (0.9, 'C')])}` (note: you wouldn't actually use `heapify` like this, read the help for the lib), then you could use the `heapq` functions to insert and get the weighted edges. — mVChr, Aug 10 '15 at 17:39
@mVChr that would mean a `log` time access. But how to extend the dictionary that you used to map both nodeID and weight? — orezvani, Jul 28 '16 at 14:03
Nice ! Function is called recursively.This seems to be a DFS as it keeps on expanding nodes. For the shortest path we can compare the length of the paths and return only the shortest one at the end. — Jwalant Bhatt, Sep 20 '17 at 03:20
Hi would it be right to call this data structure an Adjacency List implementation? — Ghos3t, Jun 02 '20 at 02:49

jterrace · Answer 2 · 2018-07-25T23:13:53.033

49

NetworkX is an awesome Python graph library. You'll be hard pressed to find something you need that it doesn't already do.

And it's open source so you can see how they implemented their algorithms. You can also add additional algorithms.

https://github.com/networkx/networkx/tree/master/networkx/algorithms

edited Jul 25 '18 at 23:13

answered Oct 20 '13 at 00:15

jterrace

64,866
22
157
202

7

That's why NetworkX is a fantastic resource. It's open source so you can see how they implemented their algorithms. You can also add additional algorithms. – jterrace Oct 24 '13 at 22:29
2

About 2000 lines of code for the `graph.py --> class Graph`. And all I want to see is how they use `__iter__`. – T.Woody Oct 15 '18 at 22:46

pepr · Answer 3 · 2013-10-26T17:47:20.630

First, the choice of classical list vs. matrix representations depends on the purpose (on what do you want to do with the representation). The well-known problems and algorithms are related to the choice. The choice of the abstract representation kind of dictates how it should be implemented.

Second, the question is whether the vertices and edges should be expressed only in terms of existence, or whether they carry some extra information.

From Python built-in data types point-of-view, any value contained elsewhere is expressed as a (hidden) reference to the target object. If it is a variable (i.e. named reference), then the name and the reference is always stored in (an internal) dictionary. If you do not need names, then the reference can be stored in your own container -- here probably Python list will always be used for the list as abstraction.

Python list is implemented as a dynamic array of references, Python tuple is implemented as static array of references with constant content (the value of references cannot be changed). Because of that they can be easily indexed. This way, the list can be used also for implementation of matrices.

Another way to represent matrices are the arrays implemented by the standard module array -- more constrained with respect to the stored type, homogeneous value. The elements store the value directly. (The list stores the references to the value objects instead). This way, it is more memory efficient and also the access to the value is faster.

Sometimes, you may find useful even more restricted representation like bytearray.

score 8 · Answer 4 · edited Feb 10 '19 at 17:23

8

There are two excellent graph libraries NetworkX and igraph. You can find both library source codes on GitHub. You can always see how the functions are written. But I prefer NetworkX because its easy to understand.
See their codes to know how they make the functions. You will get multiple ideas and then can choose how you want to make a graph using data structures.

edited Feb 10 '19 at 17:23

devsaw

1,007
2
14
28

answered Aug 19 '17 at 15:36

Vineet Jain

1,515
4
21
31

Representing graphs (data structure) in Python

4 Answers4

Linked