adding edges to unweighted directed graphs efficiency?

Question

I have three graphs represented as python dictionaries

A: {1:[2], 2:[1,3], 3:[]}. 

B: {1: {neighbours:[2]}, 2: {neighbours:[1,3]}, 3: {neighbours:[]}}

C: {1: {2:None}, 2: {1:None, 3:None}, 3: {}}

I have a hasEdge and addEdge function

def addEdge(self, source, target):

assert self.hasNode(source) and self.hasNode(target)
if not self.hasEdge(source, target):
    self.graph[source][target] = None

def hasEdge(self, source, target):

    assert self.hasNode(source) and self.hasNode(target)
    return target in self.graph[source]

I am not sure which structures will be most efficient for each function, my immediate thought is the first will be the most efficient for adding a edge and the C will be the most efficient for returning if it has an edge

Also see here http://stackoverflow.com/questions/2218322/what-is-better-adjacency-lists-or-adjacency-matrices-for-graph-problems-in-c — marisbest2, Mar 22 '17 at 22:55

Anish Goyal · Answer 1 · 2017-03-22T22:46:40.597

1

C seems to be the most efficient to me, since you are doing lookups that are on average O(1). (Note that this is the average case, not the worst case.) With Adjacency Lists, you have worst case Linear Search.

For a sparse graph, you may wish to use Adjacency Lists (A), as they will take up less space. However, for a dense graph, option C should be the most efficient.

A and B will have very similar runtimes - asymptotically the same. Unless there is data besides neighbors that you wish to add to these nodes, I would choose A.

I am not familiar with python; however, for Java, option C can be improved by using a HashSet (set) which would reduce your space requirements. Runtime would be the same as using a HashMap, but sets do not store values - only keys, which is what you want for checking if there is an edge between two nodes.

So, to clarify:

For runtime, choose C. You will have average case O(1) edge adds. To improve C in order to consume less memory, use sets instead of maps, so you do not have to allocate space for values.

For memory, choose A if you have a sparse graph. You'll save a good amount of memory, and won't lose too much in terms of runtime. For reference, sparse is when nodes don't have too many neighbors; for example, when each node has about 2 neighbors in a graph with 20 nodes.

edited Mar 22 '17 at 22:46

answered Mar 22 '17 at 22:43

Anish Goyal

2,799
12
17

would you choose A as being most efficient for each of the function? Is that what u mean? – bighead Mar 22 '17 at 22:45
No, I would choose A over B. The real question is if you should choose A over C, or C over A. For a graph with many edges between nodes, C should be more beneficial since your runtime will improve. For a sparse graph, A would be more beneficial since it requires less space, and C will not improve the efficiency enough to offset the memory benefit of A. – Anish Goyal Mar 22 '17 at 22:48
If you choose A, especially for Python, you'd be wrong. Even a TreeMap/TreeSet would be better than an adjacency list. `O(N)` adjacency lists are never the right answer. In a complete or very dense graph, an adjacency matrix could sometimes be the right choice if the constraint is memory The question of whether to focus on space or time is not about the sparseness of the graph. Its about what the constraint is. If the constraint is time, O(1) lists are the way to go – marisbest2 Mar 22 '17 at 22:51
You're right about runtime, but our only concern is not runtime. Sets use up more space. If each node only has a couple of neighbors, i.e. the graph is really sparse, A is more efficient. O(n) is the worst case in either case, since lookups in a set are not guaranteed to be constant time. The advantage of a set is its average case, at O(1). With a sparse graph, adjacency lists have a similar average case to sets. – Anish Goyal Mar 22 '17 at 22:53
True. Fair. I ran some quick size tests: 10**i, getsizeof(list(range(10**i))), getsizeof(set(range(10**i)))) 1 208 744 2 1016 8424 3 9120 33000 4 90120 524520 5 900120 4194536 6 9000120 33554664 7 90000120 268435688 – marisbest2 Mar 22 '17 at 23:01
1

That didnt format well... Anyway, it seems to be a 4-6x difference. Not too bad IMO but obviously it depends on the constraint. However, given that the constraint was about `hasEdge` and `addEdge` I'd expect the teacher to expect `C` as the answer. – marisbest2 Mar 22 '17 at 23:03
1

Fair point. I do agree with everything you've said, I just wanted to point out that there is a real tradeoff that can be made in this situation depending on the data you are working with while answering the question. The best answer for runtime though, would actually be none of our answers - it would be the adjacency matrix. It's just that the space requirement makes it infeasible in most cases beyond the most trivial. – Anish Goyal Mar 22 '17 at 23:06

score 1 · Answer 2 · answered Mar 22 '17 at 22:44

1

A and B are classic adjacency lists. C is an adjacency list, but uses an O(1) structure instead of an O(N) structure for the list. But really, you should use D, the adjacency set.

In Python set.contains(s) is an O(1) operation.

So we can do

graph = { 1: set([2]), 2: set([1, 3], 3: set() }

Then our addEdge(from, to) is

graph[from].add(to)
graph[to].add(from)

and our hasEdge(from,to) is just

to in graph[from]

answered Mar 22 '17 at 22:44

marisbest2

1,346
2
17
30

Note that a `set` is equivalent to a `dict` where all the values are `None` or some other value that we don't care about. `C` is a good choice if the edges have weights or other values. But the Python `set` should take up less space than a dict – marisbest2 Mar 22 '17 at 22:46
I need to state which graph is most efficient for each of the functions, are you saying that C would be the quickest as it has a O(1) structure? – bighead Mar 22 '17 at 22:47
yes. A python dictionary is a hash map and as such has `O(1)` lookup. – marisbest2 Mar 22 '17 at 22:48

adding edges to unweighted directed graphs efficiency?

2 Answers2