6

In Graph Theory, we know that an Vertex adjacencies can be represented using Adjacency list data structure. On contrary, adjacency set is not widely mentioned anywhere in the Graph Theory. Why is that so?

Here's the pros, i can think of.

  1. As Set property, the graph could provide a guarantee in term of duplicated edge and many other properties of Set. Moreover all set operation from the Set Theory became available which is more intuitive to work with the analysis. Such as:

    • vertex_set_A | vertex_setB is the union operation.
    • vertex_set_A & vertex_set_B, is the intersection operation.
  2. *Opinion, Set is more understandable as it has correlation in the mathematical proofing. It is also provide a good abstraction of how the low level code dealing with array and stuff.

  3. In term of Performance, one could use HashSet implementation which would provide a constant time operation. Or TreeSet when the graph need to dynamically changed frequently on log time operation.
  4. List data structure also maintain the ordering property of the element which serve no purpose in most graph. In fact, list iterates in ordered fashion which should not happened in the first place. Indexed Ordered should not matter and Set could provide that. The only time when ordering mattered is when the graph is weighted, and thus the ordering based on the weight, in which TreeSet operate mostly in log time operation.

So, I am unsure why most graph algorithms only mentioning adjacency list. Is it because of the technology barrier where Set is harder to implement, whereas List is easier?

Yeo
  • 11,416
  • 6
  • 63
  • 90
  • An [adjacency matrix](https://en.wikipedia.org/wiki/Adjacency_matrix) is also quite common. The advantage of a list is that it can also represent duplicate edges. That would not be possible in a set. – Axel Kemper Aug 31 '16 at 21:51
  • @AxelKemper, yeah, i know that, but in graph algorithms, **in general** adjacency matrix is the worst data structure to be used. (i'm referring to general classical algorithm mostly). http://stackoverflow.com/questions/2218322/what-is-better-adjacency-lists-or-adjacency-matrices-for-graph-problems-in-c – Yeo Aug 31 '16 at 21:53
  • Your question might lead to opinion-based answers. What is a "good" and what is a "bad" data structure? What are your criteria? In terms of performance and simplicity, a matrix has its merits. The drawback is the space consumption. Lists are readily available and probably easier to implement than sets. – Axel Kemper Aug 31 '16 at 21:56
  • @AxelKemper why would you represent duplicate edges in the first place. In Set, 2 edges having the same endpoint should not be considered as duplicate, but rather a separate entities – Yeo Aug 31 '16 at 21:57
  • 1
    There is no specific reason to use a list instead of a set. However, there are a few arguments for the list. A list can be implemented in a more compact way than a set (in terms of memory layout). Most notably, it offers memory coherency. Iterating a simple list is faster than iterating a set (the most common operation). For most applications, the features you mention just have no relevance. Something like set operations can still be implemented with an auxiliary set (which you need anyway for the result). In fact, the list may be seen as an implementation of a mathematical set in most cases. – Nico Schertler Aug 31 '16 at 21:57
  • @NicoSchertler so you mean, implementation wise, there are some people are using Set for its Adjacency list? – Yeo Aug 31 '16 at 21:59
  • No, the other way around. In a very abstract (mathematical) view, you may define the neighbors as a set but implement them with a list. Duplicate edges are very application-specific. There are quite a few applications that need duplicate edges (e.g. when you collapse nodes and several edges fall together). – Nico Schertler Aug 31 '16 at 22:01
  • I just found a duplicate question: http://stackoverflow.com/questions/32588468/why-are-graphs-represented-using-an-adjacency-list-instead-of-an-adjacency-set . Somewhat unpopular question, so hard to search – Yeo Aug 31 '16 at 22:08
  • Inspiring thought...But can Set represent directed graph or multigraph? – shole Sep 01 '16 at 08:53
  • @shole the `Set` is not to represent the Edge data structure. internally, for the Edge data structure itself, it is best to simply use 2 variables, or an array of size 2, or a Pair data structure. Those will maintain the order which directed edge requires. And for multi graph, each parallel edge should be treated as separate entity instance, that's depends on how the set is implemented for its uniqueness, either each edge is given a unique identification or something. – Yeo Sep 01 '16 at 09:56
  • @shole but for other parts in Graph Data Structure, the Graph need to maintain at least a collection of Edges or Vertices which naturally they should be unique regardless whether they are parallel edge or not. (for parallel edge, the 2 endpoint should not be the only component for uniqueness check, some edge unique identifier may required) – Yeo Sep 01 '16 at 10:00
  • @Yeo Got it, great point. But isn't then your set graph data structure similar to edge list? – shole Sep 02 '16 at 07:18
  • @shole my example in the comment, is not about adjacency list, but was talking about example of set in graph. So yes, collection of edge makes the graph to be **edge set** data structure. And collection of vertices with its adjacencies makes the graph to be **adjacency list** data structure. – Yeo Sep 02 '16 at 09:14
  • @Yeo What I meant is that, edge set is itself already representing the graph, it is an existing graph structure and I thought it is behaving quite similar what you describe in your OP...sorry for my bad elaboration – shole Sep 05 '16 at 03:04
  • I know this thread has gone pretty dry, but one issue with using Sets in some languages (like Python say), is that often people have a "custom" Node/Vertex class. So when using sets, they would also have to implement a hashing function to describe how to hash a Node since set items must be hashable. This isn't difficult to do, one could simply have a __hash__() method for the Node class which hashes the node id, but it might deter people away from considering a set implementation – Joel Biffin Apr 17 '19 at 09:13

1 Answers1

2

This is indeed a very good question, the fact that there isn't a comprehensive answer yet just iterates the question one more time, 'why not' indeed.

The reasons in the comments to me seem more like make-do excuses in that this is just what has existed throughout history, still not enough to warrant an actual reason.

List is merely a general label and you could(and should) use sets if that's better suited to your task.

Some people would argue the set doesn't provide you a guaranteed O(1) lookup time - it's amortized and that the worst case is still O(n) even if very unlikely, others would reason faster iteration via lists, and then there's the argument about implementation feasibility.

While they're technically not wrong, I don't really buy into any of those being the primary reasons.
I feel the real reason is that they're used 'just because convention'.
The general label was 'list' and it was taken literally often enough to lose its genericness.

Certainly go ahead and use a set if your application lends itself to it.
My textbook used them too.

(Oh and an example to drive that home when I say 'application lending itself to it'; If you need to find indegrees very often in your application, the set will give you an O(V) runtime where V denotes number of vertices, adjacency list will give you O(E) runtime where E denotes number of edges. For dense graphs, O(E) tends to become O(V^2) assuming no parallel edges are allowed. Thus an adjacency set would give you a better runtime performance here.)

Ayush
  • 1,510
  • 11
  • 27
  • What textbook did you use? – heretoinfinity Jan 31 '21 at 14:03
  • could you clarify what you mean by *implementation feasibility* in your answer? – heretoinfinity Jan 31 '21 at 14:08
  • 1
    @heretoinfinity I was referring to the book 'Parallel Scientific Computation - A Structured Approach Using BSP' but a more approachable example can be found [here](https://www.geeksforgeeks.org/graph-representations-using-set-hash/). About 'implementation feasability', I was referring to the author's remarks about 'technology barrier' and Joel Biffin's comment to the question above. – Ayush Feb 01 '21 at 09:26