4

Graphs are often represented using an adjacency matrix. Various sources indicate it is possible to avoid the cost of initialization to be |V^2| (V is the number of vertices) but could I have not figured out how.

In Java, simply by declaring the matrix, e.g. boolean adj [][], the runtime will automatically initialize the array with false and this will be at O(V^2) cost (the dimensions of the array).
Do I misunderstand? Is it possible to avoid the quadratic cost of initialization of the adjacency matrix, or is this just something theoretical that depends on the implementation language?

andersoj
  • 22,406
  • 7
  • 62
  • 73
Cratylus
  • 52,998
  • 69
  • 209
  • 339

4 Answers4

2

That would be possible by using a sparse matrix representation of an adjacency matrix where only the position of the "ones" is allocated rather than each and every element of the matrix (which might include a large number of zeros). You might find this thread useful as well

Community
  • 1
  • 1
A_A
  • 2,326
  • 18
  • 27
2

The default initialization of the matrix's values is in fact a feature. Were it not with the default initialization, wouldn't you still need to initialize every field yourself so you know what to expect its value to be?

Adjacency matrices have this drawback: they are bad in the sense of memory efficiency (they require O(n2) memory cells) and as you said their initialization is slower. The initialization, however, is never considered the biggest problem. Believe me, the memory allocation is a lot slower and the needed memory is much more limiting than the initialization time.


In many cases people prefer using adjacency lists, instead of the matrix. Such list require O(m) memory, where m is the number of edges in the graph. This is a lot more efficient, especially for sparse graphs. The only operations this graph representation is worse than the adjacency matrix is the query is there edge between vertices i and j. the matrix answers in O(1) time and the list will for sure be a lot slower.

However many of the typical graph algorithms (like Dijkstra, Bellman-Ford, Prim, Tarjan, BFS and DFS) will only need to iterate all the neighbours of a given vertex. All these algorithms benefit immensely if you use adjacency list instead of matrix.

Boris Strandjev
  • 46,145
  • 15
  • 108
  • 135
  • So you are saying that there is not some kind of "trick" to initialize the array in less than quadratic time? Even in theory? – Cratylus May 13 '12 at 11:21
  • @user384706 Nope there isn't and theoretically there can not be. However, there are some functions that used highly optimized operations like the c++'s `memset` function. – Boris Strandjev May 13 '12 at 11:30
  • Well they're only bad in the sense of memory efficiency for certain cases. I know what you're trying to say, but it's important to know that matrices can be memory efficient. See [this thread](http://stackoverflow.com/questions/2218322/what-is-better-adjacency-lists-or-adjacency-matrix-for-graph-problems-in-c/5419933#5419933) for more info – keyser May 13 '12 at 11:58
  • Parallel algorithms are usually a whole lot easier to parallelize if an algorithm exists at all, for a simple 2d array than with the adjacency list. Note that most listed algorithms on your list are pretty hard to parallelize work efficient. Ah all the pros and cons. – Voo May 13 '12 at 19:21
2

There is a good deal of confusion and misinformation in this thread. In fact, there is a method of avoiding initialization costs of adjacency matrices (and any array in general). However, it is not possible to use the method with Java primitives since they are initialized with default values under the hood.

Suppose you could create an array data[0..n] that is not auto-initialized. To start, it is filled with junk from whatever was previously in memory. If we don't want to spend O(n) time overwriting it, we need some way to differentiate the good data we add from junk data.

The "trick" is to use an auxiliary stack that tracks cells containing good data. The first time we write to data[i], we add index i to the stack. Since a stack only grows as we add to it, it never contains any junk we need to worry about.

Now whenever we access data[k], we can check if its junk or not by scanning the stack for k. But that would take O(n) time for each read, defeating the point of an array in the first place!

To solve this, we make another auxiliary array stack_check[0..n] that also starts full of junk. This array contains pointers to elements in the stack. Now when we first write to data[i], we push i onto the stack and set stack_check[i] to point to the new stack element.

If data[k] is good data, then stack_check[k] points to a stack element holding k. If data[k] is junk, then the junk value of stack_check[k] either points outside of the stack or points to some stack element besides k (since k was never put on the stack). Checking this property only takes O(1) time so our array access is still fast.

Bringing it all together, we can create our array and helper structures in O(1) time by letting them be full of junk. On each read and write, we check if the given cell contains junk in O(1) time using our helpers. If we are writing over junk, we update our helper structures to mark the cell as valid data. If we read junk, we can handle it in whatever way is appropriate for the given problem. For example, we could return a default value like 0 (now you can't even tell we didn't initialize it!) or maybe throw an exception.

0

I'll elaborate on A_A's answer. He recommends a sparse matrix, which basically means you're back to maintaining adjacency lists.

You have two reasons to use a matrix - if you don't care about performance at all and like the simple code it offers, or if you do care about performance but your matrix is going to be relatively full (let's say at least 20% full, for the sake of this post).

You obviously do care about performance. If your matrix is going to be relatively empty, its initialization overhead can be meaningful, and you're better off using adjacency lists. If it's going to be quite full, initialization becomes negligible - you'll need to fill the right cells in the matrix (which will take more than initializing it), and you need to process them (which, again, will take more time than initializing it).

zmbq
  • 38,013
  • 14
  • 101
  • 171