The problem: Storing dynamic adjacency list of a graph in a file while retaining O(1) algorithmic complexity of operations.
I am trying to store dynamic bidirectional graph in a file(s). Both nodes and edges can be added and removed and the operations must be O(1). My current design is:
File 1 - Nodes
Stores two integers per node (inserts are appends and removals use free list):
- number of incoming edges
- number of outgoing edges
File 2 - Edges
Stores 4 integers per edge (inserts are appends and removals use free list + swap with last edge for a node to update its new index):
- from node (indice to
File 1
) - from index (i.e. third incoming edge)
- to node (indice to
File 1
) - to index (i.e. second outgoing edge).
File 3 - Links
Serves as openly addressed hash table of locations of edges in File 2. Basically when you read a node from File 1
you know there are x
incoming edges and y
outgoing edges. With that you can go to File 3
to get the position of each of these edges in File 2
. The key thus being:
- index of node in
File 1
(i.e. 0 for first node, 1 for second node) - 0 <= index of edge < number of outgoing/incoming edges
Example of File 3
keys if represented as chained hash table (that is unfortunately not suitable for files but would not require hashing...):
Keys (indices from `File 1` + 0 <= index < number of edgesfrom `File 1`, not actually stored)
1 | 0 1 2
2 | 0 1
3 |
4 | 0
5 | 0 1 2
I am using qHash
and QPair
to hash these atm however the number of collisions is very high. Especially when I compare it to single int
hashing that is very efficient with qHash
. Since the values stored are indices to yet another file probing is rather expensive so I would like to cut the number of collissions down.
Is there a specialized hashing algorithm or approach to use for pair of ints
that could perform better in this situation? Or of course a different approach that would avoid this problem like how to implement chained hash table in a file for example (I can only think of using buffers but that would be overkill for sparse graphs like mine I believe)?