
I have a Hamming cube of general dimension; in practice the dimension usually ranges from 3 to 6.

The search algorithm is:

Input: any vertex, `v`.
Find all vertices that lie in Hamming distance 1 from `v`.
Find all vertices that lie in Hamming distance 2 from `v`.
...

I do not know in advance how far from `v` I will need to go; I might stop at distance 1, for example.

For instance, given this cube:

(image: a 3-dimensional Hamming cube with vertices labeled 000–111)

and v = 100, I would need to get the vertices at Hamming distance 1, which are 000, 101, 110 (in any order). Then, I might need to get those at distance 2, namely 111, 001, 010. In the unlikely event of needing the vertices at distance 3 too, I would get 011 as well.
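For illustration, the vertices at exact Hamming distance `d` from `v` can be enumerated by XOR-ing `v` with every n-bit mask that has exactly `d` bits set. A minimal sketch (the function name `at_distance` is mine, and `__builtin_popcount` is a GCC/Clang builtin):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// All vertices at Hamming distance exactly d from v in an n-dimensional
// cube: XOR v with every n-bit mask that has d bits set.
std::vector<uint32_t> at_distance(uint32_t v, int d, int n) {
    std::vector<uint32_t> out;
    for (uint32_t mask = 0; mask < (1u << n); ++mask)
        if (__builtin_popcount(mask) == d)  // GCC/Clang builtin
            out.push_back(v ^ mask);        // flip the bits in mask
    return out;
}
```

With `v = 0b100`, `d = 1`, `n = 3` this yields 101, 110, 000, matching the example above; `d = 2` yields 111, 001, 010.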

A vertex of the cube may contain IDs (integers).

Which would be an appropriate data structure to store this cube and search it efficiently? I am not really interested in other operations.

I thought about sorting all the bit sequences somehow, so that I can access them easily, but couldn't get anything to work.


My approach so far:

Use a hashtable (specifically std::unordered_map), where the keys are the vertices and the values are the IDs.

Given a vertex `v`, generate all bit sequences within Hamming distance t (i.e. t = 1, 2, ...).

However, this requires me to call a function every time a vertex arrives (which happens often). I have a function to achieve this, based on this.
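The approach described above could be sketched roughly as follows. The function name `ids_within` and the `std::vector<int>` value type are illustrative assumptions, not from the question; `__builtin_popcount` is a GCC/Clang builtin:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Vertices (bit patterns) are keys, lists of IDs are values.
// Collect the IDs of every stored vertex within Hamming distance t of v.
std::vector<int> ids_within(
        const std::unordered_map<uint32_t, std::vector<int>>& cube,
        uint32_t v, int t, int n) {
    std::vector<int> ids;
    for (uint32_t mask = 0; mask < (1u << n); ++mask) {
        if (__builtin_popcount(mask) > t) continue;   // too far from v
        auto it = cube.find(v ^ mask);                // candidate vertex
        if (it != cube.end())
            ids.insert(ids.end(), it->second.begin(), it->second.end());
    }
    return ids;
}
```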

  • Have you thought about representing the cube as a graph? Particularly an adjacency list. Then you could use depth-limited search to find your selected nodes. – Arya McCarthy May 19 '17 at 07:51
  • @aryamccarthy no. Graphs are notorious for their data locality, aren't they? However, if you have a complete thought in mind, please post an answer, and I will definitely go through it! – gsamaras May 19 '17 at 07:54

1 Answer


I'm rusty with C++, so I'll keep this abstract.

The neighbors of a given point of your Hamming cube are easily computable. Given a vertex's bit sequence, flip each bit individually.

You could precompute that, though. You could cache the results of your neighbors() function, or you could save them to an array. Each vertex would have its own neighbors, so you have one array for each vertex. That gives you, essentially, your adjacency list.
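A minimal sketch of that precomputation (the function name `build_adjacency` is mine): flipping each of the n bits of a vertex gives its immediate neighbors, so one pass over all 2^n vertices builds the whole adjacency list.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Precompute the adjacency list once: vertex v's immediate neighbors
// are v with each of its n bits flipped.
std::vector<std::vector<uint32_t>> build_adjacency(int n) {
    std::vector<std::vector<uint32_t>> adj(1u << n);
    for (uint32_t v = 0; v < (1u << n); ++v)
        for (int b = 0; b < n; ++b)
            adj[v].push_back(v ^ (1u << b));  // flip bit b
    return adj;
}
```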

With that adjacency list, you can search your Hamming cube using depth-limited search, a variant of DFS (or BFS, I guess—but space complexity is worse) that only goes k units deep.
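A sketch of the depth-limited traversal, using the BFS variant mentioned above since it naturally groups vertices by distance (names are illustrative; neighbors are generated inline by bit flips rather than from a stored list):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// BFS from `start`, limited to k layers. Layer d ends up holding
// exactly the vertices at Hamming distance d from start.
std::vector<std::vector<uint32_t>> layers_up_to(uint32_t start, int k, int n) {
    std::vector<bool> seen(1u << n, false);
    std::vector<std::vector<uint32_t>> layers{{start}};
    seen[start] = true;
    for (int d = 0; d < k; ++d) {
        std::vector<uint32_t> next;
        for (uint32_t u : layers[d])
            for (int b = 0; b < n; ++b) {      // immediate neighbors of u
                uint32_t w = u ^ (1u << b);
                if (!seen[w]) { seen[w] = true; next.push_back(w); }
            }
        layers.push_back(std::move(next));
    }
    return layers;
}
```

For the question's example, `layers_up_to(0b100, 2, 3)` puts 101, 110, 000 in layer 1 and 111, 001, 010 in layer 2.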


Your data structure is a good choice, but consider that your vertices are binary strings, so they cover all points from 0 to 2^n - 1. You might as well just use an array—lookup will still be O(1), and it'll be more compact because there aren't unused buckets sitting around.
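A sketch of that array-based layout (the `HammingCube` wrapper is illustrative, not from the question): the vertex's integer value is its index, so lookup is a direct array access with no hashing and no empty buckets.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Flat storage for an n-dimensional Hamming cube: ids[v] holds the IDs
// stored at vertex v, where v ranges over 0 .. 2^n - 1.
struct HammingCube {
    std::vector<std::vector<int>> ids;
    explicit HammingCube(int n) : ids(1u << n) {}
    void add(uint32_t v, int id) { ids[v].push_back(id); }
    const std::vector<int>& at(uint32_t v) const { return ids[v]; }
};
```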

  • Hmm nice idea +1. But wouldn't that be big? I mean every vertex has as neighbors *all* the other vertices, in different distances. (thanks for the upvote BTW) – gsamaras May 19 '17 at 08:36
  • You have a tradeoff between time and space—spend time recalculating or spend space caching. In six dimensions, each vertex has six (immediate) neighbors, each of which is representable as a byte. Sixty-four nodes, so the adjacency list takes up <400 bytes. – Arya McCarthy May 19 '17 at 08:46
  • And in an adjacency list, you only store *immediate* neighbors. The rest are found by the depth-limited search. Otherwise, yes, you'd have an exponential explosion. – Arya McCarthy May 19 '17 at 08:47
  • By immediate you mean the ones that are located in distance 1? Yeah that exponential bomb is what I am afraid! – gsamaras May 19 '17 at 08:49
  • OK good, maybe you could improve your answer by updating it with an example or something? Since it might be unclear to future readers. Notice that I will wait for more ideas to come (if any) and then accept your answer. – gsamaras May 19 '17 at 08:50