
The Kademlia paper says:

We start with some definitions. For a k-bucket covering the distance range [2^i, 2^(i+1)), define the index of the bucket to be i. Define the depth, h, of a node to be 160 − i, where i is the smallest index of a non-empty bucket. Define node y’s bucket height in node x to be the index of the bucket into which x would insert y minus the index of x’s least significant empty bucket. Because node IDs are randomly chosen, it follows that highly non-uniform distributions are unlikely. Thus with overwhelming probability the height of any given node will be within a constant of log n for a system with n nodes. Moreover, the bucket height of the closest node to an ID in the kth-closest node will likely be within a constant of log k.

I can understand the definition of bucket height, but I don't know why we need that definition, and I don't understand the last sentence of the paragraph.
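
To check that I'm reading the definitions correctly, here is how I would write them down (a rough sketch in Python; the helper names and treating the routing table as a list of 160 buckets are my own assumptions, not from the paper):

    def bucket_index(x, y):
        # Bucket of x that covers y: the distance x ^ y lies in [2^i, 2^(i+1)),
        # so the index i is bit_length - 1.
        return (x ^ y).bit_length() - 1

    def depth(buckets):
        # Depth h = 160 - i, where i is the smallest index of a non-empty bucket.
        return 160 - min(i for i, b in enumerate(buckets) if b)

    def bucket_height(x, y, buckets):
        # As stated in the paper: the index of the bucket into which x would
        # insert y, minus the index of x's least significant empty bucket.
        return bucket_index(x, y) - min(i for i, b in enumerate(buckets) if not b)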


Update: I also think the paper has a typo: the bucket height should be the index of the bucket containing y minus the index of x’s least significant NON-empty bucket. Am I wrong?

Adrian Liu

1 Answer


but I don't know why we need that definition

The argument for O(log n) efficiency of Kademlia, in terms of routing table size and lookup steps, is based on mapping the entire keyspace of n nodes into k-buckets, where further-away buckets cover exponentially larger fractions of the keyspace. Effectively, this compresses the whole network into a biased list of samples.

The arguments further down are then based on this bucket-based projection.
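
To make that concrete, here is a toy sketch (my own code and numbers, not the paper's): drop random node IDs into buckets by XOR distance and only the top ~log2(n) of the 160 buckets hold anything, which is exactly the compressed, biased sample the routing table works with.

    import random

    ID_BITS, N = 160, 10_000
    me = random.getrandbits(ID_BITS)
    buckets = {}                            # bucket index -> number of nodes in it
    for _ in range(N):
        node = random.getrandbits(ID_BITS)
        i = (me ^ node).bit_length() - 1    # bucket covers distances [2^i, 2^(i+1))
        buckets[i] = buckets.get(i, 0) + 1

    # Expect roughly N/2 nodes in bucket 159, N/4 in bucket 158, ..., and
    # almost nothing below index ~160 - log2(N): the whole network fits into
    # O(log n) interesting buckets.
    for i in sorted(buckets, reverse=True):
        print(i, buckets[i])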

Moreover, the bucket height of the closest node to an ID in the kth-closest node will likely be within a constant of log k.

I think this is a convoluted way of saying that your k nearest neighbors will all end up in or near the same bucket, i.e. the deepest one (the non-empty bucket with the smallest index).

Note that this is expressed in terms of the flat layout; in the tree layout the smallest bucket would be akin to, but not necessarily identical with, the own-ID-covering bucket.
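
As a quick sanity check of that reading, a small simulation (again my own toy code, not from the paper) shows the k closest nodes to x landing in or right next to x's deepest bucket:

    import random

    ID_BITS, N, K = 160, 10_000, 20
    x = random.getrandbits(ID_BITS)
    nodes = [random.getrandbits(ID_BITS) for _ in range(N)]

    k_closest = sorted(nodes, key=lambda n: n ^ x)[:K]
    indices = sorted((n ^ x).bit_length() - 1 for n in k_closest)

    print("deepest non-empty bucket index:", indices[0])
    print("bucket indices of the K closest nodes:", indices)
    # The spread of these indices is roughly log2(K): the k closest nodes sit
    # in the deepest bucket or within about log k buckets of it.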

the8472
  • I understand the k-buckets layout, but I don't see how the bucket height definition works here. For example, node y's bucket height could be 10, and that is possible for any k; I can't see how it relates to k. – Adrian Liu Mar 08 '19 at 09:52
  • The home bucket is essentially exhaustive. Doubling k means one bit less depth is needed for it to be exhaustive, thus lowering the depth of the home bucket and thus the height any bucket can have relative to it. – the8472 Mar 08 '19 at 22:35
  • I think you might be referring to buckets that split dynamically. But here bucket i is not related to k; it is just defined by the distance. So node y's bucket index will be 10 if its distance to node x is between 1024 and 2048, regardless of what k is. – Adrian Liu Mar 11 '19 at 03:33
  • You're right, this is potentially a different k than the bucket size. Maybe what they're saying is that since the height grows logarithmically, the neighbors will all be clustered around the deepest bucket, and you have to exponentiate the distance to cover a roughly constant distance in the routing table, i.e. it exhibits locality. That is a prerequisite for their following, somewhat stricter assumption. The wild overloading of variables from one sentence to the next is slightly vexing. – the8472 Mar 11 '19 at 20:56
  • Ah, now I get your point: the k here is not the bucket size but refers to the k closest nodes to a given node x. – Adrian Liu Mar 15 '19 at 08:17
  • I updated my question as I think the authors of the paper had a typo in the definition of bucket height. Could you please elaborate? Thanks. – Adrian Liu Mar 15 '19 at 09:53
  • Possibly it's a typo. Either the least significant non-empty or the most significant empty bucket in the distance space defines the border, since those regions should be approximately contiguous. Unless you invert the distance; then you have to flip the conditions. They go back and forth between distance and inverse distance several times. – the8472 Mar 15 '19 at 16:16