6

I want to confirm my understanding of buckets in Kademlia DHT. Kademlia has m k-buckets where m is the size of the network in bits and k is the number of key-value pairs stored per bucket. for example, let's say m=4 then we can have 2^4 nodes, namely from 0 to 15.

+========+
| NodeId |
+========+
|   0000 |
+--------+
|   0001 |
+--------+
|   0010 |
+--------+
|   0011 |
+--------+
|   0100 |
+--------+
|   0101 |
+--------+
|   0110 |
+--------+
|   0111 |
+--------+
|   1000 |
+--------+
|   1001 |
+--------+
|   1010 |
+--------+
|   1011 |
+--------+
|   1100 |
+--------+
|   1101 |
+--------+
|   1110 |
+--------+
|   1111 |
+--------+

Each node has the routing table of the 0-bit match, 1st-bit match and 2nd-bit match and so on, this is m buckets. Furthermore, for each bucket, it will store k representatives instead of a single NodeId. So, if we say k=2, the routing Table for node 0101 would be something like:

┌──────────────────────┐
│         0101         │
├──────────────────────┤
|                      |
| +==================+ |
| |       xxxx       | |
| +==================+ |
| |   1001, <value>  | |
| +------------------+ |
| |   1010, <value>  | |
| +------------------+ |
|                      |
| +==================+ |
| |       0xxx       | |
| +==================+ |
| |   0000, <value>  | |
| +------------------+ |
| |   0111, <value>  | |
| +------------------+ |
|                      |
| +==================+ |
| |       01xx       | |
| +==================+ |
| |   0110, <value>  | |
| +------------------+ |
| |   0111, <value>  | |
| +------------------+ |
|          .           |
|          .           |
|          .           |
└──────────────────────┘

Is my assumption correct?

Nawras
  • 181
  • 1
  • 12
  • No, *k* is the number of nodes in a bucket in the routing table. A node can store large numbers of *key, value* pairs. – Encombe Jan 24 '19 at 11:26
  • but each _key_ is a nodeId, so the number of nodes and number of keys are the same thing. Can you explain it more please? – Nawras Jan 25 '19 at 10:33

2 Answers2

2

k is the number of entries in a bucket. Their node IDs are expected to be randomly distributed within the ID-range the bucket covers, which means doubling the number of entries per bucket would only increase its resolution by one bit on average, i.e. it does not scale well. That's why we have a structured routing table with multiple buckets with different scope each. It increases local resolution around the node's own node ID.

Implementing the full kademlia algorithm requires a dynamic routing table layout. Therefore m is not fixed. The fixed size layout was only used in the simplified pre-print version of the paper as part of a theoretical proof.

the8472
  • 40,999
  • 5
  • 70
  • 122
  • So my diagram is correct, right? for each bucket, there should be _k_ number of entries and these entries are all responsible for the same key (replication). – Nawras Jan 28 '19 at 02:03
  • Buckets are not responsible for keys. You need to perform an iterative find_node lookup, which involves querying additional nodes, to find nodes responsible for a key. – the8472 Jan 29 '19 at 07:12
  • What does "m is not fixed" mean? The paper states close distance buckets are generally empty, so I suppose the amount of m-buckets depend on the node ID bit length AND on the amount of nodes present in the network. (e.g. 30 nodes vs. 10.000 nodes). – Marcellvs May 29 '20 at 12:22
  • @Marcellvs there may be some confusion here. The question is conflating storage with the routing table and the global network size with a local routing table. So the meaning of the `m` we're talking about may be overloaded. The expected number of populated buckets in your local routing table does correlate with the global network size. – the8472 May 30 '20 at 13:19
1

Every node in Kademlia store a list of other nodes. This list is based on bit-differences and is loosely termed as buckets. Now, 'k' is a system wide replication parameter. This means that for every list, there are k entries inside for that 'bucket'. This is my understanding. This is the link to the paper. Hope this helps..:)