
I'm trying to understand how Kademlia works when it comes to finding a resource. There are pretty good descriptions of how to build a tree of the nodes closest to your own node, how to compute the distance between nodes, how to start the process, etc. What I don't understand is how the file's infohash fits into this picture. All the descriptions explain how to join the network and build your own part of the distributed hash table, but that is not the goal. We are doing this to actually find a resource: a file with a certain infohash. How is it stored in this node tree, or is there a separate one? How does finding the nodes that have this infohash, and consequently the file, work?

There is a brief mention that node IDs and infohashes are both 20-byte values, and that node ID XOR infohash is the distance between the node and the resource, but I cannot see what that means or how it helps to find the resource. After all, a node that actually has the resource could have the greatest XOR distance to it.

Thank you, Alex

alex.49.98

1 Answer


I recommend that you read not just the BitTorrent DHT specification but also the original Kademlia paper, since the former is fairly concise and only mentions some things in passing.

BitTorrent's `get_peers` lookup is equivalent to the `find_value` operation described in the paper.
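For reference, here is roughly what a `get_peers` query looks like on the wire. BEP 5 messages are bencoded KRPC dictionaries; the minimal `bencode` helper below is a sketch written for this example (not a library function), and the 20-byte `id` and `info_hash` values are the placeholder strings from BEP 5's own example. The `info_hash` argument plays the role of the key in Kademlia's `find_value`.

```python
def bencode(value) -> bytes:
    """Minimal bencoder (sketch): int, str/bytes, list, dict with sorted raw-byte keys."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode()
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = [(k.encode() if isinstance(k, str) else k, v) for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])  # bencoded dict keys must be sorted
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


query = {
    "t": "aa",          # transaction ID chosen by the querying node
    "y": "q",           # message type: query
    "q": "get_peers",   # the find_value-style lookup
    "a": {
        "id": "abcdefghij0123456789",         # querying node's 20-byte ID
        "info_hash": "mnopqrstuvwxyz123456",  # target key: the torrent's infohash
    },
}
print(bencode(query))
```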

In short: just as you can do an iterative lookup to find the K-closest-node-set for your own node's ID, where "closest" means smallest XOR distance to the target key, you can do the same for any other ID.
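To make the XOR metric concrete, here is a minimal Python sketch, assuming 20-byte IDs as in the BitTorrent DHT. The helper names `xor_distance` and `k_closest` and the SHA-1-derived demo IDs are hypothetical. It shows that ranking nodes by distance works exactly the same whether the target is your own node ID or an infohash, because both live in the same 160-bit space.

```python
import hashlib

ID_LEN = 20  # node IDs and infohashes are both 20 bytes (160 bits)


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance: a XOR b, read as an unsigned big-endian integer."""
    if len(a) != ID_LEN or len(b) != ID_LEN:
        raise ValueError("IDs must be 20 bytes")
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def k_closest(target: bytes, node_ids: list, k: int = 8) -> list:
    """The k IDs nearest to target; target may be a node ID or an infohash."""
    return sorted(node_ids, key=lambda nid: xor_distance(nid, target))[:k]


# Hypothetical demo IDs: SHA-1 of arbitrary strings, just to get 20-byte values.
nodes = [hashlib.sha1(f"node-{i}".encode()).digest() for i in range(50)]
my_id = nodes[0]
infohash = hashlib.sha1(b"hypothetical info dictionary").digest()

# Same function, two different targets: our own ID and a torrent's infohash.
print([n.hex()[:8] for n in k_closest(my_id, nodes, k=3)])
print([n.hex()[:8] for n in k_closest(infohash, nodes, k=3)])
```

Note that the first list always starts with `my_id` itself (distance 0), while the second contains whichever nodes happen to be nearest the infohash; those are the nodes a lookup for that infohash converges on.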

For get_peers you simply use the infohash as target key.
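The iterative lookup with the infohash as target can be simulated in-process. This is a hypothetical sketch: `SimNode`, `build_network` and `iterative_get_peers` are invented names, and there is no networking, no token handling and no parallelism (real lookups query several nodes concurrently). The routing tables are random subsets of the other IDs, whereas real Kademlia tables keep more contacts close to the node's own ID. A queried node either returns peers it stores for the infohash ("values") or the IDs it knows that are closest to the infohash ("nodes"), mirroring the two kinds of `get_peers` response in BEP 5.

```python
import random

K = 8  # size of the closest-node set; BEP 5 buckets also hold K = 8 nodes


class SimNode:
    """Hypothetical in-memory stand-in for a remote DHT node (no networking)."""

    def __init__(self, node_id: int):
        self.node_id = node_id
        self.contacts = []  # IDs of other nodes this node knows about
        self.peers = {}     # infohash -> list of (ip, port) announced to this node

    def get_peers(self, infohash: int):
        """Return ("values", peers) if stored here, else ("nodes", closer IDs)."""
        if infohash in self.peers:
            return "values", self.peers[infohash]
        return "nodes", sorted(self.contacts, key=lambda n: n ^ infohash)[:K]


def build_network(ids, contacts_per_node, rng):
    """Give each node a random subset of the other IDs as its routing table."""
    net = {i: SimNode(i) for i in ids}
    for node in net.values():
        others = [i for i in ids if i != node.node_id]
        node.contacts = rng.sample(others, min(contacts_per_node, len(others)))
    return net


def iterative_get_peers(net, bootstrap, infohash):
    """Query ever-closer nodes until the K closest known IDs have all been queried."""
    known, queried, found = set(bootstrap), set(), []
    while True:
        candidates = sorted(known, key=lambda n: n ^ infohash)[:K]
        pending = [n for n in candidates if n not in queried]
        if not pending:
            return candidates, found
        for node_id in pending:
            queried.add(node_id)
            kind, payload = net[node_id].get_peers(infohash)
            if kind == "values":
                for peer in payload:
                    if peer not in found:
                        found.append(peer)
            else:
                known.update(payload)


rng = random.Random(42)
ids = [rng.getrandbits(160) for _ in range(200)]
net = build_network(ids, contacts_per_node=20, rng=rng)
infohash = rng.getrandbits(160)

# Plant one announced peer (documentation IP) on the node truly closest to the infohash.
holder = min(ids, key=lambda n: n ^ infohash)
net[holder].peers[infohash] = [("203.0.113.7", 6881)]

closest, peers = iterative_get_peers(net, bootstrap=[ids[0]], infohash=infohash)
print(len(closest), peers)
```

Because the loop only stops once every one of the K closest nodes it has learned about has been queried, peers stored on any of those nodes are found. With sparse random routing tables like these, the walk can still settle on a set that is not the true K closest; distance-biased routing tables are what make real lookups converge reliably.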

The K-closest-node-set for a particular infohash is the set of nodes considered responsible for storing the data for that infohash. However, due to implementation inaccuracies and node churn, more than K nodes around the target key may be storing data of interest.
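The responsibility rule, and why more than K nodes can end up holding data, can be illustrated with a toy model: an announce stores a peer on the K nodes currently closest to the infohash, and then new nodes join. The names `announce`, `storage` and `k_closest`, the integer IDs, and the single one-shot announce are hypothetical simplifications; real clients send `announce_peer` messages (with a token obtained from `get_peers`) over the network.

```python
import random

K = 8  # responsible-set size (BEP 5 also uses K = 8 for buckets)


def k_closest(ids, target, k=K):
    """The k IDs nearest to target by XOR distance."""
    return sorted(ids, key=lambda n: n ^ target)[:k]


def announce(storage, target, peer):
    """Store peer under target on the K nodes currently closest to target."""
    for node_id in k_closest(storage, target):
        storage[node_id].setdefault(target, []).append(peer)


rng = random.Random(7)
storage = {rng.getrandbits(160): {} for _ in range(100)}  # node ID -> {key: [peers]}
infohash = rng.getrandbits(160)
announce(storage, infohash, ("192.0.2.10", 6881))

# Churn: 100 new nodes join; some may land closer to the infohash than the holders.
for _ in range(100):
    storage[rng.getrandbits(160)] = {}

holders = [n for n, kv in storage.items() if infohash in kv]
responsible = k_closest(storage, infohash)
print(len(holders), sum(n in responsible for n in holders))
```

Nothing removes the data from the original holders, so the responsible set can move while the stored data stays put; a later announce to the new K closest could then leave the data on more than K nodes.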

the8472
I still don't understand the link between the node ID and the infohash. Each node in the Kademlia tree is a key-value pair where the key is the node ID and the value is the IP and port of that node. I understand that. Consequently, I can find the IP and port of all the nodes closest to my node. Where does the infohash fit? Is it a key-value pair in our tree where the key is the infohash and the value is...? (A list of the node IDs that announced the infohash?) If so, I would understand how to find the infohash in the tree, and as soon as I found it I would have a list of node IDs to query for the file. – alex.49.98 Apr 30 '15 at 23:43
Yes, the lists of `` pairs are returned by nodes (if they have any) when you perform a lookup for the target key. And a node is not a key-value pair. A node stores multiple key-value pairs whose keys are close to its ID. – the8472 Apr 30 '15 at 23:58
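To make that distinction concrete: a node's own ID is not the key of some stored pair; the node keeps two separate tables. This hypothetical `DhtNode` (not the API of any real client) holds a routing table of contacts and a peer store keyed by infohash, and encodes stored peers the way BEP 5 returns them under `values`: 6-byte "compact" strings made of a 4-byte IPv4 address followed by a 2-byte port, both in network byte order.

```python
import socket
import struct
from dataclasses import dataclass, field


@dataclass
class DhtNode:
    """Hypothetical sketch of one node's local state, not a real client's API."""
    node_id: bytes
    routing: dict = field(default_factory=dict)     # other node ID -> (ip, port)
    peer_store: dict = field(default_factory=dict)  # infohash -> {(ip, port), ...}

    def store_peer(self, infohash: bytes, ip: str, port: int) -> None:
        """Record a peer announced for infohash (keys ideally close to node_id)."""
        self.peer_store.setdefault(infohash, set()).add((ip, port))

    def values_for(self, infohash: bytes) -> list:
        """Stored peers as BEP 5 compact strings: IPv4 (4 bytes) + port (2 bytes)."""
        return [socket.inet_aton(ip) + struct.pack("!H", port)
                for ip, port in sorted(self.peer_store.get(infohash, ()))]


node = DhtNode(node_id=bytes(20))
node.routing[b"\x01" + bytes(19)] = ("198.51.100.2", 6881)  # a contact, not data
ih1 = b"\x00\x00\x01" + bytes(17)  # two hypothetical infohashes near node_id
ih2 = b"\x00\x00\x02" + bytes(17)
node.store_peer(ih1, "192.0.2.1", 6881)
node.store_peer(ih1, "192.0.2.2", 51413)
node.store_peer(ih2, "192.0.2.3", 6881)
print(len(node.peer_store), node.values_for(ih1)[0].hex())  # → 2 c00002011ae1
```

One node therefore answers for many infohashes at once, namely whichever ones fall near its ID, while its routing table is only used to point lookups at closer nodes.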