10

Recently, I've read a document of the Kademlia Protocol, I tried to understand the protocol, but I still have some question: Why a node must find another node when he knows its ID but ip or port? Why he has the ID while he doesn't know the ip or port, where did he get the ID? I think the "distance" between two different nodes is not a routing distance or real distance, it's only a virtual distance that can be used the algorithm to find the node quickly, it's that right?

Maybe my English is not very clear because English is not my mother tongue, but I'll try to express myself clear if you need. Thanks very much!

Dara Tith
  • 518
  • 7
  • 21
rock_cloud
  • 123
  • 1
  • 1
  • 7

2 Answers2

21

As cHao said, the distributed nature of the network means that nodes need to publish their IDs and their contact details to other nodes they talk to. There is no central place where IDs are mapped to contact info, so each node must keep this mapping for a subset of the nodes on the network in its own routing table.

Kademlia routing tables are structured so that nodes will have detailed knowledge of the network close to them, and exponentially decreasing knowledge further away.

The use of bitwise XOR as a measure of notional distance between IDs has the advantage that for a given target ID, no two IDs can have the same distance to the target.

Imagine a simple example where the IDs are in the range 00 to 63. If Kademlia used e.g. pure mathematical difference as a measure of distance, 15 and 35 would be the same distance to 25 - both would have a distance of 10. Using XOR, the distance between 15 and 25 is 22, and between 25 and 35 it's 58.

In this way, the group of k closest IDs to a target ID can be calculated unambiguously.

The constant k has a couple of uses in Kademlia, but it's primarily the replication factor. In other words, a piece of data is stored on the k closest nodes to the data's ID.

The lookup process is designed to return either a group of k nodes (before storing data on each of them) or return a single piece of data (from the first node holding it during the lookup iterations).

Because of this, pure Kademlia isn't best suited to finding just a single node, so I'm not sure that part of your question is too relevant. If you did want to use Kademlia to find a single node, it would probably be worth modifying the lookup process to finish early as soon as any node returns the target node's contact details (in the same way that the lookup finishes early if a target value is found during the process).

Fraser
  • 74,704
  • 20
  • 238
  • 215
9

Since the network is distributed, by definition, there's no one master table of ID->address mappings. Nodes don't have to (and usually don't) know about all the other nodes. The process of "finding" a node is basically to ask known nodes "closest" to the target not so much about the target node directly, but about what nodes are closer to the target. The result of that query gives you the next group of nodes to query, and the process repeats -- and because a node would return results that are closer than it is, each iteration tends to find nodes closer and closer to the target til you finally reach a node that can say "Oh, node X? He's right over there."

At least that's what i'm understanding of it.

cHao
  • 84,970
  • 20
  • 145
  • 172
  • 1
    Thanks for your quickly responses, but I want to know Why I have to find node X, and where I got the X's ID or name? And what is the real meaning of the "distance" between two nodes? Is it because X has the file which I want? – rock_cloud Feb 16 '12 at 06:28
  • Far as i understand it, the "distance" between two nodes is simply an XOR of their IDs. It also seems node IDs and IDs of "values" (ie: content, files,lookup info, whatever) share the same keyspace, and one of the points of finding the "closest" nodes to a key is to tell them to store the info. Finding a value works the same as finding a node, except that if a node has the value corresponding to the ID, it responds with that instead of a list of nodes. – cHao Feb 16 '12 at 07:04
  • So the Distance is only a value to use the quick search algorithm. Your answer is helpful, thanks! – rock_cloud Feb 16 '12 at 07:13