Okay, I've been reading articles and the paper about Kademlia recently to implement a simple p2p program that uses kademlia dht algorithm. And those papers are saying, those 160-bit key in a Kademlia Node is used to identify both nodes (Node ID) and the data (which are stored in a form of tuple).
I'm quite confused on that 'both' part.
As far as my understanding goes, each node in a Kademlia binary tree uniquely represents a client(IP, port) who each holds a list of files.
Here is the general flow on my understanding.
- Client (.exe) gets booted
- Creates a node component
- Newly created node joins the network (bootstrapping)
- Sends find_node(filehash) to k-closest nodes
- Let's say hash is generated by hashing file binary named file1.txt
- Received nodes each finds the queried filehash in its different hash table
- Say, a hash map that has a list of files(File Hash, file location)
- Step 4,5 repeated until the node is found (meanwhile all associated nodes are updating the buckets)
Does this flow look all right?
Additionally, bootstrapping method of Kademlia too confuses me. When the node gets created (user executes the program), It seems like it uses bootstrapping node to fill up the buckets. But then what's bootstrapping node? Is it another process that's always running? What if the bootstrapping node gets turned off?
Can someone help me better understand the concept?
Thanks for the help in advance.