k-buckets datastructure is an implementation detail of bit-torrent protocol so that peers reply quickly enough to FIND_PEERS
and FIND_VALUE
.
What I do in my kademlia implementation is that keep the the routing table in a python dictionary and I compute the nearest peers under the 5 seconds by default which is the timeout I use to wait for a UDP reply. To achieve that I need to keep the routing table below 1 000 000 entries.
Like I said above, the routing table is a simple python dict
that maps peerid
to (address, port)
.
The routing table stores peers not values ie. not infohash
addresses.
When I receive a FIND_PEERS
message, the program replies with the following code:
async def peers(self, address, uid):
"""Remote procedure that returns peers that are near UID"""
log.debug("[%r] find peers uid=%r from %r", self._uid, uid, address)
# XXX: if this takes more than 5 seconds (see RPCProtocol) it
# will timeout in the other side.
uids = nearest(self._replication, self._peers.keys(), uid)
out = [self._peers[x] for x in uids]
return out
When I receive a FIND_VALUE
message, the program replies with the following code:
async def value(self, address, key):
"""Remote procedure that returns the associated value or peers that
are near KEY"""
log.debug("[%r] value key=%r from %r", self._uid, key, address)
out = await lookup(key)
if out is None:
# value is not found, reply with peers that are near KEY
out = nearest(self._replication, self._peers.keys(), uid)
return (b"PEERS", out)
else:
return (b"VALUE", out)
Here is the definition of nearest
:
def nearest(k, peers, uid):
"""Return K nearest to to UID peers in PEERS according to XOR"""
# XXX: It only works with len(peers) < 10^6 more than that count
# of peers and the time it takes to compute the nearest peers will
# timeout after 5 seconds on the other side. See RPCProtocol and
# Peer.peers.
return nsmallest(k, peers, key=functools.partial(operator.xor, uid))
That is it sorts the peers
according to their peerid
and return the k
smallest. nsmallest
is supposed to be an optimized version of sort(peers, key=functools.partial(operator.xor, uid))[:k]
where uid
is a peerid
or infohash
(respectively FIND_PEERS
and FIND_VALUE
).
Now back at your question(s):
Is hashinfo equivalent to peer ID in Mainline DHT?
hashinfo
is a hash-something that is the same kind of hash as peerid
ie. they are possible keys in the routing table. That is, torrent files are associated with a hash, peers are associated with the same kind of hash called peerid
. And peers have the "ownership" of keys that near their peerid
. BUT hashinfo
are not stored in the routing table or k-buckets if you prefer. hashinfo
are stored in another mapping that associate hashinfo
hash with their value(s).
I am asking because this is not just implementation nuance. Because "k" is normally 20 or some other integer in all clients. So if I would use k-buckets to store torrent files as full-right members, I would have less space to store real peers data.
There is mis-understanding here about the same thing I try explain above. hashinfo
are keys in storage dictionary. Whereas peerid
are keys in the routing table aka. k-buckets datastructure. They both have the same format because that is how the kademlia routing algorithm work. You must be able to compare hashinfo
with peerid
with xor
to be able to tell which peers "own" which hashinfo
value.
As you can see in the second snippet, when a another peer ask for a value associated with an hash, it calls lookup(key)
that is something like storage.get(key)
except in my case the code stores values in a database.
Maybe there is a mis-understanding about the fact that k-buckets are used to store DHT routing information. And that on top of that, torrent protocol use the DHT to store torrent routing information.
For what it is worth, qadom's peer.py file is where I implement a DHT inspired from kademlia (except I use 256 bits hash and forgo alpha
and k
parameters and use a single REPLICATION
parameter). The code works most of the time check the tests.
Also, there is another project I got inspiration from called simply kademlia that (try to?) implement k-buckets.
As far as I understand, torrent DHT routing looks like qadom bag
functions except the receiving peer must authenticate the announcement, whereas in qadom bags are free-for-all.
Also, check the original kademlia paper.