bittorrent DHT detailed specification

Question

In my new weekend project I decided to write an bittorrent client from scratch, no ready to use libraries at all. After two days looking for documentation I'm already about to give up :smile:. I know there are the BEPs, but they are far from enough to understand all the specification. After reading a lot more I think the tracker and peer protocols seems to be old and easy to understand/implement (yes, I know, to write a good code with balance, peer selection, optimizations, this is not easy as I just said, but all I want to is to do the basics to learn, not to compete with tens of good clients out there.)

So, I have decided to start by the DHT which seems to the the more complex part and also the less documented. When you stop looking for bittorrent DHT or mainline DHT and start looking for kademlia DHT you have a lot more information but it not so obvious how to put it all together.

Here is what I understand so far (and there are gaps which I hope to fill in):

I start with my DHT tree empty
use find_nodes on my bootstrap node
add the received nodes to my own tree, so I can then select the ones closer to my own ID
start issuing find_nodes to the selected ones and add their responses to my tree
go back to 3 until I stop receiving unknown/new nodes
if I receive an announce_peer with an info_hash than I should save its information on a local DB (the info_hash and ip/port of the sender)
if a node uses get_peers with an info_hash I have in my DB then I send the information otherwise I should send a list of closer nodes I have in my own tree (closest to that info_hash)
when I use get_peers on other nodes I will receive peers or nodes, in the later case I think the nodes are closer to the info_hash and not to my own nodeId so, should I add these nodes to my tree or start a new tree based on them?
when I want to announce I am interested on an info_hash should I use announce_peer everywhere or just to the nodes with nodeId closer to the target info_hash? How much is closer enough?

At this point I have a lot of nodes which IDs are closer to my own ID, and informations about info_hash'es I am not really interested.

I am afraid that I have a giant stupid question: why I did that?

I mean: my selfish reason to do all this work is to locate peers to the info_hash I'm interested in. I understand that the information of one info_hash is likely to be saved on a node which ID is closer to that info_hash. So my chances to find its information is bigger if I create a tree of nodes closer to the info_hash and not closer to my own ID (at this point, if you know the subject, you already noticed how lost I am).

Should I create multiples trees? One for me (to be there to save the information of info_hashes closer to my nodeID people send me), and other tree closer to each one of my target info_hashes so I can retrieve their information?

Should I create a single tree closer to my node ID and hope for the best when querying this tree for the info_hashes I need?

Should I give up since I have completely misunderstood the idea behind DHT at all?

Well, any real documentation, flowcharts, any thing will be welcome!

[This video](http://engineering.bittorrent.com/2013/01/22/bittorrent-tech-talks-dht/) does a good job explaining the basics of the mlDHT. It may help to understand how the BitTorrent DHT works. — Encombe, May 26 '17 at 12:51

the8472 · Accepted Answer · 2017-05-26T19:08:51.397

So, I have decided to start by the DHT which seems to the the more complex part and also the less documented.

The original kademlia paper "Kademlia: A Peer-to-peer Information System Based on the XOR Metric" by Peter Maymounkov and David Mazieres is required reading. It is referenced fairly early in BEP-5

if I receive an announce_peer with an info_hash than I should save its information on a local DB (the info_hash and ip/port of the sender)

You only accept announces when they contain a token previously handed out via get_peers.

when I use get_peers on other nodes I will receive peers or nodes, in the later case I think the nodes are closer to the info_hash and not to my own nodeId so, should I add these nodes to my tree or start a new tree based on them?

You use a temporary tree - or a list ordered by contact-ID relative to the target ID - for iterative lookups since they are not balanced towards your node ID.

when I want to announce I am interested on an info_hash should I use announce_peer everywhere or just to the nodes with nodeId closer to the target info_hash? How much is closer enough?

You perform a get_peers lookup and when it is done you announce to the closest nodes set that returned a write token and verify the responses to make sure you actually get . In case of bittorrent = 8.

my selfish reason to do all this work is to locate peers to the info_hash I'm interested in. I understand that the information of one info_hash is likely to be saved on a node which ID is closer to that info_hash. So my chances to find its information is bigger if I create a tree of nodes closer to the info_hash and not closer to my own ID (at this point, if you know the subject, you already noticed how lost I am).

When doing lookups you do not just visit nodes in your routing table, you also visit nodes included in the responses. This makes them iterative. The bias of each node's routing table towards their own ID ensures that the responses include neighbors closer and closer towards the target.

So the deal is that you are responsible for information close to your node ID and other nodes will provide information close to their node IDs that you are interested in. So your routing table layout serves others, their routing table layout serves you.

Note that all the information contained in this answer can be found in the BEP or Kademlia paper.

now that I spend a few nights studying this, I recognise that you are right and all the information was there, But in a very condensed form when you have no clue what you are looking for (the same is valid for any other information I guess :) ). That idea of you working to collect data to help others and not yourself was the main point of my doubts. Now I got it! Since last night I can help others to route themselfs and find my info_hash on a simple nodejs code which is working fine! Was a good project... now to peers connections... Thank you! — Gustavo Vargas, May 26 '17 at 21:02

score 0 · Answer 2 · answered Jun 27 '23 at 16:45

Just leaving a note of my understanding so far.

Distributing torrent peer contacts

Since all computers in the world cannot hold peer contact information for all the torrents out there, we decided to evenly, yet systematically, spread the peer contact information for torrents throughout the network, so that one DHT (“BitTorrent’s distributed hash table”) node only has to hold some dedicated part of the entire peer contact information.

Nodes are responsible for the peer contacts of torrents with infohashes close to their identifier

The entire nodes in the DHT, including initial seeders, are working together to put the peer contact information of a torrent to where it belongs in the DHT, that is, nodes with identifiers closest, by the XOR distance metric, to the infohash of the torrent. (Note: a torrent’s infohash and DHT node’s identifier have the same length in octets, so they can be XOR’d with each other.)

How to reach where a torrent belongs to get/set its peer contacts

Meaning that, if you’re to register yourself as a peer of a torrent, you iteratively make get_peers queries (Not find_nodes queries because only get_peers query responses allow you to announce yourself as a peer, giving you the write token.) to reach nodes as close to the infohash of the torrent as possible. Those nodes are responsible for storing the peer contact information of the torrent. Once you’ve reached the possibly closest nodes, you announce (“register”) yourself as a peer to those responsible nodes with announce_peer queries.

bittorrent DHT detailed specification

2 Answers2

Distributing torrent peer contacts

Nodes are responsible for the peer contacts of torrents with infohashes close to their identifier

How to reach where a torrent belongs to get/set its peer contacts

Linked