
I've built a DHT crawler based on BEP 5 of the BitTorrent standard. With it I can gather infohashes from torrents and query DHT nodes for peers. That said, given:

  1. A torrent infohash.
  2. Current torrent peers.

How can I download the torrent?

BEP 9 suggests a magnet URI of the following form (I omit the tracker and name parts):

magnet:?xt=urn:btih:<info-hash>&dn=<name>&tr=<tracker-url>&x.pe=<peer-address>

With this approach, my current torrent client (Transmission) gets stuck trying to find peers. To rule out compatibility issues, I tried several other clients, with no luck.

My second approach was to temporarily add the corresponding DHT node to the client and load the magnet URL in its simplest form:

magnet:?xt=urn:btih:<info-hash>

This yielded no results either.

In the following code, suppose we have the infohash "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX", which we received while listening on the DHT node "router.bittorrent.com:6881". Shouldn't the following sample fetch the metadata?

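import libtorrent as lt
import time

# Start a session and bootstrap the DHT from the router node.
session = lt.session()
session.listen_on(6881, 6891)
session.add_dht_router("router.bittorrent.com", 6881)
session.start_dht()

# Give the DHT a moment to start.
time.sleep(1)

# Placeholder infohash; "." is the download directory.
params = {"url": "magnet:?xt=urn:btih:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX", "save_path": "."}
h = session.add_torrent(params)

# Poll until the metadata has been fetched from a peer.
while not h.has_metadata():
    time.sleep(1)

# download...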

Small edit to clarify on the first approach:

Given a peer with ip:port X:Y that I just discovered in the DHT for an infohash, and another peer found the same way with ip:port Z:Y, shouldn't the following magnet link, pasted into any torrent client that supports BEP 9, download the torrent?

magnet:?xt=urn:btih:<info-hash>&x.pe=X:Y&x.pe=Z:Y
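For illustration, a magnet link of this shape could be assembled as in the minimal sketch below. The function name `build_magnet` and its parameters are hypothetical, not part of any library. It only formats the URI as BEP 9 describes it; it does not guarantee that a given client actually honors `x.pe`. IPv6 literals would need to be wrapped in brackets by the caller.

```python
import re
from urllib.parse import quote


def build_magnet(info_hash_hex, peers=(), name=None, trackers=()):
    """Format a magnet URI; only the xt parameter is mandatory (BEP 9)."""
    # A v1 infohash is a SHA-1 digest: exactly 40 hexadecimal characters.
    if not re.fullmatch(r"[0-9a-fA-F]{40}", info_hash_hex):
        raise ValueError("info hash must be 40 hex characters")
    parts = ["xt=urn:btih:" + info_hash_hex.lower()]
    if name:
        parts.append("dn=" + quote(name, safe=""))
    for tracker in trackers:
        parts.append("tr=" + quote(tracker, safe=""))
    # One x.pe parameter per peer, written as host:port.
    for host, port in peers:
        parts.append(f"x.pe={host}:{port}")
    return "magnet:?" + "&".join(parts)


if __name__ == "__main__":
    # Placeholder infohash and documentation-range addresses.
    print(build_magnet("ab" * 20, peers=[("192.0.2.1", 6881), ("192.0.2.2", 6881)]))
```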
  • router.bittorrent.com is a bootstrap node. Why do you think a node ID it delivers is a valid infohash for a torrent? – Anon Coward Nov 15 '22 at 01:43
  • Just an example. We assume "XXXXX..." is an infohash we got from "listening" on the router.bittorrent.com node as I stated. – Thanos Apostolidis Nov 15 '22 at 03:49
  • Edit: Maybe I should have used another placeholder node there to make it clear. This approach was questionable, which is why I included it here: to get feedback on it, since I'm not sure it works. My assumption is that if I listen on a node for, let's say, get_peers messages, retrieve an infohash, and then add that node as a DHT router, the Python libtorrent bindings will do their magic. Is that a wrong assumption? – Thanos Apostolidis Nov 15 '22 at 04:08
  • It's not always correct to say that a get_peers message will include a valid infohash. Sometimes it will. Sometimes, to add some anonymity, it will include an infohash that is near the one the sender is interested in but not exactly it. Sometimes it will carry a random infohash string for other reasons. There are no guarantees. If you have a valid infohash, then yes, you can instruct libtorrent to download it, but the infohash from a random get_peers message may or may not qualify. – Anon Coward Nov 15 '22 at 04:30
  • This is exactly the answer I was looking for, thank you. I didn't have this information before. What would be the correct way to verify that an infohash is valid and that a peer offers it? I tried initiating a BitTorrent handshake with each peer, but so far I never got an echo back (I've read a similar thread where you answered about this handshake, so I think I'm doing it the correct way). Maybe the hashes I found are indeed "incorrect" and I should try a larger pool? Is there another tactic I should follow for indexing torrents from DHTs? – Thanos Apostolidis Nov 15 '22 at 18:28
  • The only way to verify that an infohash is correct is to connect to a peer, download the metadata from it using BEP 9, then calculate the infohash of the metadata yourself and check that it matches the target infohash. Without more details, I can't comment on whether you're starting the handshake correctly, but it will be a necessary step (as will accepting incoming connections from other peers, since some percentage of peers are behind firewalls or otherwise misconfigured so that they can't accept incoming connections). – Anon Coward Nov 15 '22 at 23:58
  • I will give it a go. You can also write your second comment as an answer if you wish, so I can mark it correct. Thanks again. – Thanos Apostolidis Nov 16 '22 at 00:54
  • I continued my quest [here](https://stackoverflow.com/questions/74806079/how-to-perform-a-bittorrent-handshake-given-an-infohash-and-its-peers). As always your opinion is highly valued. – Thanos Apostolidis Dec 15 '22 at 02:01
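The verification step described in the comments above can be sketched roughly as follows. This assumes a v1 (SHA-1) infohash and that the metadata pieces have already been fetched over the BEP 9 `ut_metadata` extension; the helper names `assemble_metadata` and `metadata_matches` are hypothetical. Because the pieces, once concatenated, are the raw bencoded info dictionary, hashing them directly should be enough, with no need to re-encode anything.

```python
import hashlib

# BEP 9 transfers metadata in 16 KiB pieces (the last one may be shorter).
METADATA_PIECE_SIZE = 16 * 1024


def assemble_metadata(pieces, metadata_size):
    """Join pieces (a dict of piece index -> bytes) into one metadata blob."""
    count = (metadata_size + METADATA_PIECE_SIZE - 1) // METADATA_PIECE_SIZE
    missing = [i for i in range(count) if i not in pieces]
    if missing:
        raise ValueError(f"missing metadata pieces: {missing}")
    data = b"".join(pieces[i] for i in range(count))
    if len(data) != metadata_size:
        raise ValueError("assembled metadata has unexpected length")
    return data


def metadata_matches(info_hash_hex, metadata):
    """Return True if the SHA-1 of the raw info dictionary equals the infohash."""
    return hashlib.sha1(metadata).hexdigest() == info_hash_hex.lower()
```

If `metadata_matches` returns False, the peer sent something other than the info dictionary for that infohash, and the data should be discarded.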

1 Answer


If you need a general overview of how a magnet link download works in principle, then this answer should cover it.

If you want to debug your implementation, you'll need to drill down into the details and make every necessary step observable, so you can see where things fail.

  • Does the DHT lookup return any peers?
  • Can you connect to the peers? Do the peers indicate support for the necessary extensions?
  • Does your client make the requests to obtain the metadata?
  • Does it get replies?
  • Do the replies validate?
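To make the second check in the list observable, something like the following sketch could build the 68-byte handshake and inspect a peer's reply. The function names are hypothetical. The layout follows BEP 3, and the extension-protocol flag is the `0x10` bit of reserved byte 5 as defined in BEP 10. A peer that does not set that bit cannot serve metadata over BEP 9.

```python
PSTR = b"BitTorrent protocol"
HANDSHAKE_LEN = 1 + len(PSTR) + 8 + 20 + 20  # 68 bytes


def build_handshake(info_hash, peer_id):
    """Build a handshake that advertises extension-protocol (BEP 10) support."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must each be 20 bytes")
    reserved = bytearray(8)
    reserved[5] |= 0x10  # BEP 10 extension protocol bit
    return bytes([len(PSTR)]) + PSTR + bytes(reserved) + info_hash + peer_id


def parse_handshake(data):
    """Parse a peer's handshake reply and report which infohash and flags it sent."""
    if len(data) < HANDSHAKE_LEN:
        raise ValueError("handshake is shorter than 68 bytes")
    if data[0] != len(PSTR) or data[1:20] != PSTR:
        raise ValueError("not a BitTorrent handshake")
    reserved = data[20:28]
    return {
        "supports_extensions": bool(reserved[5] & 0x10),
        "info_hash": data[28:48],
        "peer_id": data[48:68],
    }
```

When debugging, it may help to log the parsed reply for every peer: a reply whose `info_hash` differs from the one sent, or no reply at all, points at the connection step rather than at the metadata exchange.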
the8472
  • Hello, thanks for your answer. Yes, my implementation retrieves the peers from the DHT nodes based on an infohash. When I try to handshake with each peer, a connection is made but I get no response (each peer is supposed to echo the handshake). I'm looking into that currently. I understand how magnet links work, but the part that confuses me is the one in BEP 9: "xt is the only mandatory parameter", meaning that the tracker part can be omitted. When I specified the peers I retrieved from the DHT query in the magnet's "x.pe" parameter, I expected it to yield results, but it did not. – Thanos Apostolidis Nov 12 '22 at 00:53
  • The magnet link gives you the infohash. With that you can look up peers on the DHT. So you should debug the connection issue. Try Wireshark. Or try a torrent with many peers. Try comparing with a client where magnet links work. – the8472 Nov 12 '22 at 08:52