I'm interested in Btdigg.org, which is called a "DHT search engine". According to this article, it doesn't store any content and doesn't even have a database. Then how does it work? Doesn't it need to gather metainfo and store it in a database like other normal search engines? After a user submits a query, does it scan the DHT network and return the results in "real time"? Is this possible?

4 Answers
I don't have specific insight into BTDigg, but I believe the claim that there is no database (or something that acts like a database) is false. The author of that article might have been referring to something more specific that you would find on a traditional torrent site, such as stored .torrent files.
This is how a BTDigg-like site works:
- You run a bunch of DHT nodes, specifically for the purpose of "eavesdropping" on DHT traffic, to be introduced to the info-hashes that people talk about.
- Join those swarms and download the metadata (.torrent file) using the ut_metadata extension
- Index the information you find in there and map it to the info-hash
- Provide a front-end for that index
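The "eavesdropping" step can be sketched roughly as follows. This is a minimal illustration, not BTDigg's actual code: it assumes you already receive raw UDP payloads from a DHT node you run, and the helper names `bdecode` and `extract_info_hash` are made up for this example. It decodes a bencoded KRPC message and pulls the info-hash out of `get_peers` and `announce_peer` queries.

```python
def _decode(data: bytes, i: int):
    """Decode one bencoded value starting at offset i; return (value, next_offset)."""
    c = data[i:i + 1]
    if c == b"i":                                  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                                  # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                                  # dict: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            value, i = _decode(data, i)
            result[key] = value
        return result, i + 1
    if c.isdigit():                                # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        if start + length > len(data):
            raise ValueError("truncated string")
        return data[start:start + length], start + length
    raise ValueError("invalid bencode at offset %d" % i)


def bdecode(data: bytes):
    """Decode a complete bencoded message; raise ValueError on trailing bytes."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data")
    return value


def extract_info_hash(packet: bytes):
    """Return the 20-byte info-hash from a get_peers/announce_peer query, else None."""
    try:
        msg = bdecode(packet)
    except (ValueError, TypeError, RecursionError):
        return None                                # malformed or hostile packet
    if not isinstance(msg, dict) or msg.get(b"y") != b"q":
        return None                                # not a query
    if msg.get(b"q") not in (b"get_peers", b"announce_peer"):
        return None
    args = msg.get(b"a")
    if not isinstance(args, dict):
        return None
    info_hash = args.get(b"info_hash")
    if isinstance(info_hash, bytes) and len(info_hash) == 20:
        return info_hash
    return None
```

Every info-hash returned this way is a candidate for the next steps (metadata download and indexing); a real crawler would also deduplicate them and reply to the queries so that other nodes keep it in their routing tables.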
If you want to luxury it up a bit you can also periodically scrape the info-hashes you know about to gather stats over time and maybe also figure out when swarms die out and should be removed from the index.
So, the claim that you don't store .torrent files nor any content is true.
It is not realistic to search the DHT in real time, because the DHT is not organized around keyword searches. You need to build and maintain the index continuously, "in the background".
EDIT:
Since this answer, an optimization (BEP 51) has been implemented in some DHT clients that lets you query which info-hashes they are hosting, significantly reducing the cost of indexing.
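As a rough sketch of what a BEP 51 request looks like on the wire: the query name `sample_infohashes` and its `id`/`target` arguments come from BEP 51, and the response carries a `samples` string of concatenated 20-byte info-hashes. The function names and the tiny encoder below are illustrative only, not taken from any particular client.

```python
def bencode(obj) -> bytes:
    """Minimal bencoder for ints, bytes, str, lists and dicts."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, str):
        return bencode(obj.encode("utf-8"))
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        # Bencoded dict keys are byte strings in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in obj.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(obj))


def build_sample_infohashes_query(node_id: bytes, target: bytes, tid: bytes = b"aa") -> bytes:
    """Build a KRPC sample_infohashes query (BEP 51) for sending over UDP."""
    if len(node_id) != 20 or len(target) != 20:
        raise ValueError("node_id and target must be 20 bytes each")
    return bencode({
        "t": tid,                       # transaction id, echoed back in the reply
        "y": "q",                       # message type: query
        "q": "sample_infohashes",
        "a": {"id": node_id, "target": target},
    })


def split_samples(samples: bytes) -> list:
    """Split the 'samples' field of a response into 20-byte info-hashes."""
    if len(samples) % 20 != 0:
        raise ValueError("samples length must be a multiple of 20")
    return [samples[i:i + 20] for i in range(0, len(samples), 20)]
```

An indexer would send such queries with varying `target` values to walk the keyspace, and should honor the `interval` value in each response before asking the same node again.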

@Encombe care to elaborate that? (github may be a better venue though: https://github.com/bittorrent/bittorrent.org/issues ) – Arvid Aug 02 '19 at 14:53
For a deep understanding of DHT and its applications, see Scott Wolchok's paper and presentation "Crawling BitTorrent DHTs for Fun and Profit". He presents the autonomous search engine idea as a sidenote to his study of DHT's security:
PDF of his paper:
His presentation at DEFCON 18 (parts 1 & 2)

https://www.usenix.org/legacy/event/woot10/tech/full_papers/Wolchok.pdf
The method used in Section 3 seems to suggest that a database storing all the torrent data is required. While its performance is better, it may not be a true DHT search engine.
Section 8, while less efficient, seems to describe a DHT search engine, as long as the keywords are the stored values.
From Section 3, Bootstrapping BitTorrent Search:
"The system handles user queries by treating the concatenation of each torrent's filenames and description as a document in the typical information retrieval model and using an inverted index to match keywords to torrents. This has the advantage of being well supported by popular open-source relational DBMSs. We rank the search results according to the popularity of the torrent, which we can infer from the number of peers listed in the DHT"
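To make the quoted approach concrete, here is a small in-memory sketch of an inverted index with popularity ranking. It is not the paper's implementation (the paper uses a relational DBMS); the class name `TorrentIndex`, the tokenizer, and the AND semantics for multi-word queries are assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TorrentIndex:
    """Inverted index: keyword -> set of info-hashes, ranked by peer count."""

    def __init__(self):
        self._postings = defaultdict(set)   # keyword -> {info_hash, ...}
        self._peers = {}                    # info_hash -> peer count seen in the DHT

    def add(self, info_hash: str, text: str, peers: int) -> None:
        """Index one torrent; `text` is its filenames plus description."""
        self._peers[info_hash] = peers
        for word in tokenize(text):
            self._postings[word].add(info_hash)

    def search(self, query: str) -> list:
        """Return info-hashes matching every query keyword, most peers first."""
        words = tokenize(query)
        if not words:
            return []
        matches = set.intersection(*(self._postings.get(w, set()) for w in words))
        return sorted(matches, key=lambda h: self._peers[h], reverse=True)
```

For example, after adding two Ubuntu torrents with 50 and 200 peers, `search("ubuntu iso")` lists the 200-peer torrent first. Re-adding a torrent with changed text would leave stale postings behind, which a real index would need to handle.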
From Section 8, Related Work:
"the usual approach to distributing search using a DHT is with an inverted index, by storing each (keyword, list of matching documents) pair as a key-value pair in the DHT. Joung et al. [17] describe this approach and point out its performance problems: the Zipf distribution of keywords among files results in very skewed load balance, document information is replicated once for each keyword in the document, and it is difficult to rank documents in a distributed environment"
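The load-balance and replication problems in that passage can be seen in a toy model. The sketch below assumes each keyword is mapped to a DHT key with SHA-1 (a choice made here for illustration, not something the paper prescribes) and counts how many postings each key would have to hold.

```python
import hashlib
from collections import Counter


def keyword_key(keyword: str) -> bytes:
    """Map a keyword to a 160-bit DHT key (SHA-1 is an assumption of this sketch)."""
    return hashlib.sha1(keyword.lower().encode("utf-8")).digest()


def postings_to_publish(doc_id: str, keywords: list) -> list:
    """One (key, doc_id) pair per distinct keyword: the document is replicated per keyword."""
    distinct = sorted({k.lower() for k in keywords})
    return [(keyword_key(k), doc_id) for k in distinct]


def load_per_key(docs: dict) -> Counter:
    """Count how many postings land on each DHT key for a set of documents."""
    load = Counter()
    for doc_id, keywords in docs.items():
        for key, _ in postings_to_publish(doc_id, keywords):
            load[key] += 1
    return load
```

With three torrents that all contain the word "iso", the node responsible for `keyword_key("iso")` stores three postings while every other key stores one, which is the skew the quoted passage describes, only at a much smaller scale.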

It is divided into two steps.
First, implement the bep_0005 (DHT) protocol to collect info-hashes. You do not need to implement the whole protocol; only find_node (request), get_peers (response), and announce_peer (response) are required. Here's one of my open-source projects, dhtspider.
Second, implement the bep_0009 (metadata exchange) protocol to fetch the metainfo and index it. Here is my own BitTorrent search engine; every day it collects more than 3 million unique info-hashes and more than 500,000 valid metainfo records.
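For the bep_0009 step, the part that is easy to get wrong is reassembling and validating the metadata. The sketch below is not the dhtspider code: the function names are made up, and the peer-wire handshake and extended-message framing are left out. It relies on two facts from BEP 9: metadata is transferred in 16 KiB pieces, and the SHA-1 of the assembled info dictionary must equal the info-hash.

```python
import hashlib

PIECE_SIZE = 16 * 1024  # BEP 9 transfers metadata in 16 KiB pieces


def piece_count(total_size: int) -> int:
    """Number of metadata pieces needed for total_size bytes."""
    return (total_size + PIECE_SIZE - 1) // PIECE_SIZE


def request_message(piece: int) -> bytes:
    """Bencoded ut_metadata request payload: {'msg_type': 0, 'piece': piece}."""
    return b"d8:msg_typei0e5:piecei%dee" % piece


def assemble_metadata(pieces: dict, total_size: int, info_hash: bytes):
    """Join received pieces and verify them against the info-hash.

    `pieces` maps piece index -> bytes. Returns the raw info dictionary,
    or None if a piece is missing or the SHA-1 check fails.
    """
    n = piece_count(total_size)
    if set(pieces) != set(range(n)):
        return None                                   # missing or unexpected pieces
    blob = b"".join(pieces[i] for i in range(n))
    if len(blob) != total_size:
        return None
    if hashlib.sha1(blob).digest() != info_hash:
        return None                                   # peer sent bogus metadata
    return blob
```

Only metadata that passes this check should be decoded and indexed, since the info-hashes harvested from the DHT are unauthenticated and any peer can send arbitrary bytes.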
