
I have implemented a Python module that scrapes two torrent sites using Scrapy. It currently stores torrent data in a DB, and it can download the .torrent files by calling a bash script.

For research purposes in DB visualization, I would like to know whether it is possible, and if so how, to obtain the IP addresses of the seeders and peers of a specific torrent. I would like to create a visualization showing correlations between torrent files, their types (movies, ISO images, books, etc.), and the locations they are seeded from or downloaded to.

For instance, here is part of what is inside one of the .torrent files I scraped:

d8:announce38:udp://tracker.publicbt.com:80/announce13:announce-listll38:udp://tracker.publicbt.com:80/announceel44:udp://tracker.openbittorrent.com:80/announceel35:udp://tracker.istole.it:80/announceel36:udp://open.demonii.com:1337/announceee7:comment61:Torrent downloaded from torrent cache at http://torcache.net/10:created by15:BitTorrent/782013:creation datei1384198882e8:encoding5:UTF-84:infod5:filesld6:lengthi25485e4:pathl69:Physics of Quantum Mechanics, The - Skinner, David, Binney, James.jpgeed6:lengthi1254e4:pathl69:Physics of Quantum Mechanics, The - Skinner, David, Binney, James.opfeed6:lengthi4609366e4:pathl69:Physics of Quantum Mechanics, The - Skinner, David, Binney, James.pdfeee4:name52:The Physics of Quantum Mechanics- Oxford, 2013 [PDF]12:piece lengthi16384e6:pieces5660:³é^G^W^H<83>æZèÖunB2ä<82>ªb­<96>".ËWvÓo^?.F´<8e>ÍZQQÕ¬8Þ+þXS<91>-S^O9<91>¸<9f>Ê'<97>3ÎpÕöC^CNÞÔ»^F3HJ,=Àòà¶,<81><ö<84><8a>ÃÀdÔ,^SýZ<8f>!Q"r¹<98>³Agì=ûr"ged<96>½<89>à ¥E'Å^V|ïª{^M<88><9c>»z½/qsø<^8^@í¤Ô[_<83><9e><97>Éãs^V×»Ö\Ûë"^NÝó<9e>¬^Kbì«õ<98>²<82>^\_PÍFª^_µ^L<9b>^Vâ^NhÛ<87>-@ê\íäÎ/³<8c>^]jÀóp<87>¬ <87><8e>,?<8d>&^^®Rê±ÃFÏÂ&Ü]!ö<87><zü{SîÖg.I±Ã^QÃ~Ê>uÛÜä^Cw^_d_r0<8a>h<81><9b>êªE­Ça^N¢M4Èv^_<96>lË,g­^Fò«^]¿<9c><88>p^[Ñ.ìk©t

Will I be able to use the info in this file to connect to the tracker and then find the peers and seeders for that file?

Charles
Saher Ahwal

1 Answer


Essentially yes, since that is what BitTorrent clients do. Take a look at the BitTorrent specification: it covers the details of the file format along with the protocol specification. That should tell you everything you need to know.
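
To make the file-format part concrete, here is a minimal sketch of reading a .torrent file with nothing but the standard library. It assumes the file follows the usual bencoding rules (integers, byte strings, lists, dictionaries) and that the info hash is the SHA-1 of the raw bencoded `info` dictionary. The names `bdecode` and `torrent_summary` are made up for this example and do not come from any library.

```python
import hashlib


def bdecode(data, i=0):
    """Decode one bencoded value starting at data[i]; return (value, next_index)."""
    c = data[i:i + 1]
    if c == b"i":                       # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                       # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                       # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            result[key], i = bdecode(data, i)
        return result, i + 1
    colon = data.index(b":", i)         # byte string: <length>:<bytes>
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length


def torrent_summary(raw):
    """Return the tracker URLs and hex info hash of a raw .torrent file (bytes)."""
    if raw[:1] != b"d":
        raise ValueError("a .torrent file must be a bencoded dictionary")
    meta, info_raw, i = {}, None, 1
    while raw[i:i + 1] != b"e":
        key, i = bdecode(raw, i)
        start = i
        meta[key], i = bdecode(raw, i)
        if key == b"info":
            info_raw = raw[start:i]     # exact bytes of the info dictionary
    if info_raw is None:
        raise ValueError("no info dictionary found")
    trackers = [meta[b"announce"]] if b"announce" in meta else []
    for tier in meta.get(b"announce-list", []):
        for url in tier:
            if url not in trackers:     # keep order, drop duplicates
                trackers.append(url)
    return {
        "trackers": [t.decode("utf-8", "replace") for t in trackers],
        "info_hash": hashlib.sha1(info_raw).hexdigest(),
    }
```

Calling `torrent_summary(open(path, "rb").read())` on a file like the one in the question should list the `udp://` trackers plus the info hash. The file must be opened in binary mode, because the `pieces` value is raw SHA-1 data rather than text.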

  • I guess the question now would be: do all .torrent files obey the same spec? Do I have to have different parsers for .torrent files downloaded from different sites? – Saher Ahwal Nov 11 '13 at 23:48
  • 1
    Given that BitTorrent is not an 'official' specification, anything's possible. If you find something that doesn't fit this spec you'll have to research that yourself. –  Nov 12 '13 at 00:05
  • I found the https://pypi.python.org/pypi/BitTorrent-bencode library and used it to parse the torrent file and get the announce URL, the announce-list, and the info hash. Should I treat the URLs specially, since the `udp://` scheme means they use the UDP protocol? – Saher Ahwal Nov 13 '13 at 01:36
  • 1
    I would like to point you here: http://stackoverflow.com/questions/19962670/get-ip-addresses-from-udp-and-http-torrent-tracker-response I have implemented UDP and HTTP scrape, yet I found no way to get IPs. – Saher Ahwal Nov 14 '13 at 18:44
  • You can just run the .torrent file; then you will see the IPs of whoever is seeding to you and of whoever you are seeding to. The IPs are not going to be in the .torrent file. Usually that file connects you to a site (the tracker), which holds the info about the users that connect and passes the IPs on from there. – Glen Morse Nov 27 '13 at 05:06
  • 1
    So is there a way to get the total number of seeds and peers? – Muaaz Khalid Aug 01 '17 at 15:32
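
On the `udp://` and "no IPs from scrape" comments: in the UDP tracker protocol (BEP 15), a scrape only returns counts, while an announce returns the seeder and leecher counts together with a compact list of peers (4 bytes of IPv4 address plus 2 bytes of port each). The sketch below shows the packet layouts as I understand them from that spec. The function names, the `-PY0001-` peer ID prefix, and the default `port` and `left` values are illustrative assumptions, and the trackers listed in the question may no longer be reachable.

```python
import os
import random
import socket
import struct

PROTOCOL_ID = 0x41727101980  # magic constant defined by BEP 15


def build_connect_request(transaction_id):
    # int64 protocol id, int32 action (0 = connect), int32 transaction id
    return struct.pack(">QII", PROTOCOL_ID, 0, transaction_id)


def parse_connect_response(data, transaction_id):
    if len(data) < 16:
        raise ValueError("connect response too short")
    action, tid, connection_id = struct.unpack(">IIQ", data[:16])
    if action != 0 or tid != transaction_id:
        raise ValueError("unexpected connect response")
    return connection_id


def build_announce_request(connection_id, transaction_id, info_hash, peer_id,
                           port=6881, left=1):
    # `left` is a placeholder; a real client sends its remaining byte count.
    return struct.pack(
        ">QII20s20sQQQIIIiH",
        connection_id, 1, transaction_id,   # action 1 = announce
        info_hash, peer_id,                 # 20 raw bytes each
        0, left, 0,                         # downloaded, left, uploaded
        0,                                  # event: 0 = none
        0,                                  # IP: 0 = use the sender address
        random.getrandbits(32),             # key
        -1,                                 # num_want: -1 = tracker default
        port,
    )


def parse_announce_response(data, transaction_id):
    if len(data) < 20:
        raise ValueError("announce response too short")
    action, tid, interval, leechers, seeders = struct.unpack(">IIIII", data[:20])
    if action != 1 or tid != transaction_id:
        raise ValueError("unexpected announce response")
    peers = []
    for off in range(20, len(data) - 5, 6):  # 6 bytes per peer
        ip = socket.inet_ntoa(data[off:off + 4])
        (peer_port,) = struct.unpack(">H", data[off + 4:off + 6])
        peers.append((ip, peer_port))
    return {"interval": interval, "leechers": leechers,
            "seeders": seeders, "peers": peers}


def announce_udp(host, port, info_hash, timeout=5.0):
    """Connect, then announce; info_hash is the 20-byte binary SHA-1 digest."""
    peer_id = b"-PY0001-" + os.urandom(12)   # hypothetical client id
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        tid = random.getrandbits(32)
        sock.sendto(build_connect_request(tid), (host, port))
        connection_id = parse_connect_response(sock.recvfrom(2048)[0], tid)
        tid = random.getrandbits(32)
        request = build_announce_request(connection_id, tid, info_hash, peer_id)
        sock.sendto(request, (host, port))
        return parse_announce_response(sock.recvfrom(65535)[0], tid)
```

A call such as `announce_udp("tracker.example", 80, bytes.fromhex(hex_info_hash))` (hypothetical host) would return the counts asked about in the last comment, plus whatever subset of peers the tracker chooses to hand out. A tracker normally returns only a sample of the swarm, so one response is not a complete list.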