
The code below (apologies for the ugliness) is what I'm running to calculate the hash for a torrent, but it gives me a different answer than when I open that torrent directly in Transmission.

I'm testing on r_000 on this page: http://gen.lib.rus.ec/repository_torrent/

Transmission gives me: 63a04291a8b266d968aa7ab8a276543fa63a9e84

My code gives me: 1882ff6534ee4aa660e2fbf225c1796638bea4c0

import bencoding
from io import BytesIO
import binascii
import hashlib

with open("cache/r_000.torrent", "rb") as f:
    data = bencoding.bdecode(f.read())
info = data[b'info']
hashed_info = hashlib.sha1(info[b'pieces']).hexdigest()
print(hashed_info)

Any idea what I've screwed up? Thanks!

M. Who
  • This has already been asked. Have a look at [this answer](https://stackoverflow.com/a/28162042/3151902). – user3151902 Sep 03 '17 at 17:08
  • Seems like you are hashing the `pieces`-value instead of the `info`-dictionary – Encombe Sep 03 '17 at 17:08
  • Yep, I needed to take a step back and look again. Okay, so the solution is to bencode the whole info dictionary and then hash that. – M. Who Sep 03 '17 at 17:35
  • Bdecoding and then bencoding may in some rare cases give the wrong info_hash. See this answer: https://stackoverflow.com/questions/19749085/calculating-the-info-hash-of-a-torrent-file/19800109#19800109 – Encombe Sep 03 '17 at 18:05
  • Look at this [answer](https://stackoverflow.com/questions/28140766/hash-calculation-in-torrent-clients/28162042#28162042) first. However, if you want to hash other files, I found this guide on Google: [Hashing files with Python | Python Central](http://pythoncentral.io/hashing-files-with-python) – CFV Sep 03 '17 at 17:18

1 Answer


I made the same mistake, and finding this question helped me fix it. To make it clearer for others who arrive here searching for how to do this in Python 3, this is the explicit fix:

Change:

hashed_info = hashlib.sha1(info[b'pieces']).hexdigest()

to:

hashed_info = hashlib.sha1(bencoding.bencode(info)).hexdigest()

Thanks to Encombe for clarifying the info hash here: https://stackoverflow.com/questions/28140766/28162042#28162042

The hash in a torrent client or the hash you find in a magnet-URI is the SHA1-hash of the raw bencoded info-dictionary-part of a torrent-file.


A full but minimalistic example is:

import bencoding
import hashlib

# Read and decode the whole .torrent file
with open("r_0000.torrent", "rb") as torrent_file:
    decoded = bencoding.bdecode(torrent_file.read())

# The info hash is the SHA-1 of the bencoded info dictionary
info_hash = hashlib.sha1(bencoding.bencode(decoded[b"info"])).hexdigest()
print(info_hash)

Result:

$ python3 example.py
63a04291a8b266d968aa7ab8a276543fa63a9e84
CGar
  • Good solution, but keep in mind that in some rare cases bdecoding and then bencoding before hashing may give the [wrong info_hash](https://stackoverflow.com/questions/19749085/calculating-the-info-hash-of-a-torrent-file/19800109#19800109). – Encombe Sep 18 '17 at 04:47
  • Thanks for the extra info. How would I prevent such an occurrence? Does it mismatch only when the keys are in the wrong order and the decoder library sorts them, or are there other cases? The library I'm using is actually the [bencoder](https://github.com/utdemir/bencoder) library, and I can see a part that sorts. – CGar Sep 18 '17 at 22:53