
I'm using C++ to parse the info hash of a torrent file, and I am having trouble getting a "correct" hash value in comparison to this site:

http://i-tools.org/torrent

I have constructed a very simple toy example just to make sure I have the basics right.

I opened a .torrent file in Sublime Text and stripped off everything except for the info dictionary, so I have a file that looks like this:

d6:lengthi729067520e4:name31:ubuntu-12.04.1-desktop-i386.iso12:piece lengthi524288e6:pieces27820:¡´E¶ˆØËš3í   ..............(more unreadable stuff.....)..........

I read this file in and parse it with this code:

#include <string>
#include <sstream>
#include <iomanip>
#include <fstream>
#include <iostream>

#include <openssl/sha.h>


void printHexRep(const unsigned char * test_sha) {

    std::cout << "CALLED HEX REP...PREPPING TO PRINT!\n";
    std::ostringstream os;
    os.fill('0');
    os << std::hex;
    for (const unsigned char * ptr = test_sha; ptr < test_sha + 20; ptr++) {

        os << std::setw(2) << (unsigned int) *ptr;
    }
    std::cout << os.str() << std::endl << std::endl;
}


int main() {

    using namespace std;

    ifstream myFile ("INFO_HASH__ubuntu-12.04.1-desktop-i386.torrent", ifstream::binary);

    //Get file length
    myFile.seekg(0, myFile.end);
    int fileLength = myFile.tellg();
    myFile.seekg(0, myFile.beg);

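    // std::string instead of a variable-length array, which is not standard C++
    string buffer(fileLength, '\0');

    myFile.read(&buffer[0], fileLength);
    cout << "File length == " << fileLength << endl;
    // The data is binary and not null-terminated, so write exactly fileLength bytes
    cout.write(buffer.data(), fileLength);
    cout << endl << endl;

    unsigned char datSha[20];
    SHA1((const unsigned char *) buffer.data(), fileLength, datSha);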
    printHexRep(datSha);

    myFile.close();

    return 0;
}

Compile it like so:

g++ -o hashes info_hasher.cpp -lssl -lcrypto

And I am met with this output:

4d0ca7e1599fbb658d886bddf3436e6543f58a8b

When I am expecting this output:

14FFE5DD23188FD5CB53A1D47F1289DB70ABF31E

Does anybody know what I might be doing wrong here? Could the problem lie with the un-readability of the end of the file? Do I need to parse this as hex first or something?

Ethan

2 Answers


Make sure you don't have a newline at the end of the file. You may also want to make sure it ends with an 'e'.
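A minimal sketch of that check, assuming the file holds only the info dictionary, as in the question. The helper names `looksLikeBencodedDict` and `stripTrailingNewlines` are made up for illustration. Stripping is safe here only because a valid bencoded dictionary always ends in 'e', so a trailing line break can never be part of it.

```cpp
#include <string>

// Hypothetical helper: true if the buffer starts with 'd' and ends with 'e',
// as a complete bencoded dictionary must. It does not validate the contents.
bool looksLikeBencodedDict(const std::string& data) {
    return data.size() >= 2 && data.front() == 'd' && data.back() == 'e';
}

// Hypothetical helper: drops trailing '\n' and '\r' bytes that a text editor
// may have appended when the file was saved.
std::string stripTrailingNewlines(std::string data) {
    while (!data.empty() && (data.back() == '\n' || data.back() == '\r')) {
        data.pop_back();
    }
    return data;
}
```

Running the buffer through `stripTrailingNewlines` before hashing, and refusing to hash when `looksLikeBencodedDict` returns false, would catch the editor-added newline described above.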

The info-hash of a torrent file is the SHA-1 hash of the info-section (in bencoded form) from the .torrent file. Essentially you need to decode the file (it's bencoded) and remember the byte offsets where the content of the value associated with the "info" key begins and ends. That's the range of bytes you need to hash.

For example, if this is the torrent file:

d4:infod6:pieces20:....................4:name4:test12:piece lengthi1024ee8:announce27:http://tracker.com/announcee

You want to hash just this section:

d6:pieces20:....................4:name4:test12:piece lengthi1024ee

For more information on bencoding, see BEP3.
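One way to find that byte range is to walk the bencoded structure and skip over each value without decoding it. The following is a rough sketch under that assumption, not a complete bencode parser. The function names `skipValue` and `extractInfoBytes` are invented for this example, and the whole file is assumed to be already loaded into a `std::string`.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns the index one past the bencoded value that starts at `pos`.
std::size_t skipValue(const std::string& s, std::size_t pos) {
    if (pos >= s.size()) throw std::runtime_error("unexpected end of data");
    const char c = s[pos];
    if (c == 'i') {                          // integer: i<digits>e
        const std::size_t end = s.find('e', pos);
        if (end == std::string::npos) throw std::runtime_error("unterminated integer");
        return end + 1;
    }
    if (c == 'l' || c == 'd') {              // list or dictionary: values until 'e'
        ++pos;
        while (pos < s.size() && s[pos] != 'e') pos = skipValue(s, pos);
        if (pos >= s.size()) throw std::runtime_error("unterminated container");
        return pos + 1;
    }
    if (std::isdigit(static_cast<unsigned char>(c))) {   // string: <length>:<bytes>
        const std::size_t colon = s.find(':', pos);
        if (colon == std::string::npos) throw std::runtime_error("bad string length");
        const std::size_t len = std::stoul(s.substr(pos, colon - pos));
        if (len > s.size() - (colon + 1)) throw std::runtime_error("string too long");
        return colon + 1 + len;
    }
    throw std::runtime_error("unknown bencode type");
}

// Returns the verbatim bytes of the value stored under the "info" key of the
// root dictionary. These are the bytes to feed to SHA-1.
std::string extractInfoBytes(const std::string& torrent) {
    if (torrent.empty() || torrent[0] != 'd') throw std::runtime_error("root is not a dictionary");
    std::size_t pos = 1;
    while (pos < torrent.size() && torrent[pos] != 'e') {
        if (!std::isdigit(static_cast<unsigned char>(torrent[pos])))
            throw std::runtime_error("dictionary key is not a string");
        const std::size_t keyEnd = skipValue(torrent, pos);
        const std::size_t colon = torrent.find(':', pos);
        const std::string key = torrent.substr(colon + 1, keyEnd - colon - 1);
        const std::size_t valueEnd = skipValue(torrent, keyEnd);
        if (key == "info") return torrent.substr(keyEnd, valueEnd - keyEnd);
        pos = valueEnd;
    }
    throw std::runtime_error("no info key found");
}
```

Because strings are skipped by their declared length, any 'd' or 'e' bytes inside the binary pieces data are never mistaken for structure. The bytes are returned as they appear in the file and are not re-encoded.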

Arvid
  • I'm pretty sure the issue was with newline characters! Thanks! The example above is a toy example, I deleted everything in the torrent file except for the info-dictionary. – Ethan Nov 06 '13 at 01:42
  • Note that in the example torrent file given by Arvid, both the root dictionary and the info dictionary are unsorted. According to the bencode specification, a dictionary's keys must be sorted. However, the agreed convention when an info dictionary is for some reason unsorted is to hash it raw, as it is (unsorted), as explained by Arvid above. – Encombe Jan 30 '15 at 08:29
  • Yes, good point. It may also serve as an illustration that even when the bencoded dictionary is incorrectly sorted, the info-hash is still the hash of the verbatim form. Some clients (used to) decode and re-encode before hashing, which would result in incorrect info-hashes in such cases. – Arvid Jan 30 '15 at 17:49
  • Is this the way to implement it? Please correct me if I am wrong. (1) x = position of `4:info` within the string (2) y = position of `8:announce` within the string (3) if x > y or y is false (`8:announce` is not found) then y = position of the last `e` within the string (4) infohash = SHA-1 hash of the substring starting at x+6 and ending at y-1. – Sharanya Dutta Jun 05 '16 at 08:55
  • Bencoded messages form a tree. You can nest a dictionary as a value in another dictionary. Simply looking for strings will not be enough; you have to match up the start of the info dictionary (the 'd' immediately following "4:info") with its terminating 'e'. There will likely be other d's and e's in between. – Arvid Jun 06 '16 at 13:55

SHA1 calculation is just as simple as what you've written, more or less. The error is probably in the data you're feeding it, if you get the wrong answer from the library function.

I can't speak to the torrent file prep work you've done, but I do see a few problems. If you'll revisit the SHA1 docs, notice the SHA1 function never requires its own digest length as a parameter. Next, you'll want to be quite certain the technique you're using to read the file's contents is faithfully sucking up the exact bytes, no translation.

A less critical style suggestion: make use of the third parameter to SHA1. General rule, static storage in the library is best avoided. Always prefer to supply your own buffer. Also, where you have a hard-coded 20 in your print function, that's a marvelous place for that digest length constant you've been flirting with.
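Both suggestions can be sketched as follows. The helper names `toHex` and `readAllBytes` are made up for this illustration. The length is passed in rather than hard-coded, so a caller using OpenSSL could supply `SHA_DIGEST_LENGTH` from `<openssl/sha.h>`.

```cpp
#include <cstddef>
#include <fstream>
#include <iomanip>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: hex-encodes `len` bytes as lowercase, two digits per byte.
std::string toHex(const unsigned char* data, std::size_t len) {
    std::ostringstream os;
    os << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < len; ++i) {
        os << std::setw(2) << static_cast<unsigned int>(data[i]);
    }
    return os.str();
}

// Hypothetical helper: reads a whole file as raw bytes. Binary mode avoids
// newline translation, and istreambuf_iterator does not skip whitespace.
// Returns an empty string if the file cannot be opened.
std::string readAllBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With the question's code, the digest would then be printed with something like `toHex(datSha, SHA_DIGEST_LENGTH)`.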

Salt
  • whooopss totally misread and thought the length passed to SHA1 was for the receiving array! I fixed that, and obviously I get a different hash, but it still isn't the correct one.... I think the problem is with the torrent file stuff I am or am not doing. The torrent file I trimmed the front off of is the same one I feed into that webpage to get the different result, so it fundamentally is not corrupted or anything. I changed the posted code to reflect the changes. – Ethan Nov 03 '13 at 07:56