
I want to calculate the SHA-1 hash of any given file in C++ using the OpenSSL library.

I have read every article I could find on the internet (including everything on Stack Overflow) about doing this for almost 3 days.

I finally got my program to run, but the hash it generates for a given file is not what it should be.

My code is somewhat similar to the examples found here and here, but easier to read and easier to reuse in the rest of my program.

Also, I want to use C++ code rather than C code as written in the links above. In addition, they use:

SHA256_Init(&context);
SHA256_Update(&context, (unsigned char*)input, length);
SHA256_Final(md, &context);

which are deprecated in the new/current OpenSSL version (3.0 or so, I think).

So I think this question will help the many other readers I see running into the same problem(s) with the new OpenSSL version, who can no longer rely on the old code samples.

This is my C++ code, written to read huge files in chunks without loading them fully into memory (I hope it helps future readers of this post because it has many useful lines, but it is not fully working, as you will see):

bool hashFullFile(const std::string& FilePath, std::string &hashed, std::string &hash_type) {
    bool success = false;
    EVP_MD_CTX *context = EVP_MD_CTX_new();
    //read file by chuncks:
    const int BUFFER_SIZE = 1024;
    std::vector<char> buffer (BUFFER_SIZE + 1, 0);

    // check if the file to read from exists and if so read the file in chunks
    std::ifstream fin(FilePath, std::ifstream::binary | std::ifstream::in);

    if (hash_type == "SHA1") {
        if (context != NULL) {
            if (EVP_DigestInit_ex(context, EVP_sha1(), NULL)) {
                while (fin.good()){


                    fin.read(buffer.data(), BUFFER_SIZE);
                    std::streamsize s = ((fin) ? BUFFER_SIZE : fin.gcount());
                    buffer[s] = 0;
                    //convert vector of chars to string:
                    std::string str(buffer.data());
                    if (!EVP_DigestUpdate(context, str.c_str(), str.length())) {
                        fprintf(stderr, "Error while digesting file.\n");
                        return false;
                    }


                }
                unsigned char hash[EVP_MAX_MD_SIZE];
                unsigned int lengthOfHash = 0;
                if (EVP_DigestFinal_ex(context, hash, &lengthOfHash)) {
                    std::stringstream ss;
                    for (unsigned int i = 0; i < lengthOfHash; ++i) {
                        ss << std::hex << std::setw(2) << std::setfill('0') << (int) hash[i];
                    }

                    hashed = ss.str();
                    success = true;
                }else{
                    fprintf(stderr, "Error while finalizing digest.\n");
                    return false;
                }
            }else{
                fprintf(stderr, "Error while initializing digest context.\n");
                return false;
            }
            EVP_MD_CTX_free(context);
        }else{
            fprintf(stderr, "Error while creating digest context.\n");
            return false;
        }
    }
    fin.close();
    return success;
}

And I am using it like this in the main function:

std::string myhash;
std::string myhash_type = "SHA1";
hashFullFile(R"(C:\Users\UserName\data.bin)", myhash, myhash_type);
cout<<myhash<<endl;

The problem is that for a given file it calculates, e.g., 169ed28c9796a8065f96c98d205f21ddac11b14e as the hash output, but the same file has the hash:

 openssl dgst -sha1 data.bin
SHA1(data.bin)= 1927f720a858d0c3b53893695879ae2a7897eedb

as generated by the OpenSSL command line and also by any online hashing site.

I can't figure out what I am doing wrong, since my code seems to be correct.

Please help.

Thank you very much in advance!

YoYoYo
  • Why `((fin) ? BUFFER_SIZE : fin.gcount())`? Why not just use `gcount`? See also https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-i-e-while-stream-eof-cons. Converting your buffer to a string then passing the string's buffer to `EVP_DigestUpdate` seems unnecessarily complicated, why not just pass the buffer directly? – Alan Birtles Apr 02 '22 at 15:26
  • I adapted the code from http://www.cplusplus.com/forum/beginner/194071/, which reads a big file in C++ in chunks, and that is how it was implemented there. As the person asking for help there notes, there may be a problem where the code does not read the full file (the last chunk is not read). I am also taking care of that possible problem, which is why I am not using EOF in the while loop condition: https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-i-e-while-stream-eof-cons . I don't understand what you mean about gcount; could you explain, please? Thank you! – YoYoYo Apr 02 '22 at 15:33
  • The simplest way to do this is with BIOs, protecting them with unique_ptrs. Regardless, it is possible using the method you're trying, but the intermediate string conversion makes no sense. – WhozCraig Apr 02 '22 at 18:05
  • @WhozCraig Could you show me a full example, please? I don't know what BIOs are or how to use them. Also, I did the string conversion in order to get a parameter for EVP_DigestUpdate because, as you already know, there is no .c_str() method for a vector of chars. I think I am stuck here. Any help, please? Also, where does my code go wrong, since the final hash isn't the same as the one calculated by the OpenSSL command line? Thank you so much! – YoYoYo Apr 02 '22 at 20:55
  • You want to separate the digest generation from the output string. It is the former you care about. I'll post an answer showing how this is possible using both BIOs and EVP_Digest. – WhozCraig Apr 02 '22 at 21:04
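
On the gcount question raised in the comments above: after a call to read(), gcount() reports how many bytes that call actually stored in the buffer, so the final, shorter chunk is handled without any special case. The following is only a minimal sketch of that loop shape; the helper name countBytesInChunks is hypothetical, and it counts bytes instead of hashing them so that it needs nothing beyond the standard library.

```cpp
#include <array>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads `in` in fixed-size chunks and returns the total
// number of bytes seen. gcount() reports how many bytes the last read()
// actually stored, which is less than the chunk size for the final chunk.
std::size_t countBytesInChunks(std::istream &in)
{
    std::array<char, 1024> buff;
    std::size_t total = 0;

    // read() sets failbit on a short read, so the loop tests gcount() rather
    // than the stream state; otherwise the last partial chunk would be lost.
    while (in.read(buff.data(), static_cast<std::streamsize>(buff.size())).gcount() > 0)
        total += static_cast<std::size_t>(in.gcount());

    return total;
}
```

In a hashing loop, the line that adds to `total` would instead pass `buff.data()` and `in.gcount()` to the digest update call.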

1 Answer


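You're missing the finishing calculation on your EVP API attempt. The use of an intermediate string is unnecessary as well, and it corrupts binary input: a std::string built from a char pointer ends at the first zero byte, so the rest of that chunk never reaches EVP_DigestUpdate. Finally, the function should return the digest as a vector of bytes and let the caller do with that what they want.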
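
To make the problem with the intermediate string concrete, here is a minimal sketch that uses only the standard library. Both helper names are hypothetical and exist purely for illustration. The first mirrors what the question's code does with each chunk, the second returns the byte count that a digest update call should be given.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper mirroring the question's approach: build a std::string
// from a char pointer and use its length. The constructor stops at the first
// zero byte, so `chunk` must contain one (the question appends a terminator).
std::size_t lengthViaCString(const std::vector<char> &chunk)
{
    std::string str(chunk.data());
    return str.length();
}

// Hypothetical helper for comparison: the number of bytes actually held in
// the chunk, which is what a digest update call should be given.
std::size_t lengthViaCount(const std::vector<char> &chunk)
{
    return chunk.size();
}
```

For a chunk holding the bytes 'a', 'b', 0, 'c', 'd', 0, the first helper reports 2 while the chunk holds 6 bytes. Any file containing zero bytes is therefore only partially hashed, which would explain a digest that differs from the command-line result.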

Examples using both the EVP API and a BIO chain are shown below.

#include <iostream>
#include <fstream>
#include <algorithm>
#include <array>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

#include <openssl/bio.h>
#include <openssl/crypto.h>
#include <openssl/evp.h>
#include <openssl/sha.h>

namespace
{
    struct Delete
    {
        void operator()(BIO * p) const
        {
            BIO_free(p);
        }

        void operator()(EVP_MD_CTX *p) const
        {
            EVP_MD_CTX_free(p);
        }
    };

    using BIO_ptr = std::unique_ptr<BIO, Delete>;
    using EVP_MD_CTX_ptr = std::unique_ptr<EVP_MD_CTX, Delete>;
}

std::vector<uint8_t> hashFileEVP(const std::string &fname, std::string const &mdname = "sha1")
{
    // will hold the resulting digest
    std::vector<uint8_t> md;

    // set this to however big you want the chunk size to be
    static constexpr size_t BUFFER_SIZE = 1024;
    std::array<char, BUFFER_SIZE> buff;

    // get the digest algorithm by name
    const EVP_MD *mthd = EVP_get_digestbyname(mdname.c_str());
    if (mthd)
    {
        std::ifstream inp(fname, std::ios::in | std::ios::binary);
        if (inp.is_open())
        {
            EVP_MD_CTX_ptr ctx{EVP_MD_CTX_new()};
            EVP_DigestInit_ex(ctx.get(), mthd, nullptr);

            while (inp.read(buff.data(), BUFFER_SIZE).gcount() > 0)
                EVP_DigestUpdate(ctx.get(), buff.data(), inp.gcount());

            // size output vector
            unsigned int mdlen = EVP_MD_size(mthd);
            md.resize(mdlen);

            // generate final digest
            EVP_DigestFinal_ex(ctx.get(), md.data(), &mdlen);
        }
    }
    return md;
}

std::vector<uint8_t> hashFileBIO(std::string const &fname, std::string const &mdname = "sha1")
{
    // the fixed-size read buffer
    static constexpr size_t BUFFER_SIZE = 1024;

    // will hold the resulting digest
    std::vector<uint8_t> md;

    // select this however you want.
    const EVP_MD *mthd = EVP_get_digestbyname(mdname.c_str());
    if (mthd)
    {
        // open the file and a message digest BIO
        BIO_ptr bio_f(BIO_new_file(fname.c_str(), "rb"));
        BIO_ptr bio_md(BIO_new(BIO_f_md()));
        BIO_set_md(bio_md.get(), mthd);

        // chain the bios together. note the returned chain pointer
        //  is NOT owned by a smart pointer; the individual bios
        //  in the chain are.
        BIO *bio = BIO_push(bio_md.get(), bio_f.get());

        // read through file one buffer at a time.
        std::array<char, BUFFER_SIZE> buff;
        while (BIO_read(bio, buff.data(), buff.size()) > 0)
            ; // intentionally empty

        // size output buffer
        unsigned int mdlen = EVP_MD_size(mthd);
        md.resize(mdlen);

        // read final digest from md bio.
        BIO_gets(bio_md.get(), (char *)md.data(), mdlen);
    }
    return md;
}

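// convert a vector of bytes to a hex std::string
std::string bin2hex(std::vector<uint8_t> const& bin)
{
    std::string res;
    size_t len = 0;
    // the first call only reports the required size, which leaves room
    //  for the terminating NUL
    if (OPENSSL_buf2hexstr_ex(nullptr, 0, &len, bin.data(), bin.size(), 0) != 0)
    {
        res.resize(len);
        OPENSSL_buf2hexstr_ex(&res[0], len, &len, bin.data(), bin.size(), 0);
        // drop the terminating NUL so it does not become part of the string
        if (!res.empty() && res.back() == '\0')
            res.pop_back();
    }
    return res;
}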

int main()
{
    OpenSSL_add_all_digests();

    // i have this on my rig. use whatever you want
    //  or get the name from argv or some such.
    static const char fname[] = "dictionary.txt";

    auto md1 = hashFileEVP(fname);
    auto md1str = bin2hex(md1);
    std::cout << "hashed with EVP API\n";
    std::cout << md1str << '\n';

    auto md2 = hashFileBIO(fname);
    auto md2str = bin2hex(md2);
    std::cout << "hashed with BIO chain\n";
    std::cout << md2str << '\n';
}

Output

hashed with EVP API
0A97D663ADA2E039FD904846ABC5361291BD2D8E
hashed with BIO chain
0A97D663ADA2E039FD904846ABC5361291BD2D8E

Output from openssl command line

craig@rogue1 % openssl dgst -sha1 dictionary.txt
SHA1(dictionary.txt)= 0a97d663ada2e039fd904846abc5361291bd2d8e

Note the digests are the same in all three cases.

WhozCraig
  • I came back here because when I tried to run it, it simply throws an error: OPENSSL_Uplink(000007FEE7056C88,07): no OPENSSL_Applink. I tried to add #include and it throws errors because of the functions inside the applink.c file. After that I tried copying it into my project folder and the errors are still the same. All of this is just what I read on Stack Overflow. Another Stack Overflow post states that it is because of stdout, so you should not include any applink.c file. So I deleted the include statement and commented out the stdout output – YoYoYo Apr 03 '22 at 13:32
  • and tried instead cout< – YoYoYo Apr 03 '22 at 13:33
  • These are my CMakeLists.txt settings: https://pastebin.com/kj1vwH8T My OpenSSL is the x64 version and the .exe I am trying to build is also x64. Also, I want to ask you: if I want to store the hash generated by these functions in an array, what data type should it be, since the result from md1.data(); doesn't print any hash? Thank you so much in advance! – YoYoYo Apr 03 '22 at 13:38
  • Just coming back again to say that I compiled the applink file to applink.lib using these commands: g++ -o applink.obj -c applink.c and then ar rcs applink.lib applink.obj on Windows 7 x64. After that I added applink.lib to my project folder, changed the CMakeLists file from set(SOURCE_FILES main.cpp) to set(SOURCE_FILES main.cpp applink.lib), and also added target_link_libraries(${PROJECT_NAME} applink.lib), and the errors are still the same. I also copied it into the C:\Program Files (x86)\Embarcadero\Dev-Cpp\TDM-GCC-64\x86_64-w64-mingw32\lib folder and the errors are still the same. – YoYoYo Apr 03 '22 at 14:33
  • There is no change. Could you help me with these, please? Thank you so much for everything and sorry for disturbing you again! – YoYoYo Apr 03 '22 at 14:34
  • Sending a vector of 8-bit octets through formatted output on a stream as if it were a character string is a waste of time. It won't work correctly, or to expectations. There's a reason I use BIO_dump in the example code I included, though I could have also used OPENSSL_buf2hexstr_ex and then sent the results to stdout. Regarding applink.c, the easiest way to include it in your program is to `#include` it directly in **one** source file (for a Windows app, stdafx.cpp is a pretty handy place to stuff it). This is one way OpenSSL.org recommends its inclusion on Windows. – WhozCraig Apr 03 '22 at 18:22
  • "Sending a vector of 8-bit octets through formatted output on a stream as if it were a character string is a waste of time. It won't work correctly, or to expectations. There's a reason I use BIO_dump in the example code I included, though I could have also used OPENSSL_buf2hexstr_ex and then sent the results to stdout." ---> How do I properly store the SHA-1 hash string value in a vector? Thank you in advance, and sorry for bumping again, but I am struggling with this. – YoYoYo Apr 05 '22 at 12:01
  • Added `bin2hex` conversion to string. – WhozCraig Apr 05 '22 at 14:28
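
On the question in the comments about storing the digest as text: the digest itself is just a vector of bytes, and a std::string holding its hex form can be built without any OpenSSL call. The sketch below is one possible way to do that, not part of the answer's code. The name toHexLower is hypothetical, and it emits lowercase digits so the result has the same form as the `openssl dgst` output shown above.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: converts raw digest bytes to a lowercase hex string,
// two characters per byte, with no separators.
std::string toHexLower(const std::vector<std::uint8_t> &bytes)
{
    static const char digits[] = "0123456789abcdef";

    std::string out;
    out.reserve(bytes.size() * 2);
    for (std::uint8_t b : bytes)
    {
        out.push_back(digits[b >> 4]);   // high nibble
        out.push_back(digits[b & 0x0F]); // low nibble
    }
    return out;
}
```

Keeping the raw bytes in a `std::vector<uint8_t>` and converting only for display means the same digest can also be compared or written to disk in binary form.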