31

I am trying to write a C program that proves SHA1 is nearly collision free, but I cannot figure out how to actually create the hash for my input values. I just need to create the hash, and store the hex value into an array. After some Google searches, I've found OpenSSL documentation directing me to use this:

 #include <openssl/sha.h>

 unsigned char *SHA1(const unsigned char *d, unsigned long n,
                  unsigned char *md);

 int SHA1_Init(SHA_CTX *c);
 int SHA1_Update(SHA_CTX *c, const void *data,
                  unsigned long len);
 int SHA1_Final(unsigned char *md, SHA_CTX *c);

I believe I should be using either unsigned char *SHA1 or SHA1_Init, but I am not sure what the arguments would be, given x is my input to be hashed. Would someone please clear this up for me? Thanks.

Niall C.
spassen
  • what are your input values: in-memory strings, or file contents? – Basile Starynkevitch Feb 14 '12 at 21:31
  • I am writing a birthday attack that should create a new hash and add it to the end every time I get through the array. I was just going to keep it simple and hash the value of i. Quick answer: in-memory strings. – spassen Feb 14 '12 at 21:34
  • 1
    What do you mean with 'proving that SHA1 is nearly collision-free'? SHA1 is a 160-bit hash, so there are 2^160 possible values, but there are far more than 2^160 possible strings (say shorter than 1MB), so there are tons of collisions. If you just want to test whether you get collisions from a number of randomly generated strings, the number of strings needed for a halfway reliable answer is unfeasibly high (unless you happen to find a collision early, but SHA1 is tested well enough to assign that a negligibly small probability). – Daniel Fischer Feb 14 '12 at 21:41
  • I realize there are plenty of possible collisions, but the goal is to prove it would take a significant amount of time to find a collision (about 2^80) and take even more time to find a collision that matches a specific hash. – spassen Feb 14 '12 at 21:44
  • 2
    But realistically you cannot test more than 2^34 strings or so. Even if SHA1 were skewed in a way that you'd only need 2^50 strings for a collision, you almost certainly won't see it. – Daniel Fischer Feb 14 '12 at 22:10
  • That is exactly what I am trying to prove. – spassen Feb 14 '12 at 22:12
  • deserves a quote here: https://github.com/thal/sha1 – ton Jun 20 '20 at 14:09
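
The figures quoted in the comments above (about 2^80 hashes to find a collision, and no realistic chance after 2^34 attempts) follow from the standard birthday approximation. This is a sketch that assumes SHA1 behaves like a uniformly random function with N = 2^160 possible outputs, which is an idealization rather than something established in this thread:

```latex
\begin{align*}
P(\text{collision among } n \text{ inputs}) &\approx 1 - e^{-n(n-1)/(2N)}, \qquad N = 2^{160} \\
P = \tfrac{1}{2} \;&\Rightarrow\; n \approx \sqrt{2 \ln 2 \cdot N} \approx 1.18 \cdot 2^{80} \\
n = 2^{34} \;&\Rightarrow\; P \approx \frac{n^2}{2N} = \frac{2^{68}}{2^{161}} = 2^{-93}
\end{align*}
```

Matching one specific hash is a different problem: under the same idealization the expected cost is on the order of N = 2^160 attempts.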

7 Answers

56

If you have all of your data at once, just use the SHA1 function:

// The data to be hashed
char data[] = "Hello, world!";
size_t length = strlen(data);

unsigned char hash[SHA_DIGEST_LENGTH];
// SHA1() expects const unsigned char *, so cast the char buffer
SHA1((const unsigned char *)data, length, hash);
// hash now contains the 20-byte SHA-1 hash

If, on the other hand, you only get your data one piece at a time and you want to compute the hash as you receive that data, then use the other functions:

// Error checking omitted for expository purposes

// Object to hold the current state of the hash
SHA_CTX ctx;
SHA1_Init(&ctx);

// Hash each piece of data as it comes in:
SHA1_Update(&ctx, "Hello, ", 7);
...
SHA1_Update(&ctx, "world!", 6);
// etc.
...
// When you're done with the data, finalize it:
unsigned char hash[SHA_DIGEST_LENGTH];
SHA1_Final(hash, &ctx);
Adam Rosenfield
  • I tried using the sha1 function, but when I compile in the terminal it says Undefined reference to SHA1. I don't get any complaints about anything else. Any idea what I'm missing? – spassen Feb 14 '12 at 22:02
  • 13
    You need to link with the OpenSSL runtime library. Assuming you're using gcc, add `-lcrypto` to your linker command line. – Adam Rosenfield Feb 14 '12 at 22:33
  • how would one generate hmacsha1? – Cmag May 15 '13 at 14:31
  • No error handling needed? For me SHA1_Final crashes but have no clue why. Is there any way to print the error? – BTR Naidu May 06 '16 at 11:22
  • How to get the `hash` value? I'm getting �*���qE)DDF:� �4� with `printf("<>\n", hash);` – Kostiantyn Nov 11 '17 at 06:57
  • In the first example, `sizeof(data)` is 14. In the second example, the combined size is 13. The hashes will be different. – Nisse Engström Mar 04 '18 at 02:06
12

They're two different ways to achieve the same thing.

Specifically, you either call SHA1_Init, then SHA1_Update as many times as necessary to pass your data through, and then SHA1_Final to get the digest, or you call SHA1 once.

The reason for the two modes is that when hashing large files it is common to read the file in chunks, because loading it all at once would use a lot of memory. Keeping track of the SHA_CTX (the SHA context) as you go lets you do that. The algorithm also fits this model internally: data is processed one block at a time.

The SHA1 function should be fairly straightforward. The other approach works like this:

unsigned char md[SHA_DIGEST_LENGTH];
SHA_CTX context;
SHA1_Init(&context);

// Feed each block of data to the running hash
for (size_t i = 0; i < numblocks; i++)
{
    SHA1_Update(&context, pointer_to_data, data_length);
}
// Write the 20-byte binary digest into md
SHA1_Final(md, &context);

Crucially, at the end md will contain the binary digest, not a hexadecimal representation - it's not a string and shouldn't be used as one.
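
Since the question asks for the hex value, here is a minimal sketch of turning a binary digest into a hex string. `digest_to_hex` is a hypothetical helper name, not an OpenSSL function, and the code uses only the C standard library:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of OpenSSL): writes 2*len lowercase hex
 * characters plus a terminating NUL into out.
 * Returns 0 on success, -1 if out is too small. */
int digest_to_hex(const unsigned char *digest, size_t len,
                  char *out, size_t out_size)
{
    if (out_size < 2 * len + 1)
        return -1;
    for (size_t i = 0; i < len; i++)
        snprintf(out + 2 * i, 3, "%02x", digest[i]); /* 2 hex chars + NUL */
    out[2 * len] = '\0';
    return 0;
}
```

With the code above it could be called as `digest_to_hex(md, SHA_DIGEST_LENGTH, hex, sizeof hex)`, where `hex` is declared as `char hex[2 * SHA_DIGEST_LENGTH + 1]`.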

Ian
  • how would one generate hmacsha1? – Cmag May 15 '13 at 14:47
  • 2
    @Clustermagnet hmacsha1 is an HMAC algorithm using SHA1 as the hash. It's the same idea as in my answer (see [here](http://www.openssl.org/docs/crypto/hmac.html)), but for the `EVP_MD` argument specific to HMAC you specify `EVP_sha1()`. –  May 15 '13 at 15:12
  • 1
    @Cmag - see [EVP Signing and Verifying | HMAC](http://wiki.openssl.org/index.php/EVP_Signing_and_Verifying#HMAC) on the OpenSSL wiki. Also see [Using HMAC vs EVP functions in OpenSSL](http://stackoverflow.com/a/20322002/608639) on Stack Overflow. – jww Jun 13 '16 at 00:11
3

I believe I should be using either unsigned char *SHA1 or SHA1_Init ...

For later versions of the OpenSSL library, like 1.0.2 and 1.1.0, the project recommends using the EVP interface. An example of using EVP Message Digests with SHA256 is available on the OpenSSL wiki:

#define handleErrors abort

EVP_MD_CTX *ctx;

if((ctx = EVP_MD_CTX_create()) == NULL)
    handleErrors();

if(1 != EVP_DigestInit_ex(ctx, EVP_sha256(), NULL))
    handleErrors();

unsigned char message[] = "abcd .... wxyz";
unsigned int message_len = sizeof(message) - 1; // exclude the terminating NUL

if(1 != EVP_DigestUpdate(ctx, message, message_len))
    handleErrors();

unsigned char digest[EVP_MAX_MD_SIZE];
unsigned int digest_len = sizeof(digest);

if(1 != EVP_DigestFinal_ex(ctx, digest, &digest_len))
    handleErrors();

EVP_MD_CTX_destroy(ctx);
jww
3

Adam Rosenfield's answer is fine, but use strlen rather than sizeof; otherwise the hash is calculated over the null terminator as well. That is probably fine in this case, but not if you need to compare your hash with one generated by another tool.

// The data to be hashed
char data[] = "Hello, world!";
size_t length = strlen(data);

unsigned char hash[SHA_DIGEST_LENGTH];
// SHA1() expects const unsigned char *, so cast the char buffer
SHA1((const unsigned char *)data, length, hash);
// hash now contains the 20-byte SHA-1 hash
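
To make the strlen versus sizeof difference concrete, here is a small sketch. The names `message_text`, `length_with_terminator`, and `length_without_terminator` are made up for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Same text as in the example above; the array includes a trailing '\0'. */
static const char message_text[] = "Hello, world!";

/* Bytes that would be hashed if sizeof were used: text plus terminator. */
size_t length_with_terminator(void)
{
    return sizeof(message_text);
}

/* Bytes that would be hashed if strlen were used: text only. */
size_t length_without_terminator(void)
{
    return strlen(message_text);
}
```

The two lengths differ by exactly one byte, which is enough to produce a completely different digest.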
Shadowchaser
3

The first function (SHA1()) is the higher-level one, and it's probably the one you want. The doc is pretty clear on the usage: d is the input, n is its size in bytes, and md is where the result is placed (you allocate it, at least SHA_DIGEST_LENGTH bytes).

As for the other three functions, these are lower level, and I'm pretty sure the first one uses them internally. They are better suited to larger inputs that need to be processed block by block.

kralyk
2

Calculate the hash like this:

// Object to hold the current state of the hash
SHA_CTX ctx;
SHA1_Init(&ctx);

// Hash each piece of data as it comes in:
SHA1_Update(&ctx, "Hello, ", 7);
...
SHA1_Update(&ctx, "world!", 6);
// etc.
...
// When you're done with the data, finalize it:
unsigned char tmphash[SHA_DIGEST_LENGTH];
SHA1_Final(tmphash, &ctx);

Finally, you can convert the hash to a human-readable hex string with code like this:

// Two hex characters per byte, plus one byte for the NUL that sprintf writes
unsigned char hash[SHA_DIGEST_LENGTH*2 + 1];

int i = 0;
for (i=0; i < SHA_DIGEST_LENGTH; i++) {
    sprintf((char*)&(hash[i*2]), "%02x", tmphash[i]);
}
// And print to stdout
printf("Hash: %s\n", hash);
linux_art
0

Let the code speak

The SQLite source tree contains the source for the DBHASH tool, which applies SHA1 to whole databases. It comes complete with a SHA1 implementation.

You might find it useful to study that code.

P.S. There is also SHA1 implemented as an SQLite user-defined function.

Chef Gladiator