What I'm trying to do


The MD5 function from OpenSSL's crypto library (like many of its other hash functions) produces an array of unsigned char. I'm trying to turn this array into a hash string.

Example:

Array:

{126, 113, 177, 57, 8, 169, 240, 118, 60, 10, 229, 74, 249, 6, 32, 128}

Hash:

7e71b13908a9f0763c0ae54af9062080

Each byte in the array is represented as two hexadecimal digits, so the hash string is twice as long as the array.
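
As a small illustration of this mapping (a sketch only; the helper name `byte_to_hex_pair` is hypothetical and not part of the code below): the high four bits of a byte select the first digit and the low four bits select the second, so 126 = 7*16 + 14 becomes "7e".

```c
/* Hypothetical helper for illustration: writes the two lowercase hex
   digits of `byte` into `out`, followed by a terminating NUL. */
static void byte_to_hex_pair(unsigned char byte, char out[3]) {
    static const char digits[] = "0123456789abcdef";
    out[0] = digits[byte >> 4];    /* high nibble: 126 >> 4 == 7 */
    out[1] = digits[byte & 0x0F];  /* low nibble: 126 & 15 == 14, i.e. 'e' */
    out[2] = '\0';
}
```

Applied to every byte of the example array in turn, this yields the 32-character string shown above.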

What I have got


Please see the full code here. Here is a part of it.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define nu 0

typedef struct {
   int len;
   unsigned char *Hash;
}Type;

Type test[6];


int main(void) {
            
    unsigned char XUzQ[16]={
    126, 113, 177, 57, 8, 169, 240, 118, 60, 10, 229, 74, 249, 6, 32, 128
    };
    test[0].len=16; test[0].Hash=XUzQ;

    int h; 
    const char *hex="0123456789abcdef";
    char *Ha=calloc(test[nu].len*2+1,sizeof(char));

    for (h=0;h<test[nu].len;++h) {
        *(++Ha)=hex[(test[nu].Hash[h]/16)%16];
        *(++Ha)=hex[test[nu].Hash[h]%16];
    }

    Ha-=(test[nu].len*2-1);

    if (strlen(Ha)==(test[nu].len*2))
    printf("'%s'\n",Ha);
    else puts("Failed!");

    Ha--;

    free(Ha);

    return 0;
}

This prints the value I expect (7e71b13908a9f0763c0ae54af9062080), but I think the same thing could be implemented in a better and faster way.

Notes


The names of the arrays I'm working with look strange because they were auto-generated from random characters by my Python script.

The test array is intentionally that big (please see my full code by clicking the link above).

Question


How could I achieve the same result faster and more simply? I'd be grateful if the solution worked for all hashing algorithms supported by OpenSSL.

ForceBru
  • possible duplicate of [hash function for string](http://stackoverflow.com/questions/7666509/hash-function-for-string) – Degustaf Apr 23 '15 at 16:35
  • Crypto hash functions and hash table hash functions are very different creatures. The first is designed with cracking in mind. The second is looking for the outputs to be evenly distributed, and **FAST!!** – Degustaf Apr 23 '15 at 16:36
  • @Degustaf, please read my question more carefully, I'm not developing hash tables and I'm pretty sure that the `crypto` library is the one I want to use. – ForceBru Apr 23 '15 at 16:39
  • 1) 'Ha' is a pointer to char that was set via a call to calloc. That pointer should never be modified; suggest using a second pointer that starts as a copy of 'Ha'. 2) When done with an allocated memory segment, pass the pointer to that allocated memory to free(), otherwise a memory leak will result. 3) Always check the returned value (!= NULL) from calloc() and the rest of that family of functions, to ensure the operation was successful. – user3629249 Apr 23 '15 at 16:42
  • @user3629249, oh, yes, I forgot to `free` this pointer, thanks for noticing it. – ForceBru Apr 23 '15 at 16:43
  • regarding this line: 'Type test[6];' why an array, only the first instance of struct Type is being used? – user3629249 Apr 23 '15 at 16:49
  • @user3629249, see the _notes_ in the question. In my actual code I'm using six of these structs. – ForceBru Apr 23 '15 at 16:51
  • The strlen() function returns a 'size_t', which is unsigned, and it is being compared with the 'int len' field from the Type struct. This causes the compiler to raise a warning, so the compile is not 'clean'. Suggest changing 'int len' to 'unsigned len' AND changing 'int h;' to 'unsigned h;', which will result in a clean compile. – user3629249 Apr 23 '15 at 16:58
  • There is a lot of 'behind the scenes' math being performed in several places in the code. Suggest using temporary pointers so the offsets into test[] do not have to be re-calculated over and over. Note: test[].len is not a needed field; you could simply #define MAX_LEN (16) and use that name everywhere the code is currently using 'test[].len' or '16'. BTW: there is no way for the conversion to hex to fail with the current algorithm, so the last if/else can be reduced to a single printf(). The modulo 16 on the upper nibble could lose info; if the value is always < 256, the first modulo can be removed. – user3629249 Apr 23 '15 at 17:03
  • I'm voting to close this question as off-topic because code shown works fine and OP only wants to know if he could have written it better. As such this question would be better asked on [codereview](http://codereview.stackexchange.com/) – Serge Ballesta Apr 23 '15 at 17:05
  • @user3629249, you've written so many helpful comments here that you may want to write an answer – ForceBru Apr 23 '15 at 17:26
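
The suggestions in the comments above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the asker's or any commenter's code: the function name `digest_to_hex` is hypothetical, and the length is passed as a `size_t` parameter instead of being read from the `Type` struct. Note that the pre-increment `*(++Ha)` in the question skips the first byte of the buffer, so the terminating NUL that `strlen` looks for falls one past the allocation; the post-increment on a separate cursor used here avoids that.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a newly allocated, NUL-terminated lowercase
   hex string for `len` bytes, or NULL if the allocation fails.
   The caller must free() the result. */
static char *digest_to_hex(const unsigned char *digest, size_t len) {
    static const char hex[] = "0123456789abcdef";
    char *out = malloc(len * 2 + 1);
    if (out == NULL)
        return NULL;

    char *p = out;  /* separate write cursor; `out` itself is never modified */
    for (size_t i = 0; i < len; ++i) {
        *p++ = hex[digest[i] >> 4];    /* no modulo needed for an 8-bit value */
        *p++ = hex[digest[i] & 0x0F];
    }
    *p = '\0';
    return out;
}
```

Because the original pointer is returned unchanged, it can be passed straight to `free()` once the string has been printed.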

1 Answer

You could use snprintf, although it is likely that your existing solution is faster:

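#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

char* to_hex_string(const unsigned char h[16]) {
  char* out = malloc(33);  /* 32 hex digits plus the terminating NUL */
  if (out == NULL) return NULL;
  int l =  /* snprintf returns int, not size_t */
    snprintf(out, 33,
      "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x",
      h[0], h[1], h[2], h[3], h[4], h[5], h[6], h[7],
      h[8], h[9], h[10], h[11], h[12], h[13], h[14], h[15]);
  assert(l == 32);
  (void)l;  /* avoids an unused-variable warning when NDEBUG is defined */
  return out;
}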

Or, for a more general solution using snprintf:

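void byte_to_02x(char out[3], unsigned char byte) {
  /* Call snprintf outside assert() so it still runs when NDEBUG is defined. */
  int written = snprintf(out, 3, "%02x", byte);
  assert(written == 2);
  (void)written;
}
char* bytevector_to_hexstring(const unsigned char* bytes, size_t n) {
  char* out = malloc(2*n + 1);
  if (out == NULL) return NULL;
  out[0] = '\0';  /* so that n == 0 yields an empty string */
  for (size_t i = 0; i < n; ++i)
    byte_to_02x(&out[2*i], bytes[i]);  /* each call's NUL is overwritten by the next */
  return out;
}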
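
Since the question asks for something that works with any digest, here is a minimal sketch of the same snprintf idea that writes into a caller-supplied buffer and reports errors through its return value instead of `assert`. The name `hex_encode` and its parameters are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: hex-encodes `n` bytes into `out` using snprintf.
   `out_size` must be at least 2*n + 1. Returns 0 on success, -1 otherwise. */
static int hex_encode(const unsigned char *bytes, size_t n,
                      char *out, size_t out_size) {
    if (out_size < 2 * n + 1)
        return -1;
    for (size_t i = 0; i < n; ++i) {
        /* Each call writes two digits plus a NUL that the next call overwrites. */
        if (snprintf(out + 2 * i, 3, "%02x", (unsigned)bytes[i]) != 2)
            return -1;
    }
    out[2 * n] = '\0';  /* also covers n == 0 */
    return 0;
}
```

Digest lengths differ between algorithms (16 bytes for MD5, 20 for SHA-1, 32 for SHA-256), so the caller would size the buffer accordingly; a buffer of `2 * EVP_MAX_MD_SIZE + 1` chars, using the constant from OpenSSL's `<openssl/evp.h>`, should be large enough for any of them.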
rici
  • This was the first solution to pop into my head, but it was rather slow, which is why I decided to implement my own tool to do it :) – ForceBru Apr 23 '15 at 16:47
  • @ForceBru: Yeah, it would be slower. Actually I can't think of a good reason to use it other than "always use the standard library when possible". – rici Apr 23 '15 at 16:52