1

I have a struct defined in C as follows

typedef struct {
    unsigned int size;
    unsigned int pool_size;
    route _routes[SIZE_OF_FLEET];
    request _request_pool[SIZE_OF_PROBLEM];
    /* stores which request is contained in which route */
    unsigned int _request_map[SIZE_OF_PROBLEM];
} solution;

I am trying to define a hash function for this struct as follows

unsigned long long solution_hash(solution const *_sol)
{
    unsigned long long hash = 0;
    unsigned short c;
    unsigned short *reinterpret_sol;
    reinterpret_sol = (unsigned short*)&_sol;
    size_t size_ = sizeof(solution);
    size_t elem_size_ = sizeof(unsigned short);
    int len = (int)size_/elem_size_;

    for (int i = 0; i < len; i++) {
        c = reinterpret_sol[i];
        hash += c;
    }
    return hash;
}

The problem is everytime I make a call to solution_hash function, the hash value of the same solution changes. Successive calls increment the value by 32 for the same solution.

What is wrong with this code? Is there a better way to implement a hash function for a struct?

Anirban Mandal
  • 161
  • 1
  • 8
  • 3
    `reinterpret_sol = (unsigned short*)&_sol;` - Are you sure about `&`? In any case looks like breaking the *strict aliasing* rule.. – Eugene Sh. Jul 20 '16 at 16:05
  • Oh, thank you very much. I tried for hours what was wrong with the code, and you figured it out in less than a min :) – Anirban Mandal Jul 20 '16 at 16:08
  • " Is there a better way to implement a hash function for a struct?" Yes, but it depends on hashing goals: which are not yet stated. – chux - Reinstate Monica Jul 20 '16 at 16:08
  • 1
    Even if you fix the `&` and the strict aliasing violation, this looks like you could easily end up hashing the padding inside a struct. Also, simple addition is a pretty poor way to mix the `c` values. – user2357112 Jul 20 '16 at 16:10
  • @chux I wish to store in a simple manner that I have previously obtained, so I need more all less a bijection from the set of integers to the set of solutions. – Anirban Mandal Jul 20 '16 at 16:13
  • @user2357112 Yes you're correct. I'll use a better hash function. – Anirban Mandal Jul 20 '16 at 16:13
  • @user2357112 hashing the padding bytes are not necessarily a problem, if consistent. Though I can't find a reference regarding the *values* of the padding bytes.. – Eugene Sh. Jul 20 '16 at 16:14
  • Isn't _bijection_ attempting a one-to-one relationship, whereas hashing is a many-to-one? – chux - Reinstate Monica Jul 20 '16 at 16:15
  • So perhaps you want a *compression*, not *hashing*? – Eugene Sh. Jul 20 '16 at 16:17
  • @chux Yes, that is correct. A few collisions don't matter to me. I wish that the chances of two different solutions having the same hash value are as low as possible. – Anirban Mandal Jul 20 '16 at 16:17
  • @EugeneSh.Yes, that is correct. – Anirban Mandal Jul 20 '16 at 16:19
  • 2
    @EugeneSh.: [C11 final draft, section 6.2.6.1, part 6](http://port70.net/~nsz/c/c11/n1570.html#6.2.6.1p6): "When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values." The padding bytes aren't consistent. – user2357112 Jul 20 '16 at 16:19
  • @user2357112 Yep. That's what I was looking for. Thanks, it is useful. – Eugene Sh. Jul 20 '16 at 16:19
  • 1
    For reference (to help with decision): http://stackoverflow.com/questions/996843/when-is-crc-more-appropriate-to-use-than-md5-sha1 – Eugene Sh. Jul 20 '16 at 16:23
  • @Everyone Thank you very much for your help. – Anirban Mandal Jul 20 '16 at 16:26

1 Answers1

0

Casting hid the problem as pointed out by @Eugene Sh.

// reinterpret_sol = (unsigned short*)&_sol;
reinterpret_sol = (unsigned short*) _sol;
Community
  • 1
  • 1
chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256