-3

I am trying to write a custom hash function for use in an unordered_map. I am able to insert items and iterate through them, but cannot lookup items using the at(), find() or operator[] functions.

#include <cstddef>
#include <iostream>
#include <functional>
#include <random>
#include <unordered_map>

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define JHASH_GOLDEN_RATIO  0x9e3779b9

using namespace std;

#define __jhash_mix(a, b, c) \
{ \
  a -= b; a -= c; a ^= (c>>13); \
  b -= c; b -= a; b ^= (a<<8); \
  c -= a; c -= b; c ^= (b>>13); \
  a -= b; a -= c; a ^= (c>>12);  \
  b -= c; b -= a; b ^= (a<<16); \
  c -= a; c -= b; c ^= (b>>5); \
  a -= b; a -= c; a ^= (c>>3);  \
  b -= c; b -= a; b ^= (a<<10); \
  c -= a; c -= b; c ^= (b>>15); \
}

uint32_t
jhash_3words (uint32_t a, uint32_t b, uint32_t c, uint32_t initval)
{
  a += JHASH_GOLDEN_RATIO;
  b += JHASH_GOLDEN_RATIO;
  c += initval;

  __jhash_mix (a, b, c);

  return c;
}

size_t jenkins_hash(const uint32_t &num){
    return (size_t) jhash_3words (num, 0, 0, rand());
}

int main(){

    std::unordered_map<uint32_t, char, std::function<decltype(jenkins_hash)>> ids(0, jenkins_hash );

        ids[(uint32_t) 42341] = 'g';
        ids[(uint32_t) 2232] = 'b';
        ids[(uint32_t) 7556] = 'q';

    for ( auto it = ids.begin(); it != ids.end(); ++it ){
        std::cout << "Element " << it->first << " " << it->second << std::endl;
    }
    std::cout << ids.size() << std::endl;

    printf("%c\n", ids.find(7556)->second);


    return 0;
}

Here is the output of the above code:

Element 2232 b
Element 7556 q
Element 42341 g
3
Segmentation fault (core dumped)

My questions are...

  • Why is this producing a segfault?

  • Why do at() and operator[] simply return nothing, whereas find() segfaults?

Any ideas?

BlackBeard
  • 10,246
  • 7
  • 52
  • 62
Zoroshino
  • 81
  • 1
  • 5
  • 2
    Totally unrelated to your problem, but please avoid symbols with two leading underscores (like `__jhash_mix`) as those are reserved for "the implementation" (compiler and standard library). See [this question and its answers](http://stackoverflow.com/questions/228783/what-are-the-rules-about-using-an-underscore-in-a-c-identifier) for information. – Some programmer dude Aug 18 '17 at 06:25
  • id `find` in `ids.find(7556)->second` returns no iterator and same time you trying to access its `second`, this will trigger seg fault – Nipun Aug 18 '17 at 06:26
  • 2
    Your hash is not deterministic, please try to understand what hashing means and what `unordered_map` is doing before trying to plug in a custom hash – Passer By Aug 18 '17 at 06:28
  • On another unrelated note, please avoid C-style casting in C++ as well. If you need to do it then you most likely are doing something wrong. – Some programmer dude Aug 18 '17 at 06:29
  • As for your problem, read the comment by @PasserBy, and try to understand what's meant by "not deterministic". As a hint: What do you think the `rand` function does? How could two identical keys generate the same hash if you have a random component mixed into the calculation? – Some programmer dude Aug 18 '17 at 06:31

1 Answers1

2
size_t jenkins_hash(const uint32_t &num){
    return (size_t) jhash_3words (num, 0, 0, rand());
}

Since your hash function includes a random element, the two times 7556 is hashed will yield different hashed values. This means that the lookup done by find fails to locate anything and actually returns ids.end(), which you should not dereference.

Here's how you should be using find():

auto found = ids.find(7556);
if(found != ids.end()) {
  printf("%c\n", found->second);
}