-2

I am looking to create a simple c++ hash function which will return a number within a max range based on a string input. So that, the same string will always return the same integer value. Here is an arbitrary example where the max desired range is 36.

Fred Smith -> 25
tree -> 34
Frog -> 0
Fred Smith -> 25
fred smith -> 7

These numbers are arbitrary, but the function should use an algorithm that does a numeric calculation against the string and results in an integer within a defined range. I will eventually rewrite this function for use in Python 2.7 as well.

I am using vs2008 (aka c++9) and std::hash is not available.

I need some advice on the approach.

panofish
  • 7,578
  • 13
  • 55
  • 96

4 Answers4

3

Why not std::hash?

#include <iostream>
#include <functional>
#include <string>

int main()
{
    int max = 100;

    std::string str = "Fred Smith";
    std::hash<std::string> hash_fn;

    int num = (int) hash_fn(str) % max;

    std::cout << num << '\n';
}

Output:

33

If you need a custom hashing algorithm implementation that will work across languages, I suggest starting here or here.

Gillespie
  • 5,780
  • 3
  • 32
  • 54
1

Two important elements in creating good hashes are hash table sizes and salting the hash (adding your own unpredictable touch). Typically there will be an operation of the given string to hash, maybe adding the ASCII values of each character or something like the length of the string being involved in some operation. These are EXTREMELY simple hashing examples for strings.

Now assuming we used an algorithm that uses the ascii values of each character in the string, we can incorporate the two elements I mentioned above to create our hash function like so...

int hash(string s, int tableSize)
{
    int sum = 0;
    for (int i = 0; i < s.length(); i++)
        sum += int(s[i]) * 3 //<- * 3 being my salt to the hash

    return sum % tableSize;
}

Using prime numbers when do table size modulus and salting is good practice because it'll reduce the risk of creating patterns in your hashes.

I hope this helps get you on the right track!

m_callens
  • 6,100
  • 8
  • 32
  • 54
  • While [multiplicative hashing](https://en.wikipedia.org/wiki/Universal_hashing#Hashing_strings) is not considered particularly good, this is particularly bad - in the function implemented as well as in coding it: an odd number no smaller than (number of values representable in a machine word)/(number of possible key element values) is supposed to be applied to the hash value before the current key element. A [salt](https://en.wikipedia.org/wiki/Salt_%28cryptography%29) is used to protect hashed values from dictionary attacks, this is not commonly regarded part of the hash function. – greybeard Nov 19 '15 at 18:07
  • 1
    Valid point greybeard. But, in my use the hash is simply used to generate a distinct color value between 0-360 based on a string. – panofish Nov 19 '15 at 21:47
1
#Very simple minded hash
def hashval(str, siz):
    hash = 0
    # Take ordinal number of char in str, and just add
    for x in str: hash += (ord(x))
    return(hash % siz) # Depending on the range, do a modulo operation.

print(hashval('stack', 33))
balabhi
  • 641
  • 5
  • 5
  • Since I am using vs2008, hash is not available. I like this simple minded approach for my use case. Thanks – panofish Nov 19 '15 at 21:29
0

A hashmap in java uses the hashing function of the object to get a 32 byte hash and a second hashing function implemented by the hashmap implementation to further reduce the length of the hash. This is explained in the anwser of this question: What hashing function does Java use to implement Hashtable class?

You can look at the hashing function used by the HashMap implementation since it can generate hashed of desired length.

You could just take the integer representation of each character of the string and compute the sum modulo max. Where max+1 is the highest value you want the hash to be.

Edit:

This HASH is reversible easily so it depends on your requirements.

Silver
  • 1,075
  • 3
  • 12
  • 37