How can I hash a string to an int using c++?

Question

I have to write my own hash function. If I wanted to just make the simple hash function that maps each letter in the string to a numerical value (i.e. a=1, b=2, c=3, ...), is there a way I can perform this hash on a string without having to first convert it to a c-string to look at each individual char? Is there a more efficient way of hashing strings?

score 9 · Answer 1 · edited Sep 02 '13 at 14:34

From personal experience I know that this works and produces good distributions. (Plagiarised from http://www.cse.yorku.ca/~oz/hash.html):

djb2

this algorithm (k=33) was first reported by dan bernstein many years ago in comp.lang.c. another version of this algorithm (now favored by bernstein) uses xor: hash(i) = hash(i - 1) * 33 ^ str[i]; the magic of number 33 (why it works better than many other constants, prime or not) has never been adequately explained.

unsigned long hash(unsigned char *str) {
    unsigned long hash = 5381;
    int c;

    while (c = *str++) {
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    }

    return hash;
}

score 7 · Accepted Answer · edited Nov 05 '12 at 08:38

7

Re the first question, sure, e.g, something like:

int hash = 0;
int offset = 'a' - 1;
for(string::const_iterator it=s.begin(); it!=s.end(); ++it) {
  hash = hash << 1 | (*it - offset);
}

regarding the second, there are many better ways to hash strings. E.g., see here for a few C examples (easily translatable to C++ along the lines of the snippet above).

edited Nov 05 '12 at 08:38

masoud

55,379
16
141
208

answered Mar 29 '10 at 01:12

Alex Martelli

854,459
170
1,222
1,395

i see. how about if i wanted to do case-insensitive hashing. where A=a=1? – zebraman Mar 29 '10 at 01:35
+1, if only for use of `*2` and `|` to create a comedicaly poor hash ;-) – Steve Jessop Mar 29 '10 at 01:40
3

-1 for creating a comically poor hash. Use '^', never '|'! Even with '^', this will create a poor distribution (lots more collisions than you need) with short strings. – Tim Cooper Nov 21 '12 at 05:47

score 5 · Answer 3 · answered Aug 05 '12 at 19:38

5

Here's a C (++) hash function that I found in Stroustrup's book:

int hash(const char *str)
{
    int h = 0;
    while (*str)
       h = h << 1 ^ *str++;
    return h;
}

If you're using it for a hash table (which Stroustrup does) then you can instead return the abs of the hash modulo a prime number. So instead

    return (h > 0 ? h : -h) % N_BUCKETS;

for the last line.

answered Aug 05 '12 at 19:38

Wren

59
1
1

3

If `h` is `INT_MIN`, evaluating `-h` results in undefined behavior. Better use unsigned numbers for hashing. – fredoverflow Nov 21 '12 at 09:00

score 5 · Answer 4 · answered Mar 29 '10 at 01:09

5

You can examine each individual char from a std::string using the [] operator. However, you can look at Boost::Functional/Hash for guidance on a better hashing scheme. There is also a list of hashing functions in c located here.

answered Mar 29 '10 at 01:09

wheaties

35,646
15
94
131

so, my understanding is that hash functions map a string to an int, but usually these ints are mapped using a compression map to table addresses so that the hashtable is a more manageable size. is this applicable to the hash functions you recommended in the link? – zebraman Mar 29 '10 at 01:21
You mean buckets? There are a number of "usual" functions which are trade-offs in terms of size of the hash table produced and performance criteria. The biggest concern you should have is how many repeated values, that is, how uniformly distributed your results are. Poor hashing will invariably leave you with a small collection of linked lists rather than a constant amortized time look up table. I have not examined the later while I've seen Boost. Did I answer that? – wheaties Mar 29 '10 at 01:29

score 1 · Answer 5 · answered Jul 08 '18 at 22:04

C++11 ships with a standard hashing function for strings.

https://en.cppreference.com/w/cpp/string/basic_string/hash

#include <string>
#include<functional> // hash
int main(){
    std::string s = "Hello";
    std::size_t hash = std::hash<std::string>{}(s);
}

score 1 · Answer 6 · answered Jun 28 '20 at 21:49

Just posting an improvement to Arnestig's djb2 algorithm to be constexpr-friendly. I had to remove the unsigned qualifier of the argument so it can work with literal strings.

constexpr unsigned long hash(const char *str) {
    unsigned long hash = 5381;

    while (int c = *str++) {
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    }

    return hash;
}

score 0 · Answer 7 · answered Mar 29 '10 at 01:10

0

xor the characters together, four at a time.

answered Mar 29 '10 at 01:10

Stephen

47,994
7
61
70

i don't really understand what xor is/does. could you explain? – zebraman Mar 29 '10 at 01:13
xor is a bitwise operator meaning "one-but-not-both", the '^' operator in c++. e.g. 0 ^ 1 => 1 1 ^ 1 => 0 3 ^ 1 => 2 (11 ^ 01 => 10) It'll give you a randomish integer value. Either way, you'll need to traverse the string in a way similar to Alex Martelli's solution. So go with that and you don't need to worry about word size. :) – Stephen Mar 29 '10 at 01:22
2

That's not a great hash function. For example, on ASCII data it won't touch the 8th, 16th, 24th or 32nd bits of the word at all. As a practical effect, if your hashtable has 512 buckets, then half of them would never be used by ASCII strings. You want to introduce some co-prime numbers somewhere along the line, and restricting the bucket count to compensate for a weakness in the hash just isn't necessary given the availability of better hashes that aren't much slower. – Steve Jessop Mar 29 '10 at 01:34
Fair point. I hadn't intended this to be a good hash function, just a simple hash function. There are plenty of better hashing algorithms described by the links in other answers. I had assumed (perhaps mistakenly) that hash wasn't available and the question didn't really ask for performance or hash quality. I should have stated that explicitly. – Stephen Mar 29 '10 at 01:43
This hash function will collide on e.g. "abcd1234" and "1234abcd". More seriously, it'll produce bad distributions. – Tim Cooper Nov 21 '12 at 05:50

score 0 · Answer 8 · answered Apr 14 '10 at 08:30

#include <iostream>
#include <string>
#include <algorithm>

using namespace std;

// a variation on dan bernstein's algorithm
// [http://www.cse.yorku.ca/~oz/hash.html]
template<typename Int>
struct hash {
    hash() : acc(5381) { }
    template<typename Ch>
    void operator()(Ch ch) { acc = ((acc << 5) + acc) ^ ch; }
    operator Int() const { return acc; }
    Int acc;
};

int main(int argc, char* argv[])
{
    string s("Hellp, world");
    cout << hex << showbase
        << for_each(s.begin(), s.end(), hash<unsigned long long>()) << '\n';
    return 0;
}

score 0 · Answer 9 · answered Dec 07 '15 at 08:04

0

Another way for small strings:

int hash(const char* str) {
    int hash = 0;
    int c = 0;

    while (c < std::strlen(str)) {
        hash += (int)str[c] << (int)str[c+1];
        c++;
    }
    return hash;
}

answered Dec 07 '15 at 08:04

LUCAS

83
1
7

score -2 · Answer 10 · answered Mar 29 '10 at 01:10

-2

You can make use of the member functions operator[] or at of the string class or iterators to access individual char of a string object without converting it to c-style char array.

To hash a string object to an integer you'll have to access each individual char of the string object which you can do as:

for (i=0; i < str.length(); i++) {
    // use str[i] or str.at(i) to access ith element.
}

answered Mar 29 '10 at 01:10

codaddict

445,704
82
492
529

Don't call `str.length()` on each for iteration, especially for hashing strings that don't change during the loop. Also, consider working directly on the `str.c_str()` to avoid any function call in this. Strings do end in `NULL` character. – CodeAngry Apr 30 '14 at 14:10

How can I hash a string to an int using c++?

10 Answers10

Linked