I have to write my own hash function. If I wanted to just make the simple hash function that maps each letter in the string to a numerical value (i.e. a=1, b=2, c=3, ...), is there a way I can perform this hash on a string without having to first convert it to a c-string to look at each individual char? Is there a more efficient way of hashing strings?
10 Answers
From personal experience I know that this works and produces good distributions. (Plagiarised from http://www.cse.yorku.ca/~oz/hash.html):
djb2
this algorithm (k=33) was first reported by dan bernstein many years ago in comp.lang.c. another version of this algorithm (now favored by bernstein) uses xor: hash(i) = hash(i - 1) * 33 ^ str[i]; the magic of number 33 (why it works better than many other constants, prime or not) has never been adequately explained.
unsigned long hash(unsigned char *str) {
unsigned long hash = 5381;
int c;
while (c = *str++) {
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
}
return hash;
}

- 2,285
- 18
- 30

- 10,023
- 5
- 61
- 77
Re the first question, sure, e.g, something like:
int hash = 0;
int offset = 'a' - 1;
for(string::const_iterator it=s.begin(); it!=s.end(); ++it) {
hash = hash << 1 | (*it - offset);
}
regarding the second, there are many better ways to hash strings. E.g., see here for a few C examples (easily translatable to C++ along the lines of the snippet above).

- 55,379
- 16
- 141
- 208

- 854,459
- 170
- 1,222
- 1,395
-
i see. how about if i wanted to do case-insensitive hashing. where A=a=1? – zebraman Mar 29 '10 at 01:35
-
+1, if only for use of `*2` and `|` to create a comedicaly poor hash ;-) – Steve Jessop Mar 29 '10 at 01:40
-
3-1 for creating a comically poor hash. Use '^', never '|'! Even with '^', this will create a poor distribution (lots more collisions than you need) with short strings. – Tim Cooper Nov 21 '12 at 05:47
Here's a C (++) hash function that I found in Stroustrup's book:
int hash(const char *str)
{
int h = 0;
while (*str)
h = h << 1 ^ *str++;
return h;
}
If you're using it for a hash table (which Stroustrup does) then you can instead return the abs of the hash modulo a prime number. So instead
return (h > 0 ? h : -h) % N_BUCKETS;
for the last line.

- 59
- 1
- 1
-
3If `h` is `INT_MIN`, evaluating `-h` results in undefined behavior. Better use unsigned numbers for hashing. – fredoverflow Nov 21 '12 at 09:00
You can examine each individual char from a std::string using the []
operator. However, you can look at Boost::Functional/Hash for guidance on a better hashing scheme. There is also a list of hashing functions in c located here.

- 35,646
- 15
- 94
- 131
-
so, my understanding is that hash functions map a string to an int, but usually these ints are mapped using a compression map to table addresses so that the hashtable is a more manageable size. is this applicable to the hash functions you recommended in the link? – zebraman Mar 29 '10 at 01:21
-
You mean buckets? There are a number of "usual" functions which are trade-offs in terms of size of the hash table produced and performance criteria. The biggest concern you should have is how many repeated values, that is, how uniformly distributed your results are. Poor hashing will invariably leave you with a small collection of linked lists rather than a constant amortized time look up table. I have not examined the later while I've seen Boost. Did I answer that? – wheaties Mar 29 '10 at 01:29
C++11 ships with a standard hashing function for strings.
https://en.cppreference.com/w/cpp/string/basic_string/hash
#include <string>
#include<functional> // hash
int main(){
std::string s = "Hello";
std::size_t hash = std::hash<std::string>{}(s);
}

- 14,261
- 4
- 67
- 118
Just posting an improvement to Arnestig's djb2 algorithm to be constexpr-friendly. I had to remove the unsigned qualifier of the argument so it can work with literal strings.
constexpr unsigned long hash(const char *str) {
unsigned long hash = 5381;
while (int c = *str++) {
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
}
return hash;
}

- 151
- 1
- 10
xor the characters together, four at a time.

- 47,994
- 7
- 61
- 70
-
-
xor is a bitwise operator meaning "one-but-not-both", the '^' operator in c++. e.g. 0 ^ 1 => 1 1 ^ 1 => 0 3 ^ 1 => 2 (11 ^ 01 => 10) It'll give you a randomish integer value. Either way, you'll need to traverse the string in a way similar to Alex Martelli's solution. So go with that and you don't need to worry about word size. :) – Stephen Mar 29 '10 at 01:22
-
2That's not a great hash function. For example, on ASCII data it won't touch the 8th, 16th, 24th or 32nd bits of the word at all. As a practical effect, if your hashtable has 512 buckets, then half of them would never be used by ASCII strings. You want to introduce some co-prime numbers somewhere along the line, and restricting the bucket count to compensate for a weakness in the hash just isn't necessary given the availability of better hashes that aren't much slower. – Steve Jessop Mar 29 '10 at 01:34
-
Fair point. I hadn't intended this to be a good hash function, just a simple hash function. There are plenty of better hashing algorithms described by the links in other answers. I had assumed (perhaps mistakenly) that hash
wasn't available and the question didn't really ask for performance or hash quality. I should have stated that explicitly. – Stephen Mar 29 '10 at 01:43 -
This hash function will collide on e.g. "abcd1234" and "1234abcd". More seriously, it'll produce bad distributions. – Tim Cooper Nov 21 '12 at 05:50
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
// a variation on dan bernstein's algorithm
// [http://www.cse.yorku.ca/~oz/hash.html]
template<typename Int>
struct hash {
hash() : acc(5381) { }
template<typename Ch>
void operator()(Ch ch) { acc = ((acc << 5) + acc) ^ ch; }
operator Int() const { return acc; }
Int acc;
};
int main(int argc, char* argv[])
{
string s("Hellp, world");
cout << hex << showbase
<< for_each(s.begin(), s.end(), hash<unsigned long long>()) << '\n';
return 0;
}

- 57,473
- 20
- 96
- 131
Another way for small strings:
int hash(const char* str) {
int hash = 0;
int c = 0;
while (c < std::strlen(str)) {
hash += (int)str[c] << (int)str[c+1];
c++;
}
return hash;
}

- 83
- 1
- 7
You can make use of the member functions operator[] or at of the string class or iterators to access individual char of a string object without converting it to c-style char array.
To hash a string object to an integer you'll have to access each individual char of the string object which you can do as:
for (i=0; i < str.length(); i++) {
// use str[i] or str.at(i) to access ith element.
}

- 445,704
- 82
- 492
- 529
-
Don't call `str.length()` on each for iteration, especially for hashing strings that don't change during the loop. Also, consider working directly on the `str.c_str()` to avoid any function call in this. Strings do end in `NULL` character. – CodeAngry Apr 30 '14 at 14:10