I need some ideas to develop a good hashing function for my assignment. I have a list of all the countries in the world (around 190) in total. The names of each country is the key for the hashing function. Is there a specific kind of hashing function anyone would recommend to store this data in a hashing function without many collisions? Also, can you perhaps give an example of how to implement it?
-
A similar question that might be of some help (java instead of c, but same principles apply) http://stackoverflow.com/questions/2624192/good-hash-function-for-strings – cobbal Apr 24 '11 at 23:27
-
4ISO country codes - fits in an integer, guaranteed to be collision-free – Erik Apr 24 '11 at 23:27
-
1@Erik: Very clever; if you made it an answer I'd upvote it. – John Zwinck Apr 24 '11 at 23:28
-
@John: I don't think that really answers the OPs question though, I'll leave it a comment :P – Erik Apr 24 '11 at 23:29
-
You could write a pretty basic hashing function then write a test script (in your favorite scripting language) to see if there are any collisions. If you only have ~200 items in your data set, I doubt you have much to worry about. – Chris Lutz Apr 24 '11 at 23:35
-
1Are the country names in English and ASCII? – Mikel Apr 25 '11 at 00:00
3 Answers
Use GNU gperf. For inputs like yours, it will generate C code for you which implements a perfect hash function (for the given inputs). No collisions, no worries.

- 239,568
- 38
- 324
- 436
You can use generated perfect hash for that (GNU perf).
Of if the set of strings is dynamic then you can use ternary trie. For N unique strings it will give you unique number [1..N]. For your case it will be faster than with hash tables. Here is my implementation of such thing: http://code.google.com/p/tiscript/source/browse/trunk/tool/tl_ternary_tree.h

- 26,734
- 7
- 59
- 86
-
1You should also point to [tries](http://en.wikipedia.org/wiki/Trie), the ternary tree's easier to implement cousin. – hugomg Apr 25 '11 at 00:26
The simplest approach I can think of is for each country's name to compute the sum of the ASCII values in its representation and use this as the hash value:
int hash(const char *s)
{
int h = 0;
while (s && *s)
h += *s++;
return h;
}
If your hash map has size N, you store country names with map[hash(my_country) % N] = my_country
. Conceptually.
Just try this approach and see whether the resulting hash values are sufficiently uniformly distributed. Note that the quality of the distribution may also depend on N.

- 5,795
- 3
- 33
- 68