0

In school my teacher went over high performance spelling checking that uses a numeric hash, or key that represents a word. So instead of words, the keys are stored. Then the word to check is converted to its unique number using the same algorithm that was used on the dictionary. But I can't remember what this method is called, and I need to write a similar method.

Anyone know about this method to generate a unique number for a set of chars?

Yu Hao
  • 119,891
  • 44
  • 235
  • 294
netcat
  • 1,281
  • 1
  • 11
  • 21
  • ps, this is not homework. I am using this for a project at work. – netcat Apr 01 '16 at 10:31
  • 2
    This is called a [hash](https://en.wikipedia.org/wiki/Hash_function), which suggests [What's a good hash function for English words?](http://stackoverflow.com/q/7700400/2564301) may answer you. – Jongware Apr 01 '16 at 10:35
  • each word, if you were to convert every character of that word into an int, would give you an unique number for each word...for example--to convert 'a' into an int, you simply say (int) a. – jimmu Apr 01 '16 at 10:40
  • Thanks rad and spock, that's the phrase I could not remember, the 'hash of a word. – netcat Apr 01 '16 at 14:30

1 Answers1

1

Actually standard c++ library has a hash template structure for that:

#include <iostream>
#include <functional>

int main() {
    std::string str = "Programmer";
    std::size_t str_hash = std::hash<std::string>{}(str);
    std::cout << str_hash ;
    return 0;
}

Would output 2561445211.

"std::hash{}(str)" computes the hash value;

047
  • 66
  • 1
  • 3
  • Thanks mister mini-james bond, 047 ;) – netcat Apr 01 '16 at 14:30
  • I did not know that the standard library already has this built in. My idea is to make a hash of every word in the dictionary, and then hash the word to be checked, b/c this would be quicker to find the correct word in a sorted list of hashed numbers, than to match to a string in a list. – netcat Apr 01 '16 at 14:32
  • @MattFomich You're welcome. Standard lib has it since 2011 version of C++ (called... C++11). Your spell checker problem is a good example of a more general set intersection problem, where in your case words are elements of a set. Take a look at standard lib's hash set: [std::unordered_set dictionary](http://en.cppreference.com/w/cpp/container/unordered_set). It's [count(string word)](http://en.cppreference.com/w/cpp/container/unordered_set/count) method will check if word is in the set in constant time, while as "list of hashed numbers" will take linear time. – 047 Apr 02 '16 at 21:13
  • Hey, your code actually does not compile. I get a 'type name not allowed' error. This works: std::string str = "Programmer"; std::hash hash_fn; size_t str_hash = hash_fn(str); std::cout << str_hash << '\n'; – netcat Apr 02 '16 at 23:10
  • Thanks for the info on unordered set. That is what I will try. I have a spell checker that uses a hash table, but for checking as you type, it looks like an unordered set will be the fastest way to verify the word is correct. Do you know if a hash table is as fast as an unordered set? – netcat Apr 03 '16 at 00:39
  • @MattFomich Actually, unordered_set **is** a hash talbe. It's called so just to emphasize that when one iterates over uordered_set from it's first element to the last one (using std::begin and std::end), elements won't be returned in the order you put them in, unlike with std::set<>. – 047 Apr 03 '16 at 12:27