I'm working on an English vocabulary app right now, and I want every word to have its own distinct int id. Since all the words are different from one another, I think each one could simply be assigned an integer (or a long?).
I don't want to assign the ids serially in alphabetical order. I suspect there is an existing algorithm for exactly this requirement, and I don't want to reinvent the wheel, so please help me.
I prefer an integer id because I want the structure to be compact enough to transfer over the internet, since one word list might contain hundreds or even thousands of words.
Imagine I have a data structure like the one below:
struct word {
    int wordId;
    byte familiarity;
}
// The kind of mapping I'd like (word -> wordId, familiarity):
apple -> 0x1, 0x4
app -> 0x2E, 0x2
ape -> 0xEA, 0x1
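
To make the size concern concrete, here is a rough sketch of how I imagine such records being packed for transfer. The 5-byte layout (a little-endian unsigned 32-bit id followed by one familiarity byte) and the names `RECORD`, `pack_words` and `unpack_words` are just my assumptions for illustration, not a settled format.

```python
import struct

# Assumed wire layout: little-endian unsigned 32-bit id + 1 familiarity byte.
# "<" disables alignment padding, so each record is exactly 5 bytes.
RECORD = struct.Struct("<IB")


def pack_words(entries):
    """Pack (word_id, familiarity) pairs into one compact byte string."""
    return b"".join(RECORD.pack(word_id, fam) for word_id, fam in entries)


def unpack_words(data):
    """Inverse of pack_words: return a list of (word_id, familiarity) pairs."""
    return [
        RECORD.unpack_from(data, offset)
        for offset in range(0, len(data), RECORD.size)
    ]


# The three example mappings from above: apple, app, ape.
entries = [(0x1, 0x4), (0x2E, 0x2), (0xEA, 0x1)]
payload = pack_words(entries)
```

With this layout a list of 1,000 words would take 5,000 bytes before any compression, which is the kind of size I'm hoping for.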
UPDATE:
Okay, what I'm trying to do is provide users with several word lists, each containing a set of words. Chances are the user has already learned some of them (e.g. apple), so he/she will want to skip those words and never see them again. I want to let the user skip those words; the skipped words would then be sent to the server or kept in a local file, and sending the whole word or phrase each time seems unnecessary. I've found a related question here: http://stackoverflow.com/questions/7700400/whats-a-good-hash-function-for-english-words. Do you have any better solutions?
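
For reference, this is roughly what I understand a hash-based id would look like. I'm using CRC32 from Python's `zlib` module purely as a stand-in, and `word_id` and `find_collisions` are names I made up for this sketch. As far as I can tell, a 32-bit hash cannot guarantee that every word gets a different id, so I would have to check the whole word list for collisions up front.

```python
import zlib


def word_id(word):
    """Derive a 32-bit id from the lowercased word's UTF-8 bytes (CRC32)."""
    return zlib.crc32(word.lower().encode("utf-8")) & 0xFFFFFFFF


def find_collisions(words, id_fn=word_id):
    """Group distinct words by id and return only the ids shared by 2+ words."""
    by_id = {}
    for w in sorted(set(words)):
        by_id.setdefault(id_fn(w), []).append(w)
    return {i: ws for i, ws in by_id.items() if len(ws) > 1}


# Ids of the words the user chose to skip; this set is what would be
# sent to the server or written to a local file instead of the words.
skipped_ids = {word_id(w) for w in ["apple", "app"]}
```

If `find_collisions` returns anything for a given list, the ids are not unique, and that is the part I'm unsure how to solve properly.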