how to search for a given word from a huge database?

Question

What's the most efficient method to search for a word from a dictionary database. I searched for the answer and people had suggested to use trie data structure. But the strategy for creating the tree for a huge amount of words would be to load the primary memory. I am trying to make an android app which involves this implementation for my data structure project. So could anyone tell me how do the dictionary work.

Even when I use the t9 dictionary in my phone, the suggestions for words appear very quickly on the screen. Curious to know the algorithm and the design behind it.

This may be helpful to get to know T9 [how does T9 work](http://stackoverflow.com/questions/2574016/data-structure-behind-t9-type-of-dictionary) — Mukul Goel, Mar 19 '13 at 09:04
@MukulGoel Thanx.. found your link helpful. But yet to test whether i'll be able to implement it.. Still learnt something new from that .. Thanx :) — Ashwin Surana, Mar 19 '13 at 11:00
@Ansh no never read about it. Anyways will check out thanx :) — Ashwin Surana, Mar 19 '13 at 13:57

Saeed Amiri · Accepted Answer · 2013-03-19T09:37:44.510

8

You can use Trie which is most usefull for searching big dictionaries. Because too many words are using similar startup, trie brgins around constant factor search also you can use in place, with limited number of access to physical memory. You can find lots of implementation in the web.

If someone is not familiar with trie, I think this site is good and I'm just quoting their sample here:

A trie (from retrieval), is a multi-way tree structure useful for storing strings over an alphabet. It has been used to store large dictionaries of English (say) words in spelling-checking programs and in natural-language "understanding" programs. Given the data:
an, ant, all, allot, alloy, aloe, are, ate, be 
the corresponding trie would be:

This is good practical Trie implementation in java: http://code.google.com/p/google-collections/issues/detail?id=5

edited Mar 19 '13 at 09:37

answered Mar 19 '13 at 09:05

Saeed Amiri

22,252
5
45
83

But creating a trie of 10,000 words could be an issue in an android app as i have mentioned in my question. Well my friends said that loading trie for these many words would make the mobile to force quit the app :| .. – Ashwin Surana Mar 19 '13 at 10:49
@AcesSmart, First of all you said your friend suggested you to use "tree" but after one hour when you see answer and comments, you changed it to "trie", this is cheating and also new question. Also because you are not familiar with "trie" you thinking so, This is the thing which works everywhere, is too much smaller than your friend "tree" approach, as I said in my answer you can use it "in place", means without loading in memory, lots of search engines are using the "trie", and seems you are the first one in the world says that it is not applicable in your mobile app. – Saeed Amiri Mar 19 '13 at 11:36
Also if your question had some upvote was because you mentioned your friend suggested "tree" approach, but in the case he/she suggested "trie" approach and still you have a question, this is funny question, I'm pretty sure you didn't test it. (remember that your edit is available in history, so you can not change your question totally, this makes also lots of change for reader of my answer, they will say why I answered in this way for this question, but you can ask new question) – Saeed Amiri Mar 19 '13 at 11:39
I wrote trie.. Some had edited the question to tree :| . Well sorry for the inconvenience .. – Ashwin Surana Mar 19 '13 at 13:56
@AcesSmart, I don't know what was your problem, if you think about trie, edit this part of your question: "But the strategy for creating the **tree** for a huge amount of words would be to load the primary memory", change it to "trie", and in this case, you are wrong, as I stated before (2 times, in answer and comment), first of all you have no memory problem with Trie, second even if you think so, the main advantage of the trie is possibility to do **in-place** search, means without fetching in **primary memory**. – Saeed Amiri Mar 19 '13 at 14:03
"you can use it "in place", means without loading in memory, lots of search engines are using the "trie"." Could you please explain this. And i was not against trie but i was against loading all the words in to the primary memory(which would slow down the app) .. And seriously man you have got me wrong. i don't car eabout the upvote i get. I am here just for a solution . Please don't get me wrong. I had posted the question with the knowledge of trie. Even before posting my question i had checked for more solutions in the stack overflow itself and they had mentioned about trie. – Ashwin Surana Mar 19 '13 at 14:16
could you just hint me on "without fetching in primary memory".. I don't know how to implement that. – Ashwin Surana Mar 19 '13 at 14:28
@AcesSmart e.g See the wiki link: http://en.wikipedia.org/wiki/Trie#Advantages_relative_to_other_search_algorithms, comparison is on in-place implementation (first chart), one of a main point of trie is the memory usage and ability to use it in-place, if you implement trie yourself, you can see how is easy to searching from big file just by in place search, instead of searching whole file, also for dictionaries, trie is not comparable by tree, is very small and very fast. – Saeed Amiri Mar 19 '13 at 14:55
dictionary on mobile phone has around 10k words and by using trie implementation it takes something around 20KB memory, which is nothing in current mobile systems. because trie skips redundant letters and uses the common part. – Saeed Amiri Mar 19 '13 at 14:56
Cool.. Thanx for the help. :) . Wil try to implement it. – Ashwin Surana Mar 19 '13 at 15:33
@AcesSmart, At first step, I'd offer you to use one of a implementations, available in web, to see is enough for you or not, then if not, implement in-place one yourself. – Saeed Amiri Mar 19 '13 at 15:57
The solution works perfectly :) .... thanx for the help . created a trie for two hundred and thousand words and it took only 1.5 secs for the trie to be created :) ... Thanx man! – Ashwin Surana Mar 27 '13 at 04:50

score 0 · Answer 2 · answered Mar 19 '13 at 09:06

There are lots of ways to do that. The one that I used some time ago (which is especially good if you don't make changes to your dictionary) is to create a prefix index.

That is, you sort your entries lexicologicaly. Then, you save the (end) positions of the ranges for different first letters. That is, if your entries have indexes from 1 to 1000, and words "aardvark -- azerbaijan" take the range from 1 to 200, you make an entry in a separate table "a | 200", then you do the same for first and second letters. Then, if you need to find a particular word, you greatly reduce the search scope. In my case, the index on first two letters was quite sufficient.

Again, this method requires you to use a DB, like SQLite, which I think is present on the Android.

Ashwin Surana · Answer 3 · 2014-10-09T19:16:08.643

Using a trie is indeed space conscious, just realized when I checked my RAM usage after loading 150,000 words in to trie, the usage was 150 MB (Trie was implemented in C++).The memory consumption was hugely due to pointers. I ended up with ternary tries which had very less memory wastage around 30 MB (compared to 150 MB) but the time complexity had increased a bit. Another option is to use "Left child Right sibling " in which there is very less wastage of memory but time complexity is more than that of ternary trie.

how to search for a given word from a huge database?

3 Answers3