searching mobile phone address book

Question

Given a mobile phone with only a numeric keypad, we need to store the contacts in a way that will make the search fast.

User will punch in numbers and we have to surface all the contacts in the address book that begin with the letters corresponding to those numbers.

I was asked this in an interview and I suggested creating a trie. For each name in the address book, I suggested adding the corresponding number to the trie.

So, if the address book has the following contacts:

bob
boby
mat 
mav

I would create tries using the corresponding numbers. In this case, the trie will contain:

262     (At the 2nd node 2, keep a pointer to bob)
2629    (At the node 9, keep a pointer to boby)
628     (At the node 8, keep 2 pointers, one to each of mat & mav)

Are there any better approaches?

Update: This trie is used in T9 technology described here Data structure behind T9 type of dictionary

It would help if your example had phone numbers for bob, boby, etc. — James Black, Jan 28 '13 at 18:42
James, actual phone numbers don't matter. Users are searching for names not phone numbers — user674669, Jan 28 '13 at 19:23

score 0 · Answer 1 · answered Jan 29 '13 at 03:29

I suspect most names differentiate themselves within the first few characters (e.g., having "Theodore", "Theodor", "Theodora" in your list would constitute a distant outlier).

On that basis, you could use something much simpler than a trie, namely a hash table mapping prefixes to lists of matching entries (once a prefix uniquely determines a name in the list, you don't need to go further).

For example, given {bob, bobby, matt, mads, zed} you would have the hash table

"b" --> [bob, bobby]
"bo" --> [bob, bobby]
"bob" --> [bob, bobby]
"bobb" --> [bobby]
"m" --> [matt, mads]
"ma" --> [matt, mads]
"mat" --> [matt]
"mad" --> [mads]
"z" --> [zed]

Note that "non-differentiating" prefixes (e.g., "b", "bo", "bob") can share their value lists.

If the average common prefix is k characters, then your overhead is a factor of k hash table entries. If k is small, as I suspect, then you will end up with a leaner, simpler data structure than a trie.

James Black · Answer 2 · 2013-02-02T15:04:10.743

You could build a tree based on the letter, but it would need to be three values, left, right, list of phone numbers

So with your example:

                              root node

               b  (left node)                   m  (right node)
               o                                a
               b (number)             v                   t
               y (number)

You can then walk down the nodes to show auto-completion suggestions, as, in the case of bob and boby you can show both names if desired.

UPDATE

I thought about it a bit this morning, and this paper may give some fresh thoughts on how to approach this problem, as it uses a ternary tree for sorting of strings.

http://www.cs.tufts.edu/~nr/comp150fp/archive/bob-sedgewick/fast-strings.pdf

But, if the node in my example had 5 values, then you have:

left node
right node
down node
current letter
list of phone numbers that apply

Then search left or right until you find the correct letter in that position, and then go down, and then left or right until you find the next one.

This way you don't have 26 pointers for each letter in each node, so this tree would be sparse, but it would be unbalanced, most likely. Balancing it would be a different problem.

If I understand you correctly, wouldn't the tree need 26-way branching - one branch for each letter a-z? — user674669, Jan 30 '13 at 06:03
@user674669 - That would probably be the best approach. I didn't think through completely the implementation, as I have never had a need to do this. I expect you could do it with 4 values if you also have an integer for the order of that letter. There can be several approaches to this. — James Black, Feb 02 '13 at 14:56

searching mobile phone address book

2 Answers2