The trie specified is suboptimal in numerous ways.
- For a start, it constructs multiple nodes per item inserted. As the author writes, "Every character of input key is inserted as an individual trie node." That's a horrible, and unnecessary penalty! The use of an
ALPHABET_SIZE
greater than 2 adds insult to injury here; not only would a phrase of fifty bytes require fifty nodes, but each node would likely be over one hundred bytes in size... Each item or "phrase" of fifty bytes in length might require up to about 5KB of storage using that code! That's not even the worst of it.
- The algorithm provided embeds
malloc
internally, making it quite difficult to optimise. Each node is its own allocation, making insert
very malloc
-heavy. Allocation details should be separated from data structure processing, if not for the purpose of optimisation then for simplicity of use. Programs that use this code heavily are likely to run into performance issues related to memory fragmentation and/or cache misses, with no easy or significant optimisation in sight except for substituting the trie for something else.
- That's not the only thing wrong here... This code isn't too portable, either! If you end up running this on an old (not that old; they do still exist!) mainframe that uses EBCDIC rather than ASCII, this code will produce buffer overflows, and the programmer (you) will be called in to fix it.
<sarcasm>
That's so optimal, right?</sarcasm>
I've written a PATRICIA trie implementation that uses exactly one node per item, an alphabet size of two (it uses the bits of each character, rather than each character) and allows you to use whichever allocation you wish... alas, I haven't yet put a lot of effort into refactoring its interface, but it should be fairly close to optimal. You can find that implementation here. You can see examples of inserting (using patricia_add
), retrieving (using patricia_get
) and removing (using patricia_remove
) in the patricia_test.c testcase file.