Trie implementation

Question

Is there any speed- and cache-efficient implementations of trie in C/C++? I know what a trie is, but I don't want reinvent the wheel, implementing it myself.

https://code.google.com/p/simple-trie/ – ahmedsafan86 Apr 14 '13 at 09:16 — ahmedsafan86, Apr 14 '13 at 09:16

score 38 · Accepted Answer · edited Oct 02 '15 at 07:30

38

if you are looking for an ANSI C implementation you can "steal" it from FreeBSD. The file you are looking for is called radix.c. It's used for managing routing data in kernel.

edited Oct 02 '15 at 07:30

cnst

25,870
6
90
122

answered Jun 24 '09 at 09:35

SashaN

679
4
8

@SashaN Link doesn't work. – Dannyboy Jun 12 '14 at 13:37
2

try this link for radix.c : http://www.opensource.apple.com/source/xnu/xnu-1456.1.26/bsd/net/radix.c – Viren Jul 16 '14 at 23:41

eloj · Answer 2 · 2016-07-08T19:14:33.470

I realize the question was about ready implementations, but for reference...

Before you jump on Judy you should have read "A Performance Comparison of Judy to Hash Tables". Then googling the title will likely give you a lifetime of discussion and rebutals to read.

The one explicitly cache-conscious trie I know of is the HAT-trie.

The HAT-trie, when implemented correctly, is very cool. However, for prefix search you need a sorting step on the hash buckets, which somewhat clashes with the idea of a prefix structure.

A somewhat simpler trie is the burst-trie which essentially gives you an interpolation between a standard tree of some sort (like a BST) and a trie. I like it conceptually and it's much easier to implement.

score 7 · Answer 3 · answered Jun 24 '09 at 05:21

7

I've had good luck with libTrie. It may not be specifically cache optimized but the performance has always been decent for my applications.

answered Jun 24 '09 at 05:21

SPWorley

11,550
9
43
63

score 5 · Answer 4 · answered May 17 '11 at 19:20

5

GCC ships with a handful of data structures as part of its "Policy-based data structures". This includes a few trie implementations.

http://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/trie_based_containers.html

answered May 17 '11 at 19:20

tgoodhart

3,111
26
37

Could you, please, provide a sample with `#include`? Thank you for the answer. – Sergei Krivonos Sep 14 '16 at 18:38
Sample use of trie https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/testsuite/ext/pb_ds/example/trie_prefix_search.cc – mosh Jul 09 '17 at 10:53

nik · Answer 5 · 2009-06-24T05:34:12.933

4

References,

A Double-Array Trie implementation article (includes a C implementation)
TRASH - A dynamic LC-trie and hash data structure -- (a 2006 PDF reference describing a dynamic LC-trie used in the Linux kernel to implement address lookup in the IP routing table

edited Jun 24 '09 at 05:34

answered Jun 24 '09 at 05:28

nik

13,254
3
41
57

score 4 · Answer 6 · answered Jun 24 '09 at 11:33

4

Judy arrays: Very fast and memory efficient ordered sparse dynamic arrays for bits, integers and strings. Judy arrays are faster and more memory efficient than any binary-search-tree (incl. avl & red-black-trees).

answered Jun 24 '09 at 11:33

bill

1,321
11
9

score 3 · Answer 7 · answered Feb 27 '15 at 05:44

3

Cedar, HAT-Trie, and JudyArray is quite awesome, you can find the benchmark here.

benchmark result

answered Feb 27 '15 at 05:44

Kokizzu

24,974
37
137
233

score 2 · Answer 8 · answered Jan 10 '11 at 20:02

2

You can also try TommyDS at http://tommyds.sourceforge.net/

See the benchmarks page on the site for a speed comparison with nedtries and judy.

answered Jan 10 '11 at 20:02

amadvance

61
1

(FYI I'm the author of nedtries) Note that the benchmarks above used the C version of nedtries. The C++ version is about 15% faster and if I'm reading the graph right, would be only slightly slower than TommyDS's version if built as C++. That said, he has far lower overhead per node than I do. I deliberately go overboard in metadata to enable really deep assertion checking during debug operation :) – Niall Douglas Apr 17 '12 at 14:17

score 1 · Answer 9 · answered Jun 24 '09 at 10:09

Cache optimizations are something you'll probably are going to have to do, because you'll have to fit the data into a single cacheline which generally is something like 64 bytes (which will probably work if you start combining data, such as pointers). But it's tricky :-)

Nate · Answer 10 · 2014-10-21T09:21:11.620

0

Burst Trie's seem to be a bit more space efficient. I'm not sure how much cache-efficiency you can get out of any index since CPU-caches are so tiny. However, this kind of trie is plenty compact enough to keep large data sets in RAM (where a regular Trie would not).

I wrote a Scala implementation of a burst trie that also incorporates some space-saving techniques that I found in GWT's type-ahead implementation.

https://github.com/nbauernfeind/scala-burst-trie

edited Oct 21 '14 at 09:21

answered Apr 29 '14 at 18:48

Nate

2,205
17
21

My downvote is unintentional and is now locked by StackOverflow. I do not know how to undo it. Sorry. – Roman Mishin Mar 20 '16 at 09:34

score 0 · Answer 11 · answered Oct 25 '16 at 10:02

0

I've had very good results (very good balance between performance and size) with marisa-trie compared to several TRIE implementations mentioned with my data set.

https://github.com/s-yata/marisa-trie/tree/master

answered Oct 25 '16 at 10:02

benjist

2,740
3
31
58

Shadi Serhan · Answer 12 · 2017-04-17T22:05:31.170

Sharing my "fast" implementation for Trie, tested just in basic scenario:

#define ENG_LET 26

class Trie {
    class TrieNode {
    public:
        TrieNode* sons[ENG_LET];
        int strsInBranch;
        bool isEndOfStr;

        void print() {
            cout << "----------------" << endl;
            cout << "sons..";
            for(int i=0; i<ENG_LET; ++i) {
                if(sons[i] != NULL)
                    cout << " " << (char)('a'+i);
            }
            cout << endl;
            cout << "strsInBranch = " << strsInBranch << endl;
            cout << "isEndOfStr = " << isEndOfStr << endl;
            for(int i=0; i<ENG_LET; ++i) {
                if(sons[i] != NULL)
                    sons[i]->print();
            }

        }
        TrieNode(bool _isEndOfStr = false):isEndOfStr(_isEndOfStr), strsInBranch(0) {
            for(int i=0; i<ENG_LET; ++i) {
                sons[i] = NULL;
            }
        }
        ~TrieNode() {
            for(int i=0; i<ENG_LET; ++i) {
                delete sons[i];
                sons[i] = NULL;
            }
        }
    };

    TrieNode* head;
public:
    Trie() { head = new TrieNode();}
    ~Trie() { delete head; }
    void print() {
        cout << "Preorder Print : " << endl;
        head->print();
    }
    bool isExists(const string s) {
        TrieNode* curr = head;
        for(int i=0; i<s.size(); ++i) {
            int letIdx = s[i]-'a';
            if(curr->sons[letIdx] == NULL) {
                return false;
            }
            curr = curr->sons[letIdx];
        }

        return curr->isEndOfStr;
    }
    void addString(const string& s) {
        if(isExists(s))
            return;
        TrieNode* curr = head;
        for(int i=0; i<s.size(); ++i) {
            int letIdx = s[i]-'a';
            if(curr->sons[letIdx] == NULL) {
                curr->sons[letIdx] = new TrieNode();    
            }
            ++curr->strsInBranch;
            curr = curr->sons[letIdx];
        }
        ++curr->strsInBranch;
        curr->isEndOfStr = true;
    }
    void removeString(const string& s) {
        if(!isExists(s))
            return;
        TrieNode* curr = head;
        for(int i=0; i<s.size(); ++i) {
            int letIdx = s[i]-'a';

            if(curr->sons[letIdx] == NULL) {
                assert(false);
                return; //string not exists, will not reach here
            }
            if(curr->strsInBranch==1) {    //just 1 str that we want remove, remove the whole branch
                delete curr;
                return;
            }
            //more than 1 son
            --curr->strsInBranch;
            curr = curr->sons[letIdx];
        }
        curr->isEndOfStr = false;
    }

    void clear() {
        for(int i=0; i<ENG_LET; ++i) {
            delete head->sons[i];
            head->sons[i] = NULL;
        }
    }

};

Trie implementation

12 Answers12

Linked