This question is language independent and is more about understanding how to implement the trie or whether tries would be appropriate for what my program is suppose to do. Say I have a string of text like this.
string= "a tale about an ant and an android";
The corresponding trie for "a" looks like this
a(7)
/ \
b(1) n(4)
/ / \
o(1) t(1) d(2)
/ \
u(1) r(1)
/ \
t(1) o(1)
\
i(1)
\
d(1)
and I want to find the number of occurrences for each word. Although "a" appears 6 times in the text there is only one instance where it is used as a word. The same rule applies for "an" & "and".
I want my final frequency counter to look like this:
a: occurs 1 time not 7 an: 2 and: 1 and so on..
How is it possible for me to record the counts of full words?
I'm working in php trying to process a load of text and have visited this question and it is not what I am looking for. Performance is important but memory efficient is more preferable since I am parsing say a trillion words. Thanks and I appreciate your input.