How do I modify the procedure in Ukkonen's paper to hold a value for number of times a word appears in the text. Are there any such implementations available that provide the string frequency as well?
The modification I want is like for a string "hehe" the frequency count for all "h","e","he" should be 2 in the tree. Rest nodes have the default value 1.
I found some libraries like the best so far and some previous questions like this.
But none of them describes a good enough solution to my problem. Also I have to process a very large dictionary file (about a billion words). Then the algorithm needs to be really fast. And I am ready to compromize on space a bit.