If we have N
words, each with maximum length L
, your algorithm will take O(N*L^3)
(supposing that adding to trie is linear with length of adding word). However, the size of the resulting trie (number of nodes) is at most O(N*L^2)
, so it seems you are wasting time and you could do better.
And indeed you can, but you have to pull a few tricks from you sleeve. Also, you will no longer need the trie.
.substring()
in constant time
In Java 7, each String
had a backing char[]
array as well as starting position and length. This allowed the .substring()
method to run in constant time, since String
is immutable class. New String
object with same backing char[]
array was created, only with different start position and length.
You will need to extend this a bit, to support adding at the end of the string, by increasing the length. Always create a new string object, but leave the backing array same.
- Recompute hash in constant time after appending single character
Again, let me use Java's hashCode()
function for String
:
int hash = 0;
for (int i = 0; i < data.length; i++) {
hash = 31 * hash + data[i];
} // data is the backing array
Now, how will the hash change after adding a single character at the end of the word? Easy, just add it's value (ASCII code) multiplied by 31^length
. You can keep powers of 31 in some separate table, other primes can be used as well.
- Store all substring in single
HashMap
With using tricks 1 and 2, you can generate all substrings in time O(N*L^2)
, which is the total number of substrings. Just always start with string of length one and add one character at a time. Put all your strings into a single HashMap, to reduce duplicities.
(You can skip 2 and 3 and discard duplicities when/after sorting, perhaps it will be even faster.)
- Sort your substrings and you are good to go.
Well, when I got to point 4, I realized my plan wouldn't work, because in sorting you need to compare strings, and that can take O(L)
time. I came up with several attempts to solve it, among them bucket sorting, but none would be faster than original O(N*L^3)
I will just this answer here in case it inspires someone.
In case you don't know Aho-Corasic algorithm, take look into that, it could have some use for your problem.