I'm trying to find words which contains substring (as input) in huge text. The text looks like this: *america*python*erica*escape*.. Example: Input: "rica" => Output: america,erica
I use suffix array.
My pseudocode (pythonlike) is:
firstChar=input[0] // the first character of input
suffixArray=getSuffixArray(text) // suffix array
result=[]
for every index of suffix array which points to firstChar:
length=len(input)
indexText=text[suffixArray[index]]
indexes=[]
if input in text[indexText: indexText+length]:
word=find whole word containig this index between '*'
result.append(word)
This works, but it is too slow. LCP array should improve a runtime of algorhitm but I can't figure out how. Will you give me an advice?
Thanks in advance!