I am trying to use Ukkonen's suffix tree algorithm to compare documents.
At this point I'm concerned with two things:
First, I want to generate the suffix tree for one document and then use that tree to find all repeated substrings within the document, i.e. every substring that occurs more than once.
Next, I want to identify all the common substrings between two documents.
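To make the expected output concrete, here is a brute-force sketch of both tasks. It does not use a suffix tree at all and takes roughly cubic time, so it is only usable on tiny inputs; it is just meant to pin down what I want the suffix tree version to return. The function names and the `minLen` parameter (the shortest substring worth reporting, assumed to be at least 1) are my own placeholders.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Map every substring of `text` with length >= minLen to the number of
// (possibly overlapping) positions at which it occurs.
std::map<std::string, int> countSubstrings(const std::string& text, std::size_t minLen) {
    std::map<std::string, int> counts;
    for (std::size_t i = 0; i < text.size(); ++i)
        for (std::size_t len = minLen; i + len <= text.size(); ++len)
            ++counts[text.substr(i, len)];
    return counts;
}

// Task 1: substrings that occur at least twice within one document.
std::set<std::string> repeatedSubstrings(const std::string& text, std::size_t minLen) {
    std::set<std::string> result;
    const std::map<std::string, int> counts = countSubstrings(text, minLen);
    for (const auto& entry : counts)
        if (entry.second >= 2) result.insert(entry.first);
    return result;
}

// Task 2: substrings that occur in both documents.
std::set<std::string> commonSubstrings(const std::string& a, const std::string& b,
                                       std::size_t minLen) {
    const std::map<std::string, int> inA = countSubstrings(a, minLen);
    const std::map<std::string, int> inB = countSubstrings(b, minLen);
    std::set<std::string> result;
    for (const auto& entry : inB)
        if (inA.count(entry.first) > 0) result.insert(entry.first);
    return result;
}
```

For example, `repeatedSubstrings("banana", 2)` should give `an`, `ana` and `na`, and `commonSubstrings("abcd", "xbcy", 2)` should give just `bc`. I'm looking for a way to get the same results from the suffix tree.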
I was able to build Ukkonen's suffix tree for a document based on http://marknelson.us/1996/08/01/suffix-trees/ and to search it for a given substring. However, I still cannot find a way to identify all the repeated substrings within that document. Could you please suggest a way to do this? I'm using Visual C++.
Can Ukkonen's algorithm also be used to compare two documents and identify all the common substrings between them? If so, please give a step-by-step explanation.
There is a good explanation of Ukkonen's suffix tree in the question "Ukkonen's suffix tree algorithm in plain English?".