2

Recently I am learning how to use tree to solve the longest common substring problem. After learning from Wiki and other online resources, I found that we should use suffix tree to find longest common substring.

As the wiki said:

The longest common substrings of a set of strings can be found by building a generalised suffix tree for the strings, and then finding the deepest internal nodes which have leaf nodes from all the strings in the subtree below it

As Justin said:

String = ABCDE$XABCZ$
    End of word character 1 = $
    └── (0)
        ├── (20) $
        ├── (22) ABC
        │   ├── (15) DE$
        │   └── (23) Z$
        ├── (24) BC
        │   ├── (16) DE$
        │   └── (25) Z$
        ├── (26) C
        │   ├── (17) DE$
        │   └── (27) Z$
        ├── (18) DE$
        ├── (19) E$
        ├── (21) XABCZ$
        └── (28) Z$

In a (compact) suffix tree, you need to find the deepest internal node(s) which have leaf nodes from all the strings. If you have multiple nodes at the same depth, you have to compare the length of the string represented by that node. i.e. ABC, BC, and C all have the same depth, so you have to compare the length of ABC, BC, and C strings to see which is longer; which is ABC obviously.

Here I thought the process of finding deepest internal nodes which have leaf nodes from all strings is actually the process of finding longest common prefix of all suffixes from all strings.

So here's a question: Why don't we build prefix tree which store all suffixes from all strings? Then we can search prefix tree to find longest common prefix of these suffixes. I cannot tell the difference between these two. Could anyone give me some clues why we use suffix tree instead of prefix tree to solve this problem?

Community
  • 1
  • 1
JoJo
  • 1,377
  • 3
  • 14
  • 28

2 Answers2

3

Suffix tree requires only O(N) time and space for a string with length N. That's why it is possible to solve the longest common substring problem in linear time using it.
Adding all suffices of a string to a trie requires O(N^2) time and space in the worst case.

So you idea of adding all suffices of all strings to a trie is actually correct, but is inefficient compared to a solution with a suffix tree.

kraskevich
  • 18,368
  • 4
  • 33
  • 45
0

A trie is used for a dictionary. It doesn't store suffixes.

Micromega
  • 12,486
  • 7
  • 35
  • 72
  • but we could build a trie with all suffixes of all strings, right? Does that means a trie with all suffixes all all strings is just like suffix tree? – JoJo Sep 23 '14 at 18:31
  • Let me clarify:You can build a suffix tree ON TOP of a trie. It's naive but works. The ukkonen algorithm is faster. In a trie typically there isn't the suffixes:http://stackoverflow.com/questions/13893950/suffix-tree-and-tries-what-is-the-difference. – Micromega Sep 23 '14 at 19:24