1

I am learning the Ukkonen's algorithm of how to generate a suffix tree from a given string. I tried one string "dedododeodo" in the visualization website http://brenden.github.io/ukkonen-animation/, one thing I do not fully understand is: why is not there any suffix link from node number 8 to node number 3?

My understanding is: the path-label is "od", then it should link to another node "d". Here both node number 8 and node number 3 are internal nodes. I do not understand why there is no suffix link from node 8 to node 3.

I know that the suffix link is generated intra-phase instead of inter-phase. But if you go to the step 17 of 21, then you will soon see the same confusion with me. After the rule-2 extension of character '$', we should follow the suffix link of node 8, which does not exist at all (or points to root you can say).

The visualization website chose to jump to the edge after node 3.

Am I missing something here? I learned one post from this link Ukkonen's suffix tree algorithm in plain English that sometimes suffix link does not exist, so we have to rescan from the root of the tree.

If what the author said is true, then how does it affect the runtime complexity? I mean, the whole idea of Ukkonen algorithm is based on several important tricks, and one of them is suffix link, right?

I have read some article about the linear time complexity proof. The suffix link is very important to guarantee a linear complexity. With suffix link, the node-depth drops by 2 at most, and the tree depth is at most n. With these two constraints, we can say the number of "down-walk" in the suffix tree is capped. With rescanning, the linear time complexity will no longer hold true?

Please help. Thanks. My implementation C++ source code is at here: https://github.com/hdc1112/SuffixTree

Dachuan Huang
  • 115
  • 1
  • 8
  • Huh, that’s odd. I also would have expected to see a suffix link there. – templatetypedef Jul 19 '21 at 19:09
  • I just read the Gusfield G book again, and realized that the suffix link not only gets created from a previous created internal node to a newly created internal node, but also gets created from a newly created internal node to a found internal node (which already exists). So the "rescanning" argument, in my opinion, is groundless. The claim "every internal node" has a suffix link still holds. Feel free to correct me. I fixed the bug, and the link's source code should reflect a correct implementation of Ukkonen's algorithm (if without other bugs..) – Dachuan Huang Jul 20 '21 at 05:15

1 Answers1

0

enter image description here

I just read the Gusfield G book again, and took a screenshot of one important theorem. The theorem told us that the suffix link can have two cases: 1st, it's from a previously created internal node to a newly created internal node; 2nd, it's from an internal node to another already existing internal node.

The theorem told us that the suffix link ending node can either be created or found.

Dachuan Huang
  • 115
  • 1
  • 8