How and when to create a suffix link in suffix tree?

Question

Could anyone give me an example about how and when to create a suffix link in suffix tree?

If my string is ABABABC, but do use a different example if that is better.

Hope to give some pictures to illustrate every step.

very appreciate.

score 68 · Answer 1 · edited May 23 '17 at 12:18

To understand this, first recall that there are three kinds of nodes in a suffix tree:

The root
Internal nodes
Leaf nodes

In the graph below, which is the suffix tree for ABABABC, the yellow circle is the root, the grey, blue and green ones are internal nodes, and the small black ones are leaves.

There are two important things to notice:

Internal nodes always have more than 1 outgoing edge. That is, internal nodes mark those parts of the tree where branching occurs.
Branching occurs wherever a repeated string is involved, and only there. For any internal node X, the string leading from the root to X must have occurred in the input string at least as many times as there are outgoing edges from X.

Example: The string leading to the blue node is ABAB. Indeed, this string appears twice in the input string: At position 0 and at position 2. And that is why the blue node exists.

Now about suffix links:

If the string s leading up to some internal node X is longer than 1 character, the same string minus the first character (call this s_-1) must be in the tree, too (it's a suffix tree, after all, so the suffix of any of its strings must be in the tree, too).

Example: Let s=ABAB, the string leading to the blue node. Then after removing the first character, s_-1 is BAB. And indeed that string is found in the tree, too. It leads to the green node.
If some string s leads to an internal node, its shortened version s_-1 must lead to an internal node (call it X_-1) as well. Why? Because s must appear at least twice in the input string, so s_-1 must appear at least as many times (because it is part of s: wherever s appears, s_-1 must appear, too). But if s_-1 appears multiple times in the input string, then there must be an internal node for it.
In any such situation, a special link connecting X to X_-1 is a suffix link.

Note: Because of (1) and (2) above, every internal node X that has a label from root to X of more than 1 character must have a suffix link to exactly one other internal node.

Example:

This is the same suffix tree as before; the dotted lines indicate the suffix links. If you start at the blue node and follow the suffix links from there (from blue, to green, to first gray, to second gray), and look at the strings leading from the root to each node, you will see this:

 ABAB   ->    BAB    ->    AB    ->    B
(blue)      (green)     (gray1)     (gray2)

This is why they are called suffix links (the entire sequence is called suffix chain).

What are they good for?

They are good for surprisingly many things. However, they play a particular role in Ukkonen's algorithm for suffix tree construction, specifically in Rule 3 described there: After inserting a the final character of some suffix s at some point X, the algorithm needs to insert the final character of suffix s_-1 in O(1) time. In order to do that, it uses the suffix link to jump right to the place X_-1 and makes the insert.

But, note that there is no necessity to put suffix links in a suffix tree. They are not part of the definition of a suffix tree — they are just special links used by some algorithms that construct or use suffix trees.

Regarding single-character nodes: What if there is an internal node X whose string (i.e. the string on the path from root to X) consists of only one character? By the definition above, X then does not have a suffix link. You can however assume that if it had a suffix link, it would point to the root node. Furthermore: If, by the definition above, an internal node does not have a suffix link, it must be a single-character node, so you can always assume that if no suffix link is present at an internal node it must be a single-character node, and therefore, the node that represents the s_-1 suffix is the root node. (Some algorithms may actually add an explicit suffix link pointing to the root node in this case.) Thanks to j_random_hacker for the comment about this.

Thanks for your quick answer. I understand the concept of suffix tree. Hope to get a concrete implementation of it. — lingguang1997, Apr 18 '12 at 17:50
"For any internal node X, the string leading from the root to X must have occurred in the input string as many times as there are outgoing edges from X." The substring AB occurs 3 times in ABABABC, but there are only 2 outgoing edges from the grey dot whose path is labelled AB. — librik, May 19 '12 at 03:12
@librik Thank you. I've inserted the words 'at least'. I believe that makes it a correct (and still useful) statement. — jogojapan, May 22 '12 at 00:48
Fantastically clear and gentle introduction. I would add only that suffix links also exist from nodes labelled with a single character, back to the root. — j_random_hacker, Aug 08 '13 at 16:38
@j_random_hacker Yes, that's right. The reason I didn't include them in the description is that they need not be added to the tree explicitly. If suffix links are added to the tree systematically during tree construction (as is the case in Ukkonen's algorithm), you can simply assume that _any_ internal node that does not have an outgoing suffix link, is a single-character node and its suffix link therefore must lead to the root node. But it's true that at least _conceptually_ these suffix links still exist. — jogojapan, Aug 10 '13 at 01:27
Thanks for this - it's a great answer about *What* are suffix links. However, the OP is asking *How* and *When* do we create suffix links (I believe that he's referring to the context of Ukkonen's algorithm) - and you haven't answered that... — ihadanny, Nov 12 '16 at 08:23
@ihadanny Have you looked at the "What are they good for?" section? — jogojapan, Nov 12 '16 at 09:14
Hey @jogojapan - first, I'd like to thank you personally for this answer and the great answer about Ukkonen in plain English. However, I still think that the "What are they good for" section describes **how are they used, assuming they already exist**. The OP would like to know is **when do we create a suffix link**. I guess this is trivial for you that once adding an internal node you immediately can add a suffix link to the previously created internal node, and that covers all the suffix links we need to create - but it's not really intuitive for me... — ihadanny, Nov 12 '16 at 09:39
@jogojapan Can you elaborate on why one cannot simply replace the logic of suffix links with keeping pointers to all nodes in the form of a hash table? e.g. when done with node: ABC look for $p[BC]$ where $p$ is the hash table with key=suffix and value=pointer to node for that suffix. — Anon, Mar 13 '23 at 12:31
@Anon You can do that, but don't forget that for each hash lookup a hash value has to be computed for the string you are looking up. If that string is N characters in length, computing the hash function will be O(N) time. (Hashes are often considered O(1) lookup, but that is because in discussions about hash complexity, the time for computing the hash value is usually ignored because the point in those discussions is about the retrieval time compared to, say, trees, and then the length of the key is negligible.) — jogojapan, Mar 16 '23 at 21:34

How and when to create a suffix link in suffix tree?

1 Answers1