Say I have the text "abcabx". I would like to know that there is a repeated pattern "ab", all the locations it appears, and how the context of those repetitions relates to its other occurrences. I also want the data structure to have the unique patterns "c" and "x" distinguished and isolated. I have setup a suffix tree in attempt to do so, and it looks like this (from this SO answer):
This does indeed tell me that the pattern "ab" appears twice, once with the suffix "cabx" and another with "x". However, the "ab" at root only points to the first occurrence of the pattern. It also has another "ab" embedded in its leaf "cabx", when I'd want that "ab" (in the "cabx") to somehow be acknowledged as a repeat in the data structure. I know that the "x" leaf of the root "ab" represents it, but I need to know, in the "cabx" leaf of "ab", that there is an "ab" in there. Plus that two unique patterns, "c" and "x", are part of that edge. Plus their locations in that edge, and a cross-reference between their "main definitions" (root edges?). It seems that such things could be figured out by iterating around the tree and putting it together, but I need a data structure that stores this information out-right.
To maybe put it simpler, the data structure needs to clearly say "here are all the unique patterns", "here are all the repeated patterns and every place they happen at", and "here is the context that relates all of these things".
So I guess I'm looking for a graph-like element to the suffix tree, something that will partition out known patterns and relate them explicitly. In the process, patterns that are unique would be noted. But I still want the contextual features of the suffix tree, such as saying both "c" (not "cabx", but "c") and "x" came after "ab", "abx" came after "abc", what came after them (in larger cases), etc. Is there an adaptation of the suffix tree that does this, or perhaps another algorithm?