1

I have implemented a suffix tree, which is not compressed. I wanted to know how to solve the problem of finding the longest repreating substring in a string. I know that we have to find the deepest internal node with two children, but how can be code this. Also, how do we know what the longest repeating substring is. I am interested in the code in JAVA. Pls give java implementation. For reference, my TrieNode looks like

class TrieNode{

char ch;
LinkedList<TrieNode> child;

}
moinudin
  • 134,091
  • 45
  • 190
  • 216
TimeToCodeTheRoad
  • 7,032
  • 16
  • 57
  • 70
  • http://stackoverflow.com/a/24705760/69663 gives a way to filter out overlaps (although it also filters out some strings of all-equal letters) – unhammer Dec 26 '16 at 08:36

3 Answers3

2

The algorithm of Aho- Corasick http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm

Mihran Hovsepyan
  • 10,810
  • 14
  • 61
  • 111
1

It's only the deepest node with 2 children if you store an end of string byte.

To find the longest substring you'll need to do a depth first search, keeping a reference to the deepest node with 2 or more children and it's depth. This is easiest to do with a recursive function.

dan_waterworth
  • 6,261
  • 1
  • 30
  • 41
  • right, but this would give the answer with overlapping allowed..I mean for the string "banana", the answer would be "ana." I want with overlapping not allowed. So the answer should be "an" or "na" – TimeToCodeTheRoad Dec 18 '10 at 21:16
  • ok, I'm not sure what you mean by overlapping, perhaps using 1 letter more than once? In any case, if you don't want to include specific strings then when you do your depth first search, just exclude nodes that don't fit your criteria. – dan_waterworth Dec 19 '10 at 08:48
  • You can see the overlapping problem by typing "aaa" into https://daniel-hug.github.io/longest-repeated-substring/ – it'll claim "aa" is a repeated substring, since in the tree `[("$",Leaf),("a",[("$",Leaf),("a",[("$",Leaf),("a$",Leaf)])])]` , the deepest node with two leaf descendents has a path with the prefix "aa". – unhammer Dec 17 '16 at 10:24
0

To find deepest node, one can also do BFS and select node which has maximum level. I think node with maximum level is also the deepest node.then you can check if it has 2 children . else go higher. not sure if this will work though. any comments pps?

Programmer
  • 6,565
  • 25
  • 78
  • 125