1

Are there an efficient algorithm to search and dump all common substrings (which length is 3 or longer) between 2 strings?

Example input:

Length   : 0    5    10   15   20   25   30

String 1 : ABC-DEF-GHI-JKL-ABC-ABC-STU-MWX-Y
String 2 : ABC-JKL-MNO-ABC-DEF-PQR-DEF-ZWX-Y

Example output:

In string 1           2
---------------------------
ABC-DEF   0           12
ABC-DE    0           12
BC-DEF    1           13
:
-ABC-     15,19       11
-JKL-     11          3
-DEF-     3           15
-JKL      11          3
JKL-      12          4
-DEF      3           15,23
DEF-      4           16
WX-Y      29          29
ABC-      0,16,20     0,12
-ABC      15,19       11
DEF-      4           16,24
DEF       4           16,24
ABC       0,16,20     0,12
JKL       12          4
WX-       29          29
X-Y       30          30
-AB       15,19       11
BC-       1,17,21     1,13
-DE       3           15,23
EF-       5           17,25
-JK       11          3
KL-       13          5
:

In the example, "-D", "-M" is also a common substring but is not required, because it's length is only 2. (There might be some missing outputs in example because there are so many of them...)

Izumi Kawashima
  • 1,197
  • 11
  • 25
  • possible duplicate of [Fast algorithm for searching for substrings in a string](http://stackoverflow.com/questions/1765579/fast-algorithm-for-searching-for-substrings-in-a-string) – assylias Apr 11 '12 at 10:17
  • The Aho-Corasick algorithm helped me a lot. Thank you! – Izumi Kawashima Apr 15 '12 at 15:44

1 Answers1

0

You can find all common substrings using a data structure called a Generalized suffix tree

Libstree contains some example code for finding the longest common substring. That example code can be modified to obtain all common substrings.

Jason
  • 2,341
  • 17
  • 14