3

I know the title is a bit messy, so let me explain in detail:

I have two strings, T and P. T represents the text to be searched, and P represents the pattern to be searched for. I want to find ALL substrings of T which are within a given edit distance of P.

Example:

T = "cdddx"
P = "mdddt"

Say I wanted all substrings within edit distance 2, the answers would be:

cdddx //rewrite c and x
dddx // insert m, rewrite x
cddd //insert t, rewrite c
ddd // insert m and t

Don't know if that's all of them, but you get the point.

I know the Wagner–Fischer algorithm can be employed for solution of this problem - I check the numbers of the last row of the Wagner–Fischer matrix and see if they fulfill this condition and find the substrings that way, then run the algorithm again for T', where T' is T where the first letter has been removed, and so on. The problem is the time complexity of this shoots up to a staggering O(T^3*P). I'm looking for a solution close to the original time complexity of the Wagner-Fisher algorithm, i.e. O(T*P).

Is there a way to get this done in such time or something better than what I have right now? Note that I am not necessarily looking for a Wagner-Fischer solution, but anything is ok. Thanks!

user129186
  • 1,156
  • 2
  • 14
  • 30
  • You don't need N^3, because each run of W-F already produces answers for all prefixes of the string. Now you only need to run it for all suffixes of the source string (all prefixes of all suffixes are all substrings). That's O(N^2×M) overall complexity. You can't get much less than that because the size of the output is O(N^2) in the worst case. – n. m. could be an AI Dec 23 '15 at 17:26
  • 1
    Not sure I have the energy to write a complete answer, but here's a previous answer with the techniques that I'd turn to: http://stackoverflow.com/questions/17412543/semi-local-levenshtein-distance . This would give an algorithm that is Õ(T\*P + m) where m is the number of substrings returned (Õ hides log factors). – David Eisenstat Dec 23 '15 at 20:07
  • This is similar to the problem of [finding the edit distance to all substrings](https://stackoverflow.com/questions/8139958/algorithm-to-find-edit-distance-to-all-substrings) of a string. – Anderson Green Apr 21 '22 at 17:23

0 Answers0