1

Given 2 texts where one was created from the other by repeated applications of "Find&Replace All", is there an efficient algorithm to find what the find/replace parameters were?

Example1

Before: Bob and Tom and Harry

After Find & Replace: Bob & Tom & Harry

Desired Output: and => &

Example2

Before: Bob and Tom and Harry hold hands

After Find & Replace: Bxxb & Txxm & Harry hxxld hands

Desired Output: _and => _&, o => xx (_ represents a space - neccessary to not match 'hands').

To brute force, you could try applying Find & Replace to the 'before' string with "find" parameter of all substrings of the before-string, and "replace" parameter of all substrings of the after-string, and see if any of the resulting strings equalled the 'after' string. But this would take a long time (exponential?) for anything but very short strings. Is there a commonly used algorithm or heuristic that performs in better time?

Bergi
  • 630,263
  • 148
  • 957
  • 1,375
Crashthatch
  • 1,283
  • 2
  • 13
  • 20
  • you said substring, does it mean that I can split them between spaces to capture complete words? I want to know, how complex woulld be the case? – Harpreet Singh Jul 11 '14 at 17:29
  • 1
    Interesting problem. I don't have an answer for you, but if it were me, I'd start with an implementation similar to diff, and for the first difference found using that standard method, use it as a find/replace call, and see if this results in fewer or more diffs. If it results in fewer diffs, proceed using the new replaced string as your target. If it results in more, backtrack. That won't get an optimal answer, but it will get an answer (even if that answer is just normal diff output). – Welbog Jul 11 '14 at 17:30
  • @Harpreet I'm not sure what you mean, but spaces have no special behaviour. A possible find/replace could be `"Bob" => "Robert"`, another could be `"ob an" => "12 3"`. – Crashthatch Jul 11 '14 at 17:30
  • 1
    May be helpful: http://en.wikipedia.org/wiki/Diff, http://en.wikipedia.org/wiki/Longest_common_subsequence_problem – Welbog Jul 11 '14 at 17:31
  • 1
    possible duplicate of [Diff Algorithm](http://stackoverflow.com/questions/805626/diff-algorithm) – Harpreet Singh Jul 11 '14 at 17:43
  • I think this differs from the Diff algorithm because diff does something more powerful than Find&Replace- it can make additions without deletions and vice-versa at arbitrary places in the string (identified by character index) and is not bound to replace ALL of one string with another. They are definitely related though, and if I have to write / invent my own it will undoubtedly be based on the Diff algorithm. – Crashthatch Jul 11 '14 at 18:01
  • Not sure I get it right, but.. applying a diff - provided you have both original and result, is certainly a good first step - then you know what was replaced by what, and figuring out the find&replace parameters would then be trivial. – MightyPork Jul 13 '14 at 20:37
  • @MightyPork: The standard diff algorithm will also produce additions (ie. "" => "something") which isn't possible in Find&Replace. It also might be the case that not ALL of the "from" strings produced by a diff were replaced in the "to" string (see 'and', part of 'hands' in example 2 in original post). I agree that a diff is a good starting point, but I don't believe going from a diff to find&replace parameters is trivial. – Crashthatch Jul 13 '14 at 23:56
  • 3
    Since there is a trivial answer - a single replace operation taking the first string to the second - you will have to define an optimal or optimum solution (as diff does by assigning scores to insert, delete, and replace operations). – Gene Jul 14 '14 at 01:02

0 Answers0