
I have tried searching for an answer but can't find a matching use case.

The goal: merge two strings so that phrases duplicated between them appear only once.

Example:

String first =  "The quick brown fox jumped.";
String second = "The quick brown fox jumped, and was happy.";
// magic goes here
String intended_outcome = "The quick brown fox jumped, and was happy";

In this case it's simple: I could just use String.IndexOf with a case-insensitive ordinal comparison (StringComparison.OrdinalIgnoreCase).

But then consider:

String first =  "Hello there. The quick brown fox jumped.";
String second = "The quick brown fox jumped, and was happy.";

// Remove all punctuation so that words are separated by spaces only (to simplify)
first = Regex.Replace(first,@"[^\w\s]",""); 
second = Regex.Replace(second,@"[^\w\s]","");
// magic goes here

String intended_outcome = "Hello there, The quick brown fox jumped, and was happy";
  • Would you recommend that I convert to an array (first + second combined, split by space) and then process it recursively, stepping through word by word to find duplicated phrases?
  • Or loop over each word, use an IndexOf check to find a word common to both strings (e.g. second.IndexOf(av_split_first[i]) != -1) as a start index, then process the subsequent words until the whole phrase is found, replace it, and recurse with an offset?
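
For reference, a minimal sketch of the word-based idea from the bullets above. It is only an illustration under assumptions, not a tested solution: the names `PhraseMerge`, `Merge` and `Words` are hypothetical, punctuation is discarded, only the single longest shared run of words is collapsed, and the pieces are rejoined with a static comma.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PhraseMerge
{
    // Hypothetical helper: strip punctuation, then split on whitespace.
    static string[] Words(string s) =>
        Regex.Replace(s, @"[^\w\s]", "")
             .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

    public static string Merge(string first, string second)
    {
        string[] a = Words(first), b = Words(second);

        // run[i, j] = length of the common word run ending at a[i-1] and b[j-1].
        var run = new int[a.Length + 1, b.Length + 1];
        int best = 0, endA = 0, endB = 0;
        for (int i = 1; i <= a.Length; i++)
            for (int j = 1; j <= b.Length; j++)
                if (string.Equals(a[i - 1], b[j - 1], StringComparison.OrdinalIgnoreCase))
                {
                    run[i, j] = run[i - 1, j - 1] + 1;
                    if (run[i, j] > best) { best = run[i, j]; endA = i; endB = j; }
                }

        int startA = endA - best, startB = endB - best;
        var parts = new List<string>();

        // Adds words w[from..to) as one phrase, skipping empty ranges.
        void Add(string[] w, int from, int to)
        {
            if (to > from) parts.Add(string.Join(" ", w, from, to - from));
        }

        Add(a, 0, startA);       // text before the shared phrase in first
        Add(b, 0, startB);       // text before the shared phrase in second
        Add(a, startA, endA);    // the shared phrase, kept once
        Add(a, endA, a.Length);  // text after it in first
        Add(b, endB, b.Length);  // text after it in second
        return string.Join(", ", parts);
    }
}
```

Calling `PhraseMerge.Merge(first, second)` on the second example should give the intended_outcome shown above. Strings that share more than one phrase would need the same step repeated on the leftover pieces, which is where the recursion with an offset would come in.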

Has anyone tackled this before? I'm targeting C#, but would be interested in a solution in any language, regex or otherwise.

Barry
  • What is the logic behind the comma showing up after 'hello there' in the outcome? – Bas Feb 21 '18 at 16:55
  • Just to indicate that we can't necessarily determine punctuation (to simplify the processing loop). If we preprocess both strings and remove all punctuation, each is just a series of words separated by spaces, so the intended_outcome would simply combine the detected phrases, separated by a static comma. I'm actually working with fairly complex strings, so I'm comfortable discarding the original punctuation in order to achieve the goal. – Barry Feb 21 '18 at 16:56
  • `IndexOf` and regexes can't do this; you're looking for a [text merge](https://stackoverflow.com/questions/138331/). – Dour High Arch Feb 21 '18 at 17:01
  • whoa nice! @DourHighArch That is very likely the solution. I wasn't searching with the right keywords.. thank you! Even has a C# port. – Barry Feb 21 '18 at 17:03
  • Actually, per testing, that doesn't fix the problem. It creates a diff and patch, which produces a lossy result (it strips away items not found in the second string). Therefore I'm having to write my own approach using my bullet points above. I'll leave the question as answered (marked as duplicate), even though it really isn't answered correctly. Creating a diff of two strings does NOT lead to a merge of their contents; it yields a differential result, which does not necessarily include the contents of both. But thanks anyway for the link. – Barry Feb 21 '18 at 19:58

0 Answers