Concatenate two strings with a common substring?

Question

Say I have strings,

string1 = 'Hello how are you'
string2 = 'are you doing now?'

The result should be something like

Hello how are you doing now?

I was thinking of different approaches using re and string search (the longest common substring problem).

But is there a simple way (or a library) that does this in Python?

To make things clear, I'll add one more set of test strings:

string1 = 'This is a nice ACADEMY'
string2 = 'DEMY you know!'

The result would be:

'This is a nice ACADEMY you know!'

what should be the result if `string1 = 'Hello how are you now?'` ? (`now?` added) — RomanPerekhrest, Oct 21 '17 at 07:12
the result can be `'Hello how are you now are you doing now'`. Although a string like this won't come most likely! — void, Oct 21 '17 at 07:18
Although if the result is `'Hello how are you doing now'` would be GREAT even with `now?` added — void, Oct 21 '17 at 07:20

Ashish Ranjan · Accepted Answer · 2017-10-21T07:38:34.353

This should do:

string1 = 'Hello how are you'
string2 = 'are you doing now?'
i = 0
while not string2.startswith(string1[i:]):
    i += 1

sFinal = string1[:i] + string2

OUTPUT :

>>> sFinal
'Hello how are you doing now?'

or, make it a function so that you can use it again without rewriting:

def merge(s1, s2):
    i = 0
    while not s2.startswith(s1[i:]):
        i += 1
    return s1[:i] + s2

OUTPUT :

>>> merge('Hello how are you', 'are you doing now?')
'Hello how are you doing now?'
>>> merge("This is a nice ACADEMY", "DEMY you know!")
'This is a nice ACADEMY you know!'

score 2 · Answer 2 · answered Oct 21 '17 at 07:29

This should do what you want:

def overlap_concat(s1, s2):
    l = min(len(s1), len(s2))
    for i in range(l, 0, -1):
        if s1.endswith(s2[:i]):
            return s1 + s2[i:]
    return s1 + s2

Examples:

>>> overlap_concat("Hello how are you", "are you doing now?")
'Hello how are you doing now?'
>>> overlap_concat("This is a nice ACADEMY", "DEMY you know!")
'This is a nice ACADEMY you know!'

cs95 · Answer 3 · 2017-10-21T15:08:07.430

Using str.endswith and enumerate:

def overlap(string1, string2):
    # Stops at the first (shortest) prefix of string2 that string1 ends with.
    for i, s in enumerate(string2, 1):
        if string1.endswith(string2[:i]):
            break

    return string1 + string2[i:]

>>> overlap("Hello how are you", "are you doing now?")
'Hello how are you doing now?'

>>> overlap("This is a nice ACADEMY", "DEMY you know!")
'This is a nice ACADEMY you know!'

To account for trailing special characters, you could use an re-based substitution:

import re
string1 = re.sub(r'[^\w\s]', '', string1)

Note, however, that this removes all special characters from the first string, not just the trailing ones.
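If only the punctuation at the very end of the first string should be ignored, one option is to anchor the pattern to the end of the string. The following is a minimal sketch rather than part of the original answer: the helper names `strip_trailing_punct` and `merge_ignoring_trailing_punct` are hypothetical, and the merge step simply reuses the longest-overlap idea from this thread.

```python
import re


def strip_trailing_punct(s):
    # Remove only the run of non-word, non-space characters at the end of s.
    return re.sub(r'[^\w\s]+$', '', s)


def merge_ignoring_trailing_punct(s1, s2):
    # Hypothetical helper: drop trailing punctuation from s1, then look for
    # the longest prefix of s2 that the cleaned s1 ends with.
    s1 = strip_trailing_punct(s1)
    for i in range(min(len(s1), len(s2)), 0, -1):
        if s1.endswith(s2[:i]):
            return s1 + s2[i:]
    return s1 + s2


print(merge_ignoring_trailing_punct('Hello how are you?', 'are you doing now?'))
```

Punctuation inside the first string is left alone; only the trailing run is dropped before the overlap search.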

To find the longest matching overlap instead of the shortest, modify the function above to try the prefixes of string2 from longest to shortest.

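def overlap(string1, string2):
    # Try prefixes of string2 from longest to shortest. The final iteration
    # tests the empty prefix, which always matches, so nothing is dropped
    # when the strings do not overlap.
    for i in range(len(string2) + 1):
        if string1.endswith(string2[:len(string2) - i]):
            break

    return string1 + string2[len(string2) - i:]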

>>> overlap('Where did', 'did you go?') 
'Where did you go?'

edited Oct 21 '17 at 15:08

answered Oct 21 '17 at 07:32

@TomKarzes Saying "it doesn't work" is a bit much, seeing as this can just be fixed by reversing the string before iteration. I'll leave that up to OP, because they have never specified anything of the sort (actually, OP doesn't really know what they want). – cs95 Oct 21 '17 at 14:49
This doesn't work. It finds the smallest non-empty overlap, rather than the largest. For example, for overlap('Where did', 'did you go?') it gives 'Where didid you go?', rather than the desired 'Where did you go?'. It needs to start with the longest possible overlap, not the smallest. – Tom Karzes Oct 21 '17 at 14:56
@TomKarzes Yes, I saw your comment previously, see my reply above. – cs95 Oct 21 '17 at 14:56
Sorry, I deleted my original comment and provided a much more clear-cut example of how this version fails. If the last character of the first string is the same as the first character of the second string, it will only find a one-character overlap. I think it's clear that isn't what's desired. – Tom Karzes Oct 21 '17 at 14:57
@TomKarzes Really that's a special case, and like I mentioned, the fix is simple. Saying "it doesn't work" really is a little unfair. – cs95 Oct 21 '17 at 14:58
Well, I disagree. As a general rule, any algorithm that is looking for overlap between two sequences should always return the *largest* overlap, not the smallest. I think most people will agree that "did" is the desired overlap in my example, not "d". – Tom Karzes Oct 21 '17 at 15:00
@TomKarzes Hyper nitpick accepted, see edit. Like I said multiple times, the fix was simple. – cs95 Oct 21 '17 at 15:08

score 1 · Answer 4 · answered Oct 21 '17 at 07:48

The other answers were great, but they failed for this input:

string1 = 'THE ACADEMY has'
string2 = '.CADEMY has taken'

output:

>>> merge(string1,string2)
'THE ACADEMY has.CADEMY has taken'
>>> overlap(string1,string2)
'THE ACADEMY has'

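However, the standard library module difflib proved effective in my case:

from difflib import SequenceMatcher

# Longest block of characters common to both strings, wherever it occurs.
match = SequenceMatcher(None, string1, string2).find_longest_match(
    0, len(string1), 0, len(string2))

print(match)  # -> Match(a=5, b=1, size=10)
# Keep string1 up to the end of the match, then the rest of string2 after it.
print(string1[:match.a + match.size] + string2[match.b + match.size:])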

output:

Match(a=5, b=1, size=10)
THE ACADEMY has taken
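Note that `find_longest_match` returns the longest common block wherever it occurs in either string, so it is not guaranteed to sit at the end of string1. One possible guard, sketched below under that assumption, is to trust the match only when it reaches the end of the first string and otherwise fall back to a plain suffix/prefix overlap. The function names `suffix_prefix_merge` and `guarded_difflib_merge` are hypothetical and not part of difflib.

```python
from difflib import SequenceMatcher


def suffix_prefix_merge(s1, s2):
    # Longest prefix of s2 that s1 ends with; plain concatenation if none.
    for i in range(min(len(s1), len(s2)), 0, -1):
        if s1.endswith(s2[:i]):
            return s1 + s2[i:]
    return s1 + s2


def guarded_difflib_merge(s1, s2):
    # Hypothetical guard: use the difflib match only if it ends exactly
    # where s1 ends; otherwise the match lies in the middle of s1.
    match = SequenceMatcher(None, s1, s2).find_longest_match(
        0, len(s1), 0, len(s2))
    if match.size and match.a + match.size == len(s1):
        return s1 + s2[match.b + match.size:]
    return suffix_prefix_merge(s1, s2)


print(guarded_difflib_merge('THE ACADEMY has', '.CADEMY has taken'))
print(guarded_difflib_merge('This is a nice ACADEMY', 'DEMY you know! nice'))
```

This still drops any characters of the second string that precede the match (the leading `.` in the first example), which may or may not be what is wanted.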

answered Oct 21 '17 at 07:48

your matching rules are somehow arbitrary. `ACADEMY` overlaps only `CADEMY` on `.CADEMY` string so the dot `.` should remain. I'm sure there could be cases when your `SequenceMatcher` will also fail – RomanPerekhrest Oct 21 '17 at 07:52
Here is one fail case using your approach: `import difflib string1 = 'This is a nice ACADEMY' string2 = 'DEMY you know! nice' match = difflib.SequenceMatcher(None, string1, string2).find_longest_match(0, len(string1), 0, len(string2)) print(string1[: match.a + match.size]+string2[match.b + match.size:])` . The output will be: `This is a nice` .Is it correct? - NOPE – RomanPerekhrest Oct 21 '17 at 07:57
voting to close the question as too broad – RomanPerekhrest Oct 21 '17 at 07:58
It would be nice if you'd specify such requirements in your _question_. – cs95 Oct 21 '17 at 07:59
@RomanPerekhrest With you on that. Also, vishnu, please do not use version specific tags if your question does not pertain to any particular version - that tantamounts to tag spamming, so please do not do it again. I removed them for a reason. – cs95 Oct 21 '17 at 08:00

score 0 · Answer 5 · answered Oct 21 '17 at 10:02

The words you want to drop appear in the second string, so you can try something like:

new_string=[string2.split()]
new1=[j for item in new_string for j in item if j not in string1]
new1.insert(0,string1)
print(" ".join(new1))

with the first test case:

string1 = 'Hello how are you'
string2 = 'are you doing now?'

output:

Hello how are you doing now?

second test case:

string1 = 'This is a nice ACADEMY'
string2 = 'DEMY you know!'

output:

This is a nice ACADEMY you know!

Explanation :

First, we split the second string into words so we can find which of them to drop:

new_string=[string2.split()]

Second, we check each word of the split string against string1. If a word occurs anywhere in string1 (this is a substring test), we keep only the copy in the first string and drop it from the second:

new1=[j for item in new_string for j in item if j not in string1]

This list comprehension is the same as:

new1=[]
for item in new_string:
    for j in item:
        if j not in string1:
            new1.append(j)

The last step inserts string1 at the front of the list and joins everything with spaces:

new1.insert(0,string1)
print(" ".join(new1))
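Because `j not in string1` is a substring test applied to every word of string2, the approach above also drops words that repeat later in the second string. A stricter word-level variant, given here only as a sketch with the hypothetical name `merge_words`, compares the trailing words of the first string with the leading words of the second. It assumes whole-word overlaps, so it does not handle the partial-word case ('ACADEMY' / 'DEMY'), and joining with single spaces collapses any runs of whitespace.

```python
def merge_words(s1, s2):
    # Hypothetical helper: find the longest run of whole words that ends s1
    # and starts s2, and keep that run only once.
    w1, w2 = s1.split(), s2.split()
    for n in range(min(len(w1), len(w2)), 0, -1):
        if w1[-n:] == w2[:n]:
            return " ".join(w1 + w2[n:])
    return " ".join(w1 + w2)


print(merge_words('Hello how are you', 'are you doing now?'))
print(merge_words('Hello how are you', 'are you sure you are?'))
```

In the second call, the later 'you' in the second string is kept because only the leading overlap is removed.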
