2

I have two strings:

str1 = "123147"
str2 =    "1474671231"

the end of str1 has some similar part with the start of str2 ("147"), and I want to find the length of this similar part, so I tried to:

for ch in str1:
    if ch == str2[0]:
        start_idx = len(str1) - date.index(ch)
        break

However the problem is it will return a mistake if the begin of str1 is same as the begin of str2 ("1") and if I reverse the checking order, it still have this problem ("7"). Is there any simple method to solve it?

Most important, I only want to check the end of str1 and the beginning of str2, and ignore other parts, for example, "1231" in str1 and str2 should be ignored.

mkrieger1
  • 19,194
  • 5
  • 54
  • 65
4daJKong
  • 1,825
  • 9
  • 21
  • 1
    Does this answer your question? [Find common substring between two strings](https://stackoverflow.com/questions/18715688/find-common-substring-between-two-strings) – mkrieger1 Mar 29 '22 at 08:25
  • Naively: run through the start index of the first string, from 0 to len(str1), take the substring str[index:] and check if the str2 starts with that substring. Does not start with 123147, not with 23147, not with 3147 but then you check 147 and it matches. – luk2302 Mar 29 '22 at 08:25
  • @mkrieger1 it didn't solve my problem because if the end of str2 have some same part as the end of str1, it will give a false answer – 4daJKong Mar 29 '22 at 08:32
  • 1
    Using `SequenceMatcher` works perfectly for your example. Did you try that? – mkrieger1 Mar 29 '22 at 08:38
  • @mkrieger1 I have tried this method in difflib, but still have some problem, please have a look on the most important part, thanks again:) – 4daJKong Mar 29 '22 at 08:40
  • Do you mean that in your updated example the result should still be "147" and not "1231"? – mkrieger1 Mar 29 '22 at 08:42
  • @mkrieger1 yes! – 4daJKong Mar 29 '22 at 08:51

2 Answers2

2

It does not seem a problem to just try all suffixes of str1, starting with the longest:

def intersection_length(str1, str2):
    for i in range(max(0, len(str2) - len(str1), len(str1))):
        if str2.startswith(str1[i:]):
            return len(str1) - i
    return 0

Run as:

str1 = "12314755"
str2 = "14755467"
print(intersection_length(str1, str2))  # 5
trincot
  • 317,000
  • 35
  • 244
  • 286
0

A possible approach is to use a regex, and concat strings in a smart way:

import re

C = '#'
result = len(re.findall(r'^(\w*).*\1$', str2 + C + str1)[0])

The assumption here is that both str1 and str2 do not contain character C.

Another solution (which has optimal complexity, i.e. it's very fast compared to other approaches when you're dealing with long strings, but is clearly more complicated than what you need):

def longest_prefix_suffix(str1, str2):
    n = min(len(str1), len(str2))
    lps = [0] * n
    l = 0
    i = 1
    while i < n:
        if str1[i] == str2[l]:
            l += 1
            lps[i] = l
            i += 1
        else:
            if l != 0:
                l = lps[l - 1]
            else:
                lps[i] = 0
                i = i + 1
    return lps[n - 1]
Riccardo Bucco
  • 13,980
  • 4
  • 22
  • 50