val = "one two three four five" 
string1 = "you id is one two three" 
string2 = "continue to four five"

Expected output (start and end index of each span):
output1 = 10,22
output2 = 12,20

Here, part of the content of val is present in string1 and string2. We need to detect the spans of those matching parts.

2 Answers


Form a regex alternation of the number keywords, and then iterate to find all matches with their indices:

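import re

val = "one two three four five"
string1 = "you id is one two three"
# Alternation of the keywords in val, wrapped in word boundaries
regex = r'\b(?:' + '|'.join(val.split()) + r')\b'
# One keyword, optionally followed by more space-separated keywords
p = re.compile(regex + r'(?: ' + regex + r')*')

for m in p.finditer(string1):
    print(m.start(), m.end(), m.group())  # prints: 10 23 one two three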

To be clear here, the regex used is this:

\b(?:one|two|three|four|five)\b(?: \b(?:one|two|three|four|five)\b)*
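Note that m.end() is exclusive, while the expected output in the question (10,22 and 12,20) appears to use an inclusive end index. A minimal sketch of how the same regex could be wrapped to return either form; the function name find_keyword_spans and the inclusive_end flag are hypothetical, and re.escape is added on the assumption that the keywords might contain regex metacharacters:

```python
import re

def find_keyword_spans(val, text, inclusive_end=False):
    """Return (start, end) spans of runs of words from `val` found in `text`."""
    # Escape each keyword so any regex metacharacters are matched literally.
    word = r'\b(?:' + '|'.join(map(re.escape, val.split())) + r')\b'
    pattern = re.compile(word + r'(?: ' + word + r')*')
    # m.end() points one past the last character; subtract 1 for an inclusive end.
    offset = 1 if inclusive_end else 0
    return [(m.start(), m.end() - offset) for m in pattern.finditer(text)]

val = "one two three four five"
print(find_keyword_spans(val, "you id is one two three", inclusive_end=True))  # [(10, 22)]
print(find_keyword_spans(val, "continue to four five", inclusive_end=True))    # [(12, 20)]
```

With inclusive_end left at False, the spans are (10, 23) and (12, 21), which can be used directly for slicing.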
Tim Biegeleisen

Instead of using re, you can use difflib, which already provides the functionality you need: SequenceMatcher.find_longest_match() returns the longest contiguous block shared by two sequences.

import difflib

val = "one two three four five" 

for my_string in [
    "your id is one two three",
    "continue to four five six",
]:
    sequence_matcher = difflib.SequenceMatcher(a=val, b=my_string)
    match = sequence_matcher.find_longest_match(0, len(val), 0, len(my_string))
    match_str = my_string[match.b:match.b + match.size]
    print(match.b, match.b + match.size, match_str)

Output:

11 24 one two three
11 21  four five
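Because the comparison above is character-based, the second match includes the leading space, and only the single longest block is reported. One possible variation, sketched here under the assumption that matches should be whole words, is to compare word lists instead and map the matched words back to character offsets. The helper name word_level_spans is hypothetical; get_matching_blocks() returns blocks in aligned order, so it is not guaranteed to report every shared run:

```python
import difflib
import re

def word_level_spans(val, text):
    """Return (start, end, matched_text) for each run of words shared by `val` and `text`."""
    tokens = list(re.finditer(r'\S+', text))  # keeps the character offsets of each word
    words = [t.group() for t in tokens]
    matcher = difflib.SequenceMatcher(a=val.split(), b=words, autojunk=False)
    spans = []
    for block in matcher.get_matching_blocks():
        if block.size == 0:  # the last block is always a zero-length sentinel
            continue
        start = tokens[block.b].start()
        end = tokens[block.b + block.size - 1].end()  # exclusive end index
        spans.append((start, end, text[start:end]))
    return spans

val = "one two three four five"
print(word_level_spans(val, "your id is one two three"))   # [(11, 24, 'one two three')]
print(word_level_spans(val, "continue to four five six"))  # [(12, 21, 'four five')]
```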