Sounds like you basically want the beginning and end of the match to be either the start or end of the paragraph, or a transition to a whitespace character (the end of a "word", though sadly, the regex definition of a word excludes characters like ".", so you can't use tests based on \b).
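To make the \b problem concrete, here is a small sketch. The sample text and search terms are made up for illustration. \b only matches between a word character (letter, digit, underscore) and a non-word character, so it misfires in both directions once punctuation is involved:

```python
import re

text = "We stock C++ books and Node.js guides"  # made-up sample text

# "+" and " " are both non-word characters, so there is no \b after "C++":
# the term is clearly a whole token, yet the search finds nothing.
missed = re.search(r'\bC\+\+\b', text)

# "." is a non-word character, so \b happily ends the match inside "Node.js".
false_hit = re.search(r'\bNode\b', text)

# Whitespace-based tests only look at what surrounds the candidate match.
ws_cpp = re.search(r'(?:^|\s)C\+\+(?=\s|$)', text)
ws_node = re.search(r'(?:^|\s)Node(?=\s|$)', text)

print(missed)     # None
print(false_hit)  # a match object for "Node"
print(ws_cpp)     # a match object (includes the leading space)
print(ws_node)    # None
```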
The simplest approach here is to split the paragraph on whitespace and check whether the words of your string occur, in order, in the resulting list (using some variation on finding a sublist in a list):
def list_contains_sublist(haystack, needle):
    firstn, *restn = needle  # Extracted up front for efficiency
    # i starts at 1, so it is the index of the element *after* x
    for i, x in enumerate(haystack, 1):
        if x == firstn and haystack[i:i+len(restn)] == restn:
            return True
    return False

para_words = paragraph.split()

def checkIfProdExist(x):
    return list_contains_sublist(para_words, x.split())
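As a quick check of how the split-based test behaves, here is a self-contained sketch that repeats the helper above. The sample paragraph and search strings are made up. Note that because splitting is purely on whitespace, trailing punctuation stays attached to its word:

```python
def list_contains_sublist(haystack, needle):
    firstn, *restn = needle
    for i, x in enumerate(haystack, 1):
        if x == firstn and haystack[i:i + len(restn)] == restn:
            return True
    return False

paragraph = "fresh apple pie and pineapple pie."  # made-up sample
para_words = paragraph.split()

print(list_contains_sublist(para_words, "apple pie".split()))      # True
print(list_contains_sublist(para_words, "pple pie".split()))       # False: "pple" is not a whole word
print(list_contains_sublist(para_words, "pineapple pie".split()))  # False: the last token is "pie."
```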
If you want the index too, or need precise whitespace matching, it's trickier: .split() won't preserve runs of whitespace, so you can't reconstruct the index from the word list, and a plain substring search over the whole string can give you the wrong index if the substring occurs twice but only the second occurrence meets your requirements. At that point, I'd probably just go with a regex:
import re

def checkIfProdExist(x):
    m = re.search(fr'(^|\s){re.escape(x)}(?=\s|$)', paragraph)
    if m:
        return m.end(1)  # After the matched space, if any
    return -1  # Or omit return for implicit None, or raise an exception, or whatever
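To illustrate the "occurs twice" case, here is a minimal sketch comparing str.find with the regex. The function name find_word_bounded and the sample paragraph are hypothetical, and the paragraph is passed as a parameter rather than read from a global so the snippet runs on its own:

```python
import re

def find_word_bounded(needle, paragraph):  # hypothetical helper name
    m = re.search(fr'(^|\s){re.escape(needle)}(?=\s|$)', paragraph)
    return m.end(1) if m else -1

paragraph = "pineapple pie and apple pie"  # made-up sample

print(paragraph.find("apple pie"))                # 4, inside "pineapple pie"
print(find_word_bounded("apple pie", paragraph))  # 18, the whitespace-delimited occurrence
```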
Note that as written, this won't work with your filter (if the paragraph begins with the substring, it returns 0, which is falsy, while the -1 returned on failure is truthy). You might have it return None on failure and a tuple of the indices on success, so it works for both boolean and index-demanding cases, e.g. (demonstrating walrus use for 3.8+ for fun):
def checkIfProdExist(x):
    if m := re.search(fr'(?:^|\s)({re.escape(x)})(?=\s|$)', paragraph):
        # Capture the match itself so its end is easy to get; the leading
        # space is no longer captured, so we just use the span of the capture
        return m.span(1)
    # Implicitly returns falsy None on failure
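A small sketch of how the two return conventions interact with filter. The names find_index and find_span, the sample paragraph, and the product list are all hypothetical, and the paragraph is bound with functools.partial instead of a global so the snippet is self-contained:

```python
import re
from functools import partial

def find_index(needle, paragraph):  # hypothetical: start index, or -1 on failure
    m = re.search(fr'(^|\s){re.escape(needle)}(?=\s|$)', paragraph)
    return m.end(1) if m else -1

def find_span(needle, paragraph):   # hypothetical: (start, end), or None on failure
    m = re.search(fr'(?:^|\s)({re.escape(needle)})(?=\s|$)', paragraph)
    return m.span(1) if m else None

paragraph = "apple pie is tasty"               # made-up sample
products = ["apple pie", "is tasty", "tast"]   # "tast" is not a whole word

by_index = list(filter(partial(find_index, paragraph=paragraph), products))
by_span = list(filter(partial(find_span, paragraph=paragraph), products))

print(by_index)  # ['is tasty', 'tast']: index 0 is dropped, -1 is kept
print(by_span)   # ['apple pie', 'is tasty']
```

A non-empty tuple is always truthy, even when it starts at index 0, which is why the span-or-None convention is safe to use directly as a filter predicate.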