I need to find all the positions of a word in a text (i.e., of a substring in a string). My current solution uses a regex, but I'm not sure whether there are better strategies, perhaps built into the standard library. Any ideas?
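For a plain substring search with no whole-word restriction, the standard library's `str.find` with its `start` argument is enough. This is a minimal sketch; the helper name `find_all` is hypothetical:

```python
def find_all(text, sub):
    """Return (start, end) spans of every non-overlapping occurrence of sub in text."""
    if not sub:
        return []  # an empty needle would match everywhere and never advance
    spans = []
    pos = text.find(sub)
    while pos != -1:
        spans.append((pos, pos + len(sub)))
        # Resume the search just after this occurrence (non-overlapping).
        pos = text.find(sub, pos + len(sub))
    return spans


text = "The quick brown fox jumps over the lazy dog. fox. Redfox."
print(find_all(text, "fox"))  # [(16, 19), (45, 48), (53, 56)]
```

Note that this also reports the "fox" inside "Redfox" at (53, 56), so on its own it does not enforce whole words.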
```python
import re

text = "The quick brown fox jumps over the lazy dog. fox. Redfox."
links = {'fox': [], 'dog': []}

# A key matches only if it is preceded by the start of the string (or a character
# that is not a word character, '-' or '/') and followed by the end of the string
# (or such a character).
re_capture = r"(^|[^\w\-/])(%s)([^\w\-/]|$)" % "|".join(map(re.escape, links.keys()))

for match in re.finditer(re_capture, text):
    # Fix the position by stripping the context groups,
    # e.g. match.groups() == (' ', 'fox', ' ')
    m_groups = match.groups()
    start, end = match.span()
    start = start + len(m_groups[0])
    end = end - len(m_groups[2])
    key = m_groups[1]
    links[key].append((start, end))

print(links)
```

Output:

```
{'fox': [(16, 19), (45, 48)], 'dog': [(40, 43)]}
```
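One possible simplification, assuming the same boundary rule (no word character, '-' or '/' on either side), is to use a negative lookbehind and lookahead instead of capturing the context characters. The lookarounds check the neighbours without consuming them, so `match.span()` is already the word's span and no offset arithmetic is needed. This is a sketch, not a drop-in guarantee; the names `boundary` and `pattern` are illustrative:

```python
import re

text = "The quick brown fox jumps over the lazy dog. fox. Redfox."
links = {'fox': [], 'dog': []}

# Characters that must NOT appear directly before or after a key.
boundary = r"[\w\-/]"
pattern = re.compile(
    r"(?<!%s)(%s)(?!%s)" % (boundary, "|".join(map(re.escape, links)), boundary)
)

for match in pattern.finditer(text):
    # The span covers only the key itself, since lookarounds consume nothing.
    links[match.group(1)].append(match.span())

print(links)  # {'fox': [(16, 19), (45, 48)], 'dog': [(40, 43)]}
```

This also avoids a subtle issue with the capturing version: because it consumes the trailing context character, in a string like "fox dog" the space is used up by the first match and "dog" is never found. The lookaround pattern returns both. A plain `\b` boundary would be shorter, but it treats '-' and '/' as boundaries, so it would match the "fox" in "red-fox", unlike the rule above.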
Edit: Partial words must not match; for example, the "fox" in "Redfox" must not appear in `links`.
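If you want to avoid regular expressions entirely, `str.find` plus an explicit check of the neighbouring characters enforces the same whole-word rule. This is a sketch under assumptions: the helper names `is_word_char` and `find_whole_words` are hypothetical, and `is_word_char` is only intended to approximate `[\w\-/]` (the `re` docs describe `\w` for str patterns as `str.isalnum()` characters plus `_`).

```python
def is_word_char(c):
    # Approximates the regex class [\w\-/]: alphanumerics, '_', '-' and '/'.
    return c.isalnum() or c in "_-/"


def find_whole_words(text, words):
    """Map each word to the (start, end) spans where it occurs as a whole word."""
    links = {word: [] for word in words}
    for word in links:
        if not word:
            continue  # an empty word would never advance the search
        pos = text.find(word)
        while pos != -1:
            end = pos + len(word)
            before_ok = pos == 0 or not is_word_char(text[pos - 1])
            after_ok = end == len(text) or not is_word_char(text[end])
            if before_ok and after_ok:
                links[word].append((pos, end))
            # Advance by one character so overlapping candidates are also checked.
            pos = text.find(word, pos + 1)
    return links


text = "The quick brown fox jumps over the lazy dog. fox. Redfox."
print(find_whole_words(text, ['fox', 'dog']))
# {'fox': [(16, 19), (45, 48)], 'dog': [(40, 43)]}
```

The regex version is usually shorter and scans the text once for all keys, while this version scans once per word; which is preferable likely depends on how many keys you have.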
Thanks.