I was asked to count the total number of occurrences of a substring in a given string, case-insensitively and with or without punctuation. Some examples:
```python
count_occurrences("Text with", "This is an example text with more than +100 lines")  # Should return 1
count_occurrences("'example text'", "This is an 'example text' with more than +100 lines")  # Should return 1
count_occurrences("more than", "This is an example 'text' with (more than) +100 lines")  # Should return 1
count_occurrences("clock", "its 3o'clock in the morning")  # Should return 0
```
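A quick check shows why a plain substring count on lowercased text does not satisfy the last example: `str.count` also finds "clock" inside "o'clock", where the expected answer is 0.

```python
text = "its 3o'clock in the morning"
# str.count has no notion of word boundaries, so the "clock" inside "o'clock" is counted
naive_count = text.lower().count("clock")
print(naive_count)  # 1, but the expected answer is 0
```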
I chose a regex over `.count()` because I need an exact match, and ended up with:
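```python
import re

def count_occurrences(word, text):
    # `word` must not touch a letter or a lone apostrophe on either side ('' is allowed)
    pattern = f"(?<![a-z])((?<!')|(?<='')){word}(?![a-z])((?!')|(?=''))"
    return len(re.findall(pattern, text, re.IGNORECASE))
```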
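One possible direction, sketched under assumptions rather than measured: keep the same boundary rules but use non-capturing `(?:...)` groups, so `findall` returns plain strings instead of tuples, and cache the compiled pattern per word with `functools.lru_cache`, so repeated calls skip rebuilding and re-looking-up the pattern. It also wraps the word in `re.escape`, which assumes the word is meant as literal text; that differs from the original if a word contains regex metacharacters such as `+` or `(`. The helper name `_compile_pattern` is hypothetical.

```python
import re
from functools import lru_cache

@lru_cache(maxsize=None)
def _compile_pattern(word):
    # Hypothetical helper: same lookarounds as the original, but non-capturing,
    # and the word is escaped so it is matched literally (assumption).
    return re.compile(
        r"(?<![a-z])(?:(?<!')|(?<=''))"
        + re.escape(word)
        + r"(?![a-z])(?:(?!')|(?=''))",
        re.IGNORECASE,
    )

def count_occurrences(word, text):
    # With no capturing groups, findall returns a flat list of matched strings
    return len(_compile_pattern(word).findall(text))

print(count_occurrences("clock", "its 3o'clock in the morning"))  # 0
```

Whether this reaches the 0.025 s target depends on how the timing is taken and how many calls are made, so it is worth comparing both versions with `timeit` on the actual inputs.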
Every count is correct, but my code takes 0.10 s while the expected time is 0.025 s. Am I missing something? Is there a better (performance-optimised) way to do this?