As for timing issues, get familiar with the timeit module.
For your specific problem, you have two choices, the in operator and a more accurate regex approach:
import re, timeit
content = """
first line
second line
some gossip is innate
smush smush
squish bust
although
last line
"""
def only_string_functions():
    return [line for line in content.split("\n") if "gossip" in line]

pattern = re.compile(r'\bgossip\b')

def regex_approach():
    return [line for line in content.split("\n") if pattern.search(line)]

print(timeit.timeit(only_string_functions, number=10**5))
print(timeit.timeit(regex_approach, number=10**5))
Running each function 100,000 times yields the following on my MacBook (times in seconds):
0.11374067
0.40804803300000003
So, as expected, the in operator is considerably faster than the regex approach (roughly three and a half times here), but it also returns lines containing words like mygossip, which arguably should not be matched. This may or may not be a problem.
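
To make that difference concrete, here is a minimal sketch comparing what each approach matches. The sample lines are made up for illustration and are not taken from your data:

```python
import re

# Hypothetical sample lines, chosen to show where the two approaches differ.
lines = [
    "some gossip is innate",
    "mygossip column",
    "gossiping again",
    "no match here",
]

# \b is a word boundary, so only the standalone word "gossip" matches.
pattern = re.compile(r"\bgossip\b")

# Plain substring test: matches "gossip" anywhere, even inside longer words.
substring_hits = [line for line in lines if "gossip" in line]

# Regex test: matches only when "gossip" stands alone as a word.
regex_hits = [line for line in lines if pattern.search(line)]

print(substring_hits)
print(regex_hits)
```

The substring test keeps the first three lines, while the regex keeps only the first one. If partial matches like these cannot occur in your input, the faster in operator is probably good enough.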