Suggestion:
If you're dealing with multiline text (i.e. text containing \n), you'll need to pass flags=re.DOTALL to re.findall(). By default, . matches any character except a newline, so a pattern like .*? stops at the end of the line.
Case: Multiline text
import re

# string to be searched
a = """
__Hello__ **This
is a multiline test** __it is__ **Lego
**
"""
# pattern variations
bold_pattern = r'\*\*(.*?)\*\*'
# call re functions
match = re.findall(pattern=bold_pattern, string=a)
flag_match = re.findall(pattern=bold_pattern, string=a, flags=re.DOTALL)
# print results for observation
print(match)
print(flag_match) # using the flag
Returns:
[' __it is__ ']
['This \nis a multiline test', 'Lego\n']
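The same flag can be spelled a couple of other ways. Here is a minimal sketch, assuming a made-up sample string (not the one from the question): re.S is the short alias for re.DOTALL, and the inline flag (?s) at the start of the pattern switches on the same behaviour without a flags argument.

```python
import re

# Hypothetical sample text, used only for illustration
text = "**bold\ntext** and **more**"

bold_pattern = r'\*\*(.*?)\*\*'

# Three equivalent ways to let '.' match newlines
with_flag = re.findall(bold_pattern, text, flags=re.DOTALL)
with_alias = re.findall(bold_pattern, text, flags=re.S)
with_inline = re.findall(r'(?s)' + bold_pattern, text)

print(with_flag)    # ['bold\ntext', 'more']
print(with_alias)   # same result
print(with_inline)  # same result
```

The inline form can be handy when the pattern is stored somewhere (a config file, say) and you can't easily pass flags alongside it.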
From the Python 3.8.2 documentation:
"The expression’s behaviour can be modified by specifying a flags value."
Dealing with (\n)
Depending on your needs, there are a few different ways to deal with \n. If I need to, I'll run re.sub() on the entire text body before doing anything else, to remove them all.
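A rough sketch of that clean-up step, assuming a made-up sample string. Whether you replace newlines with a space or with nothing depends on your text; I use a space here so words on adjacent lines don't get glued together.

```python
import re

# Hypothetical sample text, used only for illustration
text = "**This\nis a multiline test** and **Lego\n**"

# Replace every newline with a space before searching
flattened = re.sub(r'\n', ' ', text)

# No DOTALL needed now, since there are no newlines left
matches = re.findall(r'\*\*(.*?)\*\*', flattened)
print(matches)  # ['This is a multiline test', 'Lego ']
```

For a plain character swap like this, text.replace('\n', ' ') does the same job; re.sub() earns its keep once the thing you're removing is itself a pattern (e.g. r'\s*\n\s*').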
To Compile or Not to Compile?
From the Python 3.8.2 documentation:
"Some of the functions are simplified versions of the full featured methods for compiled regular expressions. Most non-trivial applications always use the compiled form...
...but using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program."
and
"The compiled versions of the most recent patterns passed to re.compile() and the module-level matching functions are cached, so programs that use only a few regular expressions at a time needn’t worry about compiling regular expressions."
So unless you're using a whole bunch of patterns, you shouldn't see a noticeable improvement from compiling.
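For reference, this is roughly what the compiled form looks like. It's only a sketch: the name bold_re and the documents list are made up for illustration. Note that the flag goes into re.compile(), not into the later findall() call.

```python
import re

# Compile once, with the flag baked into the pattern object
bold_re = re.compile(r'\*\*(.*?)\*\*', flags=re.DOTALL)

# Hypothetical inputs, used only for illustration
documents = ["**one\ntwo**", "no bold here", "**a** and **b**"]

# Reuse the same compiled object for every document
results = [bold_re.findall(doc) for doc in documents]
print(results)  # [['one\ntwo'], [], ['a', 'b']]
```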
You can also use the %%time magic command (in IPython or Jupyter) to test both options and see if you notice an advantage locally!
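If you're not in a notebook, the standard library's timeit module gives a similar comparison. This is a sketch under my own assumptions: the test string, the function names, and the number of repetitions are arbitrary, and the timings will vary from machine to machine.

```python
import re
import timeit

pattern = r'\*\*(.*?)\*\*'
# Hypothetical test input: a short multiline snippet repeated to give the regex some work
text = "__Hello__ **This\nis a multiline test** __it is__ **Lego\n**" * 100

compiled = re.compile(pattern, flags=re.DOTALL)

def module_level():
    # Relies on the re module's internal cache of recent patterns
    return re.findall(pattern, text, flags=re.DOTALL)

def precompiled():
    # Uses the pattern object compiled once above
    return compiled.findall(text)

n = 200
t_module = timeit.timeit(module_level, number=n)
t_compiled = timeit.timeit(precompiled, number=n)

print(f"re.findall:       {t_module:.4f} s for {n} calls")
print(f"compiled.findall: {t_compiled:.4f} s for {n} calls")
```

With a single pattern I'd expect the two numbers to be close, since the module-level call only adds a cache lookup on top of the same matching work.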
Good luck!