To match a newline, or "any symbol", without re.S/re.DOTALL, you may use any of the following:
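(?s). - the inline modifier with the s flag on sets a scope where all . patterns match any char, including line break chars. In Python, a global (?s) must appear at the very start of the pattern; to turn the flag on for only part of a pattern, use a scoped modifier group such as (?s:.) (Python 3.6+).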
Any of the following work-arounds:
[\s\S]
[\w\W]
[\d\D]
The main idea is that a shorthand class and its opposite, placed together inside one character class, match any character in the input string, line breaks included.
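A quick way to check the options above, as a sketch: the sample string and patterns are made up for illustration, and each pattern only succeeds if its single "any char" token can match the \n between the two lines.

```python
import re

text = "first line\nsecond line"

# Plain "." (no DOTALL) cannot cross the line break, so this finds nothing.
print(re.search(r"line.second", text))  # None

# Each alternative lets its single "any char" token match "\n".
patterns = [
    r"(?s)line.second",   # global inline flag, placed at the start of the pattern
    r"line(?s:.)second",  # scoped inline modifier group (Python 3.6+)
    r"line[\s\S]second",  # whitespace + non-whitespace
    r"line[\w\W]second",  # word + non-word chars
    r"line[\d\D]second",  # digit + non-digit
]
found = {p: re.search(p, text).group() for p in patterns}
for p, m in found.items():
    print(p, repr(m))  # every match is 'line\nsecond'
```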
Compared to (.|\s) and other variations with alternation, the character class solution is much more efficient, as it involves far less backtracking (when used with a * or + quantifier). In a small example, (?:.|\n)+ takes 45 steps to complete, while [\s\S]+ takes just 2 steps.
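Python's re does not report "steps" the way online regex testers do, so here is a rough timing sketch instead. The sample text is hypothetical, absolute numbers depend on the machine and Python version, and since this input matches in one pass it shows the per-character overhead of alternation rather than worst-case backtracking.

```python
import re
import timeit

# Hypothetical sample: 2,000 short lines joined by "\n".
text = "\n".join(f"line {i}" for i in range(2000))

alternation = re.compile(r"(?:.|\n)+")  # branch attempt(s) for every character
char_class = re.compile(r"[\s\S]+")     # a single set test per character

# Both patterns consume the entire string, so they are interchangeable here.
same = alternation.fullmatch(text).group() == char_class.fullmatch(text).group() == text

# Time 20 full matches of each; results vary between runs and machines.
t_alt = timeit.timeit(lambda: alternation.fullmatch(text), number=20)
t_cls = timeit.timeit(lambda: char_class.fullmatch(text), number=20)
print(f"alternation: {t_alt:.4f}s  character class: {t_cls:.4f}s")
```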
See a Python demo where I am matching from a line starting with 123 up to the first line that starts with 3, including the rest of that line:
import re
text = """abc
123
def
356
more text..."""
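# Scoped inline modifier: only the lazy .*? inside (?s:...) crosses line breaks
print(re.findall(r"^123(?s:.*?)^3.*", text, re.M))
# => ['123\ndef\n356']

# Character class workaround: [\w\W] matches any char, newlines included
print(re.findall(r"^123[\w\W]*?^3.*", text, re.M))
# => ['123\ndef\n356']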
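For contrast, a small sketch on the same text showing why limiting DOTALL to part of the pattern matters here: with re.S applied to the whole pattern, the trailing .* also crosses line breaks. The variable names are just for illustration.

```python
import re

text = """abc
123
def
356
more text..."""

# re.S on the whole pattern: the trailing ".*" also crosses line breaks
# and runs on to the end of the string.
whole = re.findall(r"^123.*?^3.*", text, re.M | re.S)

# Scoped group: only the lazy middle part may cross line breaks.
scoped = re.findall(r"^123(?s:.*?)^3.*", text, re.M)

# Same result with global re.S if the tail is written as [^\n]* instead.
fixed_tail = re.findall(r"^123.*?^3[^\n]*", text, re.M | re.S)

print(whole)       # ['123\ndef\n356\nmore text...']
print(scoped)      # ['123\ndef\n356']
print(fixed_tail)  # ['123\ndef\n356']
```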