matching any character including newlines in a Python regex subexpression, not globally

Question

I want to use re.MULTILINE but NOT re.DOTALL, so that I can have a regex that includes both an "any character" wildcard and the normal . wildcard that doesn't match newlines.

Is there a way to do this? What should I use to match any character in those instances that I want to include newlines?

Hi Jason, unless I'm missing something, "python" + "regex" can be implied from the tags, so does not need to be specified in the title (per the "no tags in titles" guideline)? — Matt, Nov 13 '15 at 12:33
because SO's list of related questions DOES NOT INCLUDE THE TAGS so context information is important. — Jason S, Nov 13 '15 at 13:26
The "no tags in titles" either needs to be a guideline (not a requirement) or needs to be revisited, or SO needs to start showing tags in the list of related questions. — Jason S, Nov 13 '15 at 13:29

score 141 · Accepted Answer · edited Mar 14 '22 at 18:16

To match a newline, or "any symbol" without re.S/re.DOTALL, you may use any of the following:

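(?s:.) - an inline modifier group with the s flag switched on; it sets a scope within which every . matches any char, including line break chars (scoped groups of this form are available in Python 3.6 and later; a bare (?s) at the start of the pattern applies to the whole expression, like re.DOTALL)
Any of the following workarounds: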

[\s\S]
[\w\W]
[\d\D]

The main idea is that the opposite shorthand classes inside a character class match any symbol there is in the input string.
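As a minimal sketch of that idea (the sample string and variable names here are illustrative, not from the original answer), the snippet below checks that . stops at a newline under re.M alone, while each of the three complementary classes carries the match across it:

```python
import re

text = "first line\nsecond line"

# With only re.MULTILINE set, "." still refuses to cross a newline,
# so this pattern cannot bridge the two lines.
dot_only = re.search(r"first.*second", text, re.M)

# Each pair of opposite shorthand classes accepts every character.
patterns = [
    r"first[\s\S]*second",
    r"first[\w\W]*second",
    r"first[\d\D]*second",
]
class_matches = [re.search(p, text, re.M) for p in patterns]

print(dot_only)                              # None
print([m.group(0) for m in class_matches])   # three copies of 'first line\nsecond'
```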

Compared with (.|\s) and other alternation-based variations, the character class solution is much more efficient because it involves far less backtracking (when used with a * or + quantifier). In the small example, (?:.|\n)+ takes 45 steps to complete, while [\s\S]+ takes just 2.
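The step counts above come from a regex debugger, not from Python itself. A rough way to compare the two forms in Python is to time them with timeit; this is only a sketch, the input text and repeat count are arbitrary assumptions, and the actual numbers will vary by machine and Python version:

```python
import re
import timeit

# Arbitrary multi-line sample input (an assumption for this sketch).
text = "some text on a line\n" * 500

alternation = re.compile(r"(?:.|\n)+")
char_class = re.compile(r"[\s\S]+")

# Both patterns consume the whole string, newlines included.
same_result = (
    alternation.match(text).group(0) == char_class.match(text).group(0) == text
)

# Time repeated matching; no particular ratio is guaranteed.
t_alt = timeit.timeit(lambda: alternation.match(text), number=50)
t_cls = timeit.timeit(lambda: char_class.match(text), number=50)

print(same_result)
print(f"alternation: {t_alt:.4f}s, character class: {t_cls:.4f}s")
```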

See a Python demo where I am matching a line starting with 123 and up to the first occurrence of 3 at the start of a line and including the rest of that line:

import re
text = """abc
123
def
356
more text..."""
print( re.findall(r"^123(?s:.*?)^3.*", text, re.M) )
# => ['123\ndef\n356']
print( re.findall(r"^123[\w\W]*?^3.*", text, re.M) )
# => ['123\ndef\n356']
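To see that the (?s:...) group really is local, here is a small additional sketch (the sample string is made up for illustration, and it assumes Python 3.6 or later, where scoped inline flags are available). A . outside the group keeps its normal behavior:

```python
import re

text = "a\nb c\nd"

# The dot inside (?s:...) may cross a newline; the dot after it may not,
# so this pattern fails on the second newline.
scoped_only_first = re.search(r"(?s:a.b) c.d", text)

# Wrapping the second dot as well lets the whole string match.
scoped_both = re.search(r"(?s:a.b) c(?s:.)d", text)

print(scoped_only_first)               # None
print(scoped_both.group(0) == text)    # True
```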

answered Oct 23 '15 at 22:16 by Wiktor Stribiżew · edited Mar 14 '22 at 18:16 by Iulian Onofrei

awesome, thanks! I knew there was a way to do it but couldn't remember. – Jason S Oct 23 '15 at 22:40
Fixed some typos. Sorry for them. – Wiktor Stribiżew Apr 15 '17 at 07:15
@IoannisFilippidis You are suggesting using a regex option to match any char. This is outside the scope of this post, as the OP knows about the regex options, both `re.M` and `re.S`/`re.DOTALL`, but wants to know how to do it without the flags. Besides, `re.MULTILINE` is the wrong flag for matching any char in Python `re`, since it only modifies the behavior of the `^` and `$` anchors, while `re.S` or `re.DOTALL` makes `.` match any char including a newline. – Wiktor Stribiżew Sep 04 '18 at 07:01
@WiktorStribiżew a link to this answer on your profile for the text "NEVER USE `(.|\n)`!!!" would be useful for regex amateurs like myself. – pault Aug 05 '19 at 19:36

Ali Sajjad · Answer 2 · 2020-11-30T11:12:16.637

11

Match any character (including new line):

Regular expression (note that the class also contains a literal space ' '):

[\S\n\t\v ]

Example:

import re

text = 'abc def ###A quick brown fox.\nIt jumps over the lazy dog### ghi jkl'
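# We want to extract "A quick brown fox.\nIt jumps over the lazy dog"
# Raw string avoids invalid-escape warnings for \S; the capture group
# makes findall return only the text between the ### delimiters.
matches = re.findall(r'###([\S\n ]+)###', text)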
print(matches[0])

The 'matches[0]' will contain:
'A quick brown fox.\nIt jumps over the lazy dog'

Description of '\S' from the Python docs:

\S Matches any character which is not a whitespace character.

( See: https://docs.python.org/3/library/re.html#regular-expression-syntax )
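One thing to keep in mind is that this class lists its whitespace characters explicitly, so anything not listed is left out. The following sketch (the sample characters are chosen for illustration) checks a few single characters against it, with [\s\S] alongside for comparison:

```python
import re

narrow = re.compile(r"[\S\n\t\v ]")   # the class from this answer
broad = re.compile(r"[\s\S]")         # a complementary-pair class

samples = ["x", " ", "\n", "\t", "\v", "\r", "\f"]

# Collect the sample characters each class fails to match.
missed_by_narrow = [ch for ch in samples if not narrow.fullmatch(ch)]
missed_by_broad = [ch for ch in samples if not broad.fullmatch(ch)]

print(missed_by_narrow)   # ['\r', '\x0c']  (carriage return and form feed)
print(missed_by_broad)    # []
```

If the input can contain carriage returns (for example, Windows line endings), they would need to be added to the class as well.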

answered Jul 13 '20 at 15:54 by Ali Sajjad · edited Nov 30 '20 at 11:12

This does not match `\t` or `\v`. – ApproachingDarknessFish Nov 29 '20 at 10:33
The `\v` is rarely used, but I included it anyway. And the question says to match "any character including newline". So whatever works for him :-) @ApproachingDarknessFish – Ali Sajjad Nov 30 '20 at 11:14
