Matching newline and any character with Python regex

Question

I have text like this:

var12.1
a
a
dsa

88
123!!!
secondVar12.1

The text between var and secondVar may differ (and the number of lines may differ too).

How can I extract it with a regex?
I'm trying something like this, to no avail:

re.findall(r"^var[0-9]+\.[0-9]+[\n.]+^secondVar[0-9]+\.[0-9]+", str, re.MULTILINE)

Wiktor Stribiżew · Accepted Answer · 2019-10-23T07:50:25.363

You can grab it with:

var\d+(?:(?!var\d).)*?secondVar

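The re.S (or re.DOTALL) flag must be used with this regex so that . can match a newline. As written, the pattern has no capturing group, so the match is the whole span from var to secondVar. To get the text between the delimiters in Group 1, wrap the tempered token in parentheses: ((?:(?!var\d).)*?).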
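A minimal sketch of reading Group 1, assuming capturing parentheses are added around the tempered token (they are not in the pattern as written above). The sample string is made up for illustration:

```python
import re

# Assumption: capturing parentheses added around the tempered token,
# so Group 1 holds whatever sits between var\d+ and secondVar.
regex = r'var\d+((?:(?!var\d).)*?)secondVar'
sample = "var12.1\na\ndsa\nsecondVar12.1"  # made-up sample text

with_dotall = re.search(regex, sample, re.DOTALL)
without_dotall = re.search(regex, sample)  # . cannot cross the newlines

between = with_dotall.group(1)
print(repr(between))
print(without_dotall)
```

Note that \d+ consumes the digits right after var, so Group 1 starts at the first non-digit character.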

NOTE: The closest match will be matched due to the (?:(?!var\d).)*? tempered greedy token, i.e. if there is another var + a digit between var + 1+ digits and secondVar, the match will run from the second var to secondVar.
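To illustrate the closest-match behaviour, here is a small sketch that compares the tempered pattern with a plain lazy .*? on a made-up input containing a stray var1 before the real section:

```python
import re

tempered = re.compile(r'var\d+(?:(?!var\d).)*?secondVar', re.DOTALL)
plain_lazy = re.compile(r'var\d+.*?secondVar', re.DOTALL)

# Made-up input: "var1" appears before the section we actually want.
text = "var1 junk\nvar2 keep\nsecondVar"

tempered_hits = tempered.findall(text)  # starts at the var closest to secondVar
plain_hits = plain_lazy.findall(text)   # starts at the first var it can find
print(tempered_hits)
print(plain_hits)
```

The lazy quantifier alone only shortens the right end of the match. The lookahead is what prevents the match from running across a second var + digit.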

NOTE2: You might want to use \b word boundaries so that the delimiters only match at the start of a word: \bvar(?:(?!var\d).)*?\bsecondVar.
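A quick sketch of what the word boundaries change, using a made-up input in which var only occurs inside a longer word:

```python
import re

# Pattern from NOTE2, and the same pattern without the \b anchors.
bounded = re.compile(r'\bvar(?:(?!var\d).)*?\bsecondVar', re.DOTALL)
unbounded = re.compile(r'var(?:(?!var\d).)*?secondVar', re.DOTALL)

# Made-up input: "var" is glued to "my", so it does not start a word.
text = "myvar1 x secondVar"

bounded_hits = bounded.findall(text)      # \b fails between "y" and "v"
unbounded_hits = unbounded.findall(text)  # matches from inside "myvar1"
print(bounded_hits)
print(unbounded_hits)
```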

REGEX EXPLANATION

var - match the starting delimiter
\d+ - 1+ digits
(?:(?!var\d).)*? - a tempered greedy token: matches any char that does not start the char sequence var + a digit, 0 or more times (but as few as possible)
secondVar - match secondVar literally.
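The same breakdown can be kept next to the pattern itself with re.VERBOSE. This is only a sketch of one way to write it; the sample string is made up, and the comparison against the compact form is there to show that both spellings behave the same:

```python
import re

verbose = re.compile(r"""
    var            # the starting delimiter
    \d+            # 1+ digits
    (?:            # tempered greedy token:
        (?!var\d)  #   stop if var + a digit starts here
        .          #   otherwise consume one char (newline too, with DOTALL)
    )*?            # 0 or more times, as few as possible
    secondVar      # the ending delimiter
""", re.DOTALL | re.VERBOSE)

compact = re.compile(r'var\d+(?:(?!var\d).)*?secondVar', re.DOTALL)

text = "var12.1\na\nsecondVar12.1"  # made-up sample text
verbose_hits = verbose.findall(text)
compact_hits = compact.findall(text)
print(verbose_hits == compact_hits)
```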

IDEONE DEMO

import re
# re.DOTALL lets . match newline characters too
p = re.compile(r'var\d+(?:(?!var\d).)*?secondVar', re.DOTALL)
test_str = "var12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar12.1\nvar12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar12.1"
# The pattern has no capturing group, so findall returns the whole matches
print(p.findall(test_str))

Result for the input string (I doubled it for demo purposes):

['var12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar', 'var12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar']

`re.DOTALL` and `re.S` [are synonyms](https://docs.python.org/2/library/re.html#re.S). — Wiktor Stribiżew, Jul 15 '15 at 19:40

brenns10 · Answer 2 · 2015-07-15T19:43:46.207

You're looking for the re.DOTALL flag, with a regex like this: var(.*?)secondVar. This regex would capture everything between var and secondVar.
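For example, a minimal sketch run against the sample text from the question:

```python
import re

text = "var12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar12.1"

# re.DOTALL makes . match newlines; the group holds the text in between.
captured = re.findall(r"var(.*?)secondVar", text, re.DOTALL)
print(captured)
```

Because the pattern has exactly one capturing group, re.findall returns the group contents rather than the whole matches.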

edited Jul 15 '15 at 19:43

answered Jul 15 '15 at 19:36


`.*` is not correct as it is greedy, and will overmatch if there are more delimited sections. – Wiktor Stribiżew Jul 15 '15 at 19:41
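A small sketch of the overmatch, on a made-up input with two delimited sections:

```python
import re

# Made-up input with two var ... secondVar sections.
text = "var1 a secondVar var2 b secondVar"

greedy = re.findall(r"var(.*)secondVar", text, re.DOTALL)
lazy = re.findall(r"var(.*?)secondVar", text, re.DOTALL)
print(greedy)  # one match that swallows both sections
print(lazy)    # one match per section
```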
