
I have text like this:

var12.1
a
a
dsa

88
123!!!
secondVar12.1

The lines between var and secondVar may differ, and so may the number of them.

How can I extract it with a regex?
I'm trying something like this, to no avail:

re.findall(r"^var[0-9]+\.[0-9]+[\n.]+^secondVar[0-9]+\.[0-9]+", str, re.MULTILINE)
Wiktor Stribiżew
Alexey Berezuev

2 Answers


You can grab it with:

var\d+(?:(?!var\d).)*?secondVar

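See demo. The re.S (or re.DOTALL) modifier must be used with this regex so that . can match a newline. The pattern has no capturing group, so re.findall returns each whole match, delimiters included; to get only the text between the delimiters, wrap that part of the pattern in a capturing group.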
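As a rough sketch of getting only the text between the delimiters, the snippet below wraps the tempered token in a capturing group. Those parentheses, the second pattern that also consumes the ".1" and the newline after the opening delimiter, and the variable names are all additions for illustration, not part of the pattern above.

```python
import re

text = "var12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar12.1"

# Assumption: parentheses added around the tempered token so that
# Group 1 holds everything after "var" + digits, up to "secondVar".
grouped = re.compile(r"var\d+((?:(?!var\d).)*?)secondVar", re.DOTALL)

# Assumption: the opening line always looks like "var<digits>.<digits>",
# so the whole header line is consumed before the group starts.
header_skipped = re.compile(
    r"var\d+\.\d+\n((?:(?!var\d).)*?)secondVar", re.DOTALL
)

match = grouped.search(text)
whole = match.group(0)  # full match, delimiters included
inner = match.group(1)  # captured part only; starts with ".1"
body = header_skipped.search(text).group(1)  # lines between the two headers

print(repr(whole))
print(repr(inner))
print(repr(body))
```

Note that with the first pattern \d+ only consumes "12", so the captured text still begins with ".1"; the second pattern is one way to avoid that if the header format is fixed.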

NOTE: The closest match is returned thanks to the (?:(?!var\d).)*? tempered greedy token, i.e. if another var followed by a digit appears before secondVar, the match will run from that later var to secondVar.
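To illustrate the closest-match behaviour, here is a small comparison against a plain lazy .*? pattern. The input string with a stray "var7" line is made up for the example, and the variable names are hypothetical.

```python
import re

# Hypothetical input: a stray "var7" line appears before the real block.
text = "var7\nnoise\nvar12.1\na\nb\nsecondVar12.1"

tempered = re.compile(r"var\d+(?:(?!var\d).)*?secondVar", re.DOTALL)
plain = re.compile(r"var\d+.*?secondVar", re.DOTALL)

# The tempered token refuses to step over "var1...", so the attempt that
# starts at "var7" fails and the match begins at the later "var12.1".
tempered_match = tempered.search(text).group(0)

# The plain lazy dot has no such restriction and starts at "var7".
plain_match = plain.search(text).group(0)

print(repr(tempered_match))
print(repr(plain_match))
```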

NOTE2: You might want to use \b word boundaries so that the delimiters only match at the start of a word: \bvar\d+(?:(?!var\d).)*?\bsecondVar.
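A quick sketch of what the word boundaries change, assuming an input where "var1" is only the tail of a longer word ("myvar1"); both sample strings and the names loose and strict are made up for the illustration.

```python
import re

loose = re.compile(r"var\d+(?:(?!var\d).)*?secondVar", re.DOTALL)
strict = re.compile(r"\bvar\d+(?:(?!var\d).)*?\bsecondVar", re.DOTALL)

# Hypothetical input where "var1" is embedded in the word "myvar1".
embedded = "myvar1\nx\nsecondVar1"
# Hypothetical input where "var1" starts the line.
standalone = "var1\nx\nsecondVar1"

loose_hit = loose.search(embedded)       # matches inside "myvar1"
strict_miss = strict.search(embedded)    # no boundary between "y" and "v"
strict_hit = strict.search(standalone)   # boundary at the start of the text

print(loose_hit.group(0) if loose_hit else None)
print(strict_miss)
print(strict_hit.group(0) if strict_hit else None)
```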

REGEX EXPLANATION

  • var - match the starting delimiter
  • \d+ - 1+ digits
  • (?:(?!var\d).)*? - a tempered greedy token that matches any char, 0 or more times (as few as possible), as long as that char does not start the sequence var + a digit
  • secondVar - match secondVar literally.
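The same breakdown can be written into the pattern itself with re.VERBOSE, which ignores whitespace and allows # comments inside the regex. This is only an alternative way to write the pattern, not something the demo uses; the names verbose and compact are hypothetical.

```python
import re

verbose = re.compile(
    r"""
    var            # starting delimiter
    \d+            # one or more digits
    (?:            # tempered greedy token:
        (?!var\d)  #   go on only if 'var' + digit does not start here
        .          #   then consume one char (newlines too, via DOTALL)
    )*?            # repeat as few times as possible
    secondVar      # ending delimiter
    """,
    re.DOTALL | re.VERBOSE,
)

compact = re.compile(r"var\d+(?:(?!var\d).)*?secondVar", re.DOTALL)

text = "var12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar12.1"

verbose_result = verbose.findall(text)
compact_result = compact.findall(text)
print(verbose_result == compact_result)
```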

IDEONE DEMO

import re
p = re.compile(r'var\d+(?:(?!var\d).)*?secondVar', re.DOTALL)
test_str = "var12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar12.1\nvar12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar12.1"
print(p.findall(test_str))

Result for the input string (I doubled it for demo purposes):

['var12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar', 'var12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar']
Wiktor Stribiżew

You're looking for the re.DOTALL flag, with a regex like this: var(.*?)secondVar. This regex would capture everything between var and secondVar.
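A minimal sketch of that suggestion, run on the sample text from the question with and without the flag (the variable names are made up for the example):

```python
import re

text = "var12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar12.1"

# Without re.DOTALL, "." stops at the first newline, so nothing matches.
without_flag = re.findall(r"var(.*?)secondVar", text)

# With re.DOTALL, "." also matches newlines and the group spans the lines.
with_flag = re.findall(r"var(.*?)secondVar", text, re.DOTALL)

print(without_flag)
print(with_flag)
```

Since the pattern has one capturing group, re.findall returns only the captured text, which here still includes the "12.1" that follows var.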

brenns10