
Suppose I have the following Python string:

str = """
....
Dummyline

Start of matching
+----------+----------------------------+
+   test   +           1234             +
+   test2  +           5678             +
+----------+----------------------------+

Finish above. Do not match this
+----------+----------------------------+
+  dummy1  +       00000000000          +
+  dummy2  +       12345678910          +
+----------+----------------------------+
"""

I want to match everything in the first table. I could use a regex that starts matching at "Start" and keeps matching until it finds a double newline (\n\n).

I found some tips on how to do this in another Stack Overflow post (How to match "anything up until this sequence of characters" in a regular expression?), but they don't seem to work for the double-newline case.

I thought of the following code:

import re

pattern = re.compile(r"Start[^\n\n]")
matches = pattern.finditer(str)

where [^x] basically means "match everything until character x is found". But this works only for single characters, not for strings ("\n\n" in this case).
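For reference, a small sketch of what this attempt actually matches. Inside brackets, \n\n is just the single character \n listed twice, so [^\n\n] consumes exactly one non-newline character. The string sample is a shortened, hypothetical stand-in for the real input.

```python
import re

# Shortened stand-in for the string in the question (illustrative only).
sample = "Dummyline\n\nStart of matching\n+---+\n\nFinish above\n"

# [^\n\n] is the same character class as [^\n]: one character that is not a newline.
pattern = re.compile(r"Start[^\n\n]")
found = [m.group(0) for m in pattern.finditer(sample)]
print(found)
```

So the match stops one character after "Start" instead of running to the blank line.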

Does anybody have an idea how to do this?

Wiktor Stribiżew
Andrew

1 Answer


You can match the line that begins with Start up to its end, and then match every following line break that is not immediately followed by another line break, together with the rest of that line, using a negative lookahead (?!

^Start .*(?:\r?\n(?!\r?\n).*)*

Explanation

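  • ^Start .* Match Start followed by a space at the start of a line ^ (this needs the multiline flag, since Start is not at the very beginning of the string), then 0+ times any char except a newline
  • (?: Non-capturing group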
    • \r?\n Match a newline
    • (?!\r?\n) Negative lookahead, assert what is directly to the right is not a newline
    • .* Match 0+ times any character except a newline
  • )* Close the non capturing group and repeat 0+ times to get all the lines
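As a rough sketch of how the pattern could be applied in Python: because Start is not at the very beginning of the string here, ^ needs re.MULTILINE to match at the start of a line. The names text and first_table are only illustrative, and the sample string is shortened from the question.

```python
import re

# Shortened version of the string from the question (illustrative only).
text = """
....
Dummyline

Start of matching
+----------+----------------------------+
+   test   +           1234             +
+   test2  +           5678             +
+----------+----------------------------+

Finish above. Do not match this
+----------+----------------------------+
+  dummy1  +       00000000000          +
+----------+----------------------------+
"""

# re.MULTILINE lets ^ match at the start of any line, not only at the start of the string.
pattern = re.compile(r"^Start .*(?:\r?\n(?!\r?\n).*)*", re.MULTILINE)

# search() returns the first match; finditer() would work the same way for several blocks.
match = pattern.search(text)
first_table = match.group(0) if match else None
print(first_table)
```

The match runs from "Start of matching" through the closing border of the first table and stops before the blank line, so nothing from the second table is included.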

Regex demo

The fourth bird