How do I do line continuation with a long regex?

Question

I have a long regex that I want to continue onto the next line, but everything I've tried either raises an EOL error or breaks the regex. I have already continued the line once inside the parentheses, and I have read "How can I do a line break (line continuation)?", among other things.

Working, but still too long:

REGEX = re.compile(
            r'\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)')

Wrong:

REGEX = re.compile(
            r'\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)[A-Z0-9]+
            )\s+([a-zA-Z\d-]+)')

SyntaxError: EOL while scanning string literal


REGEX = re.compile(
            r'\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\
                )[A-Z0-9]+)\s+([a-zA-Z\d-]+)')
    
sre_constants.error: unbalanced parenthesis


REGEX = re.compile(
            r'\d\s+\d+\s+([A-Z0-9-]+)\s+( \
            [0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)')

regex no longer works


REGEX = (re.compile(
            r'\d\s+\d+\s+([A-Z0-9-]+)\s+(
            [0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)'))

SyntaxError: EOL while scanning string literal

I have been able to shorten my regex so that this is no longer an issue, but I would still like to know how to do line continuation with a long regex.

https://docs.python.org/2/library/re.html?highlight=verbose#re.VERBOSE — vks, Oct 19 '15 at 09:58
The problem with your search query was that you're thinking of this as continuing a line, and the answer you found is about "continuing a _logical_ line," which isn't what you need. The terminology to get the answer you needed was "continuing a multiline _string_." — TigerhawkT3, Oct 19 '15 at 10:02
I guess I perceived the regex to be distinct from a regular Python string. Though the `re.VERBOSE` answer is specific to a regex beyond making the regex a multiline string. — bordeltabernacle, Oct 19 '15 at 10:07
The only difference between a regex and the strings you usually work with is the `r` to denote a raw string. It's only there for convenience. You can use `r` with strings not intended as regexes (e.g. `r'C:\Users'`), and you can make regex strings without `r` (e.g. `'[0-9]{3}-[0-9]{3}-[0-9]{4}'`). — TigerhawkT3, Oct 19 '15 at 10:20

Martin Evans · Accepted Answer · 2018-10-28T14:45:41.653

23

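With the re.VERBOSE flag, whitespace in the pattern is ignored (except inside a character class or when escaped with a backslash), so you can split your regular expression across as many lines as you like to make it more readable: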

pattern = r"""
    \d\s+
    \d+\s+
    ([A-Z0-9-]+)\s+
    ([0-9]+.\d\(\d\)[A-Z0-9]+)\s+
    ([a-zA-Z\d-]+)"""

REGEX = re.compile(pattern, re.VERBOSE)

This approach is explained in the excellent "Dive Into Python" book.
See "Verbose Regular Expressions".
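Verbose mode also allows `#` comments inside the pattern, which is often the main reason to use it. The sketch below annotates the same pattern piece by piece. The sample input line and the example value in the comment are invented for illustration and are not taken from the question.

```python
import re

# Same pattern as above, with one comment per piece.
REGEX = re.compile(r"""
    \d\s+                           # a single digit, then whitespace
    \d+\s+                          # one or more digits, then whitespace
    ([A-Z0-9-]+)\s+                 # group 1: uppercase letters, digits, hyphens
    ([0-9]+.\d\(\d\)[A-Z0-9]+)\s+   # group 2: something shaped like 15.2(4)M6
    ([a-zA-Z\d-]+)                  # group 3: letters, digits, hyphens
    """, re.VERBOSE)

# Hypothetical input line, made up for this sketch.
sample = "1 24 WS-C3750 15.2(4)M6 FOC1234-X"
match = REGEX.search(sample)
print(match.groups())  # ('WS-C3750', '15.2(4)M6', 'FOC1234-X')

# Caveat: a literal space must be escaped (or written as \s or [ ]),
# because bare whitespace is dropped from a verbose pattern.
ignores_space = re.compile(r"a b", re.VERBOSE)
keeps_space = re.compile(r"a\ b", re.VERBOSE)
print(bool(ignores_space.fullmatch("ab")))   # True
print(bool(keeps_space.fullmatch("a b")))    # True
```

If the pattern needs to match a literal `#`, it has to be escaped or placed in a character class for the same reason.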

edited Oct 28 '18 at 14:45

answered Oct 19 '15 at 10:00

Martin Evans


1

I like that this keeps it within the regular expression rather than using concatenation. – bordeltabernacle Oct 19 '15 at 10:04

Anand S Kumar · Answer 2 · 2015-10-19T09:59:51.120

7

You can split the pattern into several string literals on separate lines. Python concatenates adjacent string literals at compile time (they must sit inside the parentheses so that the expression can span lines), so re.compile receives a single string. Example:

REGEX = re.compile(r"\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)"
                   r"[A-Z0-9]+)\s+([a-zA-Z\d-]+)")
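As a quick check that the split form really is the same pattern, the following sketch compares it with the one-line string from the question. The variable names `single` and `split` are chosen here for illustration.

```python
import re

# The original one-line pattern from the question.
single = r'\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)'

# Adjacent literals are joined into one string. Each piece needs its own
# r prefix, otherwise that piece is treated as an ordinary string.
split = (r"\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)"
         r"[A-Z0-9]+)\s+([a-zA-Z\d-]+)")

print(split == single)  # True

REGEX = re.compile(split)
print(REGEX.groups)     # 3 capture groups
```

Because the join happens between literals, no newline or indentation ends up inside the pattern, and the break can fall anywhere, even in the middle of a group.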

edited Oct 19 '15 at 09:59

answered Oct 19 '15 at 09:51

Anand S Kumar


score 4 · Answer 3 · answered Oct 19 '15 at 09:51

4

Try splitting the pattern into adjacent raw string literals:

regex = re.compile(
    r'\d\s+\d+\s+([A-Z0-9-]+)\s+('
    r'[0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)'
)
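For contrast, here is a sketch of why the backslash attempt in the question stops matching. Inside a raw string literal, a backslash followed by a newline is not a line continuation, so the backslash, the newline and the indentation all stay in the pattern. The short `a+b+` pattern is a stand-in chosen for this demo, not the question's regex.

```python
import re

# Backslash-newline inside a raw literal: both characters are kept,
# along with the four spaces of indentation on the next line.
broken = r'a+\
    b+'

print(repr(broken))  # 'a+\\\n    b+'

# The pattern now demands a literal newline and four spaces between the parts.
print(re.search(broken, "aab"))               # None
print(bool(re.search(broken, "aa\n    bb")))  # True

# Adjacent raw literals avoid the problem: nothing extra enters the pattern.
fixed = (r'a+'
         r'b+')
print(bool(re.search(fixed, "aab")))          # True
```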

answered Oct 19 '15 at 09:51

ojii

