I have a long regex that I want to continue onto the next line, but everything I've tried either raises an EOL error or breaks the regex. I have already continued the line once inside the parentheses, and have read "How can I do a line break (line continuation)?" among other things.

Working, but still too long:

REGEX = re.compile(
            r'\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)')

Wrong:

REGEX = re.compile(
            r'\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)[A-Z0-9]+
            )\s+([a-zA-Z\d-]+)')

SyntaxError: EOL while scanning string literal


REGEX = re.compile(
            r'\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\
                )[A-Z0-9]+)\s+([a-zA-Z\d-]+)')
    
sre_constants.error: unbalanced parenthesis


REGEX = re.compile(
            r'\d\s+\d+\s+([A-Z0-9-]+)\s+( \
            [0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)')

regex no longer works


REGEX = (re.compile(
            r'\d\s+\d+\s+([A-Z0-9-]+)\s+(
            [0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)'))

SyntaxError: EOL while scanning string literal

I have since shortened my regex so that this is no longer an issue, but I'd still like to know: how can I do line continuation with a long regex?

martineau
bordeltabernacle
  • https://docs.python.org/2/library/re.html?highlight=verbose#re.VERBOSE – vks Oct 19 '15 at 09:58
  • The problem with your search query was that you're thinking of this as continuing a line, and the answer you found is about "continuing a _logical_ line," which isn't what you need. The terminology to get the answer you needed was "continuing a multiline _string_." – TigerhawkT3 Oct 19 '15 at 10:02
  • I guess I perceived the regex to be distinct from a regular Python string. Though the `re.VERBOSE` answer is specific to a regex beyond making the regex a multiline string. – bordeltabernacle Oct 19 '15 at 10:07
  • The only difference between a regex and the strings you usually work with is the `r` to denote a raw string. It's only there for convenience. You can use `r` with strings not intended as regexes (e.g. `r'C:\Users'`), and you can make regex strings without `r` (e.g. `'[0-9]{3}-[0-9]{3}-[0-9]{4}'`). – TigerhawkT3 Oct 19 '15 at 10:20

3 Answers

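If you use the re.VERBOSE flag, whitespace in the pattern is ignored (except inside a character class or when escaped with a backslash), so you can split your regular expression across as many lines as you like to make it more readable: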

import re

pattern = r"""
    \d\s+
    \d+\s+
    ([A-Z0-9-]+)\s+
    ([0-9]+.\d\(\d\)[A-Z0-9]+)\s+
    ([a-zA-Z\d-]+)"""

REGEX = re.compile(pattern, re.VERBOSE)

This approach is explained in the excellent "Dive Into Python" book.
See "Verbose Regular Expressions".
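
As a rough check that the verbose form behaves like the original one-liner, here is a minimal sketch. The sample line and the inline comments describing each group are made up for illustration; they are not taken from the question. The last two lines illustrate the main caveat of re.VERBOSE: a literal space in the pattern is dropped unless it is escaped or placed in a character class.

```python
import re

# The working one-line pattern from the question.
ONE_LINE = re.compile(
    r'\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)')

# The same pattern in verbose mode; whitespace and "# ..." comments are ignored.
VERBOSE = re.compile(r"""
    \d\s+                          # single leading digit
    \d+\s+                         # numeric field
    ([A-Z0-9-]+)\s+                # group 1
    ([0-9]+.\d\(\d\)[A-Z0-9]+)\s+  # group 2
    ([a-zA-Z\d-]+)                 # group 3
    """, re.VERBOSE)

# Hypothetical input line, invented for this example.
SAMPLE = "1 23 ABC-1 12.4(2)T1 some-name"

one_line_groups = ONE_LINE.search(SAMPLE).groups()
verbose_groups = VERBOSE.search(SAMPLE).groups()

# Caveat: an unescaped space is ignored in verbose mode, an escaped one is not.
ignores_space = re.compile(r"a b", re.VERBOSE).fullmatch("ab") is not None
escaped_space = re.compile(r"a\ b", re.VERBOSE).fullmatch("a b") is not None
```

If the original pattern relies on literal spaces, they would need to be written as `\ ` or `[ ]` before switching to re.VERBOSE.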

Martin Evans

You can split the pattern into several string literals on separate lines. Python concatenates adjacent string literals into a single string (as long as they all sit between ( and )) before the result is passed to re.compile. Example:

REGEX = re.compile(r"\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)"
                   r"[A-Z0-9]+)\s+([a-zA-Z\d-]+)")
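
A small sketch to confirm that the split literals really produce the same pattern string as the one-liner. The names SINGLE and SPLIT are just illustrative. One thing to watch, shown at the end: the r prefix applies per literal, so every piece needs its own.

```python
import re

# The one-line pattern from the question.
SINGLE = r'\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)'

# Adjacent literals inside parentheses are joined into one string.
SPLIT = (
    r'\d\s+\d+\s+'
    r'([A-Z0-9-]+)\s+'
    r'([0-9]+.\d\(\d\)[A-Z0-9]+)\s+'
    r'([a-zA-Z\d-]+)'
)

same_pattern = (SPLIT == SINGLE)
REGEX = re.compile(SPLIT)

# The r prefix is per literal: r'\b' is backslash + 'b' (a regex word boundary),
# while '\b' without the prefix is a single backspace character.
raw_len = len(r'\b' r'x')
mixed_len = len('\b' r'x')
```

Because the pieces are joined exactly as written, no newline or indentation ends up in the pattern, which is what distinguishes this from continuing the line inside the string.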
Anand S Kumar

Try splitting the pattern into adjacent raw string literals inside the parentheses:

regex = re.compile(
    r'\d\s+\d+\s+([A-Z0-9-]+)\s+('
    r'[0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)'
)
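
For comparison, here is a sketch of what likely went wrong with the backslash attempt in the question. In a raw string, a backslash followed by a newline is not a line continuation: both characters stay in the string, along with the indentation on the next line, so the regex then demands those characters in the input. The sample line is invented for illustration.

```python
import re

# Hypothetical input line, invented for this example.
SAMPLE = "1 23 ABC-1 12.4(2)T1 some-name"

# Backslash-newline inside a raw string: the backslash, the newline and the
# following indentation all become part of the pattern.
BROKEN = r'\d\s+\d+\s+([A-Z0-9-]+)\s+( \
    [0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)'

# Adjacent raw literals: nothing extra ends up in the pattern.
FIXED = (
    r'\d\s+\d+\s+([A-Z0-9-]+)\s+('
    r'[0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)'
)

has_stray_newline = '\\\n' in BROKEN      # backslash + newline kept in the string
broken_match = re.search(BROKEN, SAMPLE)  # requires a literal newline in the input
fixed_match = re.search(FIXED, SAMPLE)
```

The broken pattern still compiles, because an escaped newline is a valid regex token; it simply no longer matches single-line input.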
ojii