How would I go about taking the following regular expression:

re.compile(r'(\'?(("(\"|[^"])*")|^#.*\n|;.*\n|\(|\)|-?[0-9]+(\.[0-9]+)?|[a-zA-Z-#][a-zA-Z0-9.+-/\!]*))|\'.*')

and writing it across multiple lines?

For whatever reason my attempts at using re.VERBOSE seem to change the value of the expression.

8-Bit Borges
Semi

2 Answers

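If you put two string literals next to each other, with only whitespace between them, Python concatenates them into a single string at compile time. Inside the parentheses of the call the literals can sit on separate lines, and each piece needs its own r prefix: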

re.compile(r'(\'?(("(\"|[^"])*")|^#.*\n|;.*\n|\(|\)|-?[0-9]+(\.[0-9]+)?|'
           r'[a-zA-Z-#][a-zA-Z0-9.+-/\!]*))|\'.*')
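
If you want to convince yourself that the split did not change the pattern, you can compare it with the one-line original. This is only a minimal sketch; the names ONE_LINE and SPLIT are made up for illustration:

```python
import re

# The pattern from the question, written on one line.
ONE_LINE = r'(\'?(("(\"|[^"])*")|^#.*\n|;.*\n|\(|\)|-?[0-9]+(\.[0-9]+)?|[a-zA-Z-#][a-zA-Z0-9.+-/\!]*))|\'.*'

# The same pattern as two adjacent raw string literals. The enclosing
# parentheses let the expression span two lines, and each literal
# carries its own r prefix.
SPLIT = (r'(\'?(("(\"|[^"])*")|^#.*\n|;.*\n|\(|\)|-?[0-9]+(\.[0-9]+)?|'
         r'[a-zA-Z-#][a-zA-Z0-9.+-/\!]*))|\'.*')

# Adjacent literals are joined before the code runs, so the two
# strings should be identical character for character.
same = ONE_LINE == SPLIT
print(same)

# Both compile to a pattern object with the same source text.
compiled = re.compile(SPLIT)
```

A typo in either half shows up immediately as a failed comparison, which is handy when the split was done by hand.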
Patrick Haugh
  • Ah! My hero! My Python is shaky, but I had the same thought earlier. For whatever reason it didn't work (typo perhaps). Since you suggested it I figured I would try it again, and well, it didn't work. But at that point I was pretty confident it should work, so I tried the same thing a third time and it worked. I suppose this just goes to show that tired coding is bad coding. – Semi Apr 21 '18 at 02:19

When converting your regex to re.VERBOSE mode make sure that:

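  • whitespace is escaped or replaced with an equivalent: [ ] or '\ ' (the quotes are only there because of SO formatting), or \s if matching any whitespace character is acceptable. With re.VERBOSE, all plain whitespace outside a character class is ignored.
  • hashes # are escaped or replaced the same way: \# or [#]. With re.VERBOSE, an unescaped # outside a character class starts a comment that runs to the end of the line.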
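
Both rules are easy to see on a toy pattern. The following is a small sketch with made-up variable names, not part of the original regex:

```python
import re

# Without re.VERBOSE the space and the # are matched literally.
plain = re.compile(r'a b#c')

# With re.VERBOSE the bare space is dropped and "#c" is read as a
# comment, so what remains of this pattern is just "ab".
careless = re.compile(r'a b#c', re.VERBOSE)

# Escaping the space and the hash restores the original meaning.
careful = re.compile(r'a\ b\#c', re.VERBOSE)

# Inside a character class both characters are kept as they are.
in_class = re.compile(r'a[ ]b[#]c', re.VERBOSE)

plain_hit = plain.fullmatch('a b#c') is not None
careless_hit = careless.fullmatch('a b#c') is not None
careless_ab = careless.fullmatch('ab') is not None
careful_hit = careful.fullmatch('a b#c') is not None
in_class_hit = in_class.fullmatch('a b#c') is not None
```

This is the usual reason a pattern "changes its value" under re.VERBOSE: nothing raises an error, the pattern just silently becomes a different one.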

Then you can rewrite your regex as a raw triple-quoted (multiline) string:

import re
re.compile(r'''
    (\'?(("(\"|[^"])*")
         |^\#.*\n
         |;.*\n
         |\(
         |\)
         |-?[0-9]+(\.[0-9]+)?
         |[a-zA-Z-#][a-zA-Z0-9.+-/\!]*
        )
    )
    |\'.*
    ''', re.VERBOSE)
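
To check that the verbose form still behaves like the one-line original, you can run both over the same input and compare what they match. This is a rough sketch: the SAMPLE text and the tokens helper are assumptions made up for the comparison, and a single sample is a sanity check rather than a proof of equivalence.

```python
import re

# The pattern from the question, on one line.
ONE_LINE = re.compile(
    r'(\'?(("(\"|[^"])*")|^#.*\n|;.*\n|\(|\)|-?[0-9]+(\.[0-9]+)?|[a-zA-Z-#][a-zA-Z0-9.+-/\!]*))|\'.*'
)

# The re.VERBOSE rewrite. The only textual change besides layout is
# the escaped hash in ^\#, since that one sits outside a character class.
VERBOSE = re.compile(r'''
    (\'?(("(\"|[^"])*")
         |^\#.*\n
         |;.*\n
         |\(
         |\)
         |-?[0-9]+(\.[0-9]+)?
         |[a-zA-Z-#][a-zA-Z0-9.+-/\!]*
        )
    )
    |\'.*
    ''', re.VERBOSE)

# Hypothetical input, loosely Lisp-like, chosen only to touch most branches.
SAMPLE = '(define x 42) ; comment\n(display "hi") \'sym -3.5\n'

def tokens(pattern, text):
    """Return the text of every match of pattern in text, in order."""
    return [m.group(0) for m in pattern.finditer(text)]

one_line_tokens = tokens(ONE_LINE, SAMPLE)
verbose_tokens = tokens(VERBOSE, SAMPLE)
same_tokens = one_line_tokens == verbose_tokens
print(same_tokens)
```

If the two lists ever differ, the first differing element usually points straight at the branch where a space or a # was left unescaped.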
wolfrevokcats