Regexes containing meaningful spaces break when re.VERBOSE is added, because re.VERBOSE strips the (meaningful) space inside 'Issue Summary' along with all the non-meaningful whitespace (e.g. padding and newlines inside a multiline pattern). (My use of re.VERBOSE with a multiline pattern is non-negotiable: this is actually a massive simplification of a huge multiline regex where re.VERBOSE is necessary just to stay sane.)

import re
re.match(r'''Issue Summary.*''', 'Issue Summary: fails', re.U|re.VERBOSE)
# No match!
re.match(r'''Issue Summary.*''', 'Issue Summary: passes', re.U)
<_sre.SRE_Match object at 0x10ba36030>
re.match(r'Issue Summary.*', 'Issue Summary: passes', re.U)
<_sre.SRE_Match object at 0x10b98ff38>

Is there a saner way to write re.VERBOSE-friendly patterns containing meaningful spaces, short of replacing each space in my pattern with '\s' or '.', which is not just ugly but counter-intuitive and a pain to automate?

re.match(r'Issue\sSummary.*', 'Issue Summary: fails', re.VERBOSE)
<_sre.SRE_Match object at 0x10ba36030>
re.match(r'Issue.Summary.*', 'Issue Summary: fails', re.VERBOSE)
<_sre.SRE_Match object at 0x10b98ff38>

(As an aside, this is a useful docbug catch on Python 2 and 3. I'll file it once I get consensus here on what the right solution is.)

smci
  • Why are you using all the triple quotes? They aren't helping, and they're obscuring your strings. `r'''abc'''` is just `r'' + 'abc' + ''`, or `'abc'`. The `r` isn't even taking effect since it ends after the initial empty string. – Tom Karzes Nov 17 '17 at 00:33
  • @TomKarzes: as I stated clearly in the question **"This is actually a massive simplification of a huge multiline regex"**. The real regex is actually 14 lines long and growing. It has multiple nested sub-expressions. So like I said, multiline pattern and re.VERBOSE are non-negotiable. – smci Nov 17 '17 at 00:36
  • Ok - it's just that by making them single-line, the meaning of the quotes has changed. – Tom Karzes Nov 17 '17 at 00:37
  • @TomKarzes: ahh, you're right. Raw multiline string: `r'''this is wrong'''` . The right syntax must use r with double-quotes: `r"""this is right"""`. See [How to correctly write a raw multiline string in Python?](https://stackoverflow.com/questions/46003452/how-to-correctly-write-a-raw-multiline-string-in-python). My misconception is due to other people having been spreading the same mistake for years. Related: [Python regex compile (with re.VERBOSE) not working](https://stackoverflow.com/questions/13761723/python-regex-compile-with-re-verbose-not-working) – smci Nov 17 '17 at 01:00
  • @smci No. The single and double quotes are completely interchangeable. And so are the single and double triple quotes, prefixed or not. See the [language reference](https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals) or [this question](https://stackoverflow.com/questions/7783097/when-to-use-triple-single-quotes-instead-of-triple-double-quotes). – Jeyekomon Jul 09 '21 at 13:26

1 Answer

If re.VERBOSE is used, then I think there's no choice other than to change the regular expression string. However, I would suggest one of the following:

r'abc\ def'

or:

r'abc[ ]def'

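Both r'\ ' and r'[ ]' match a single space character (not any whitespace, only an actual space). Both survive re.VERBOSE, which keeps whitespace that is escaped with a backslash or sits inside a character class. Note that, without the r in front, the backslash character would need to be doubled, i.e. \\.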
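As a rough sketch of how either form might look in a multiline pattern (the pattern, the PATTERN name, and the "text" group name are made up for illustration and are not the asker's real regex):

```python
import re

# Hypothetical multiline pattern, for illustration only.
PATTERN = re.compile(r"""
    Issue[ ]Summary   # literal space kept by putting it in a character class
    :\ *              # colon, then zero or more backslash-escaped spaces
    (?P<text>.*)      # rest of the line
""", re.VERBOSE)

m = PATTERN.match("Issue Summary: fails")
print(m.group("text"))  # prints: fails

# For comparison, the bare space is dropped under re.VERBOSE, so the
# pattern behaves like 'IssueSummary.*' and does not match.
unescaped = re.match(r"Issue Summary.*", "Issue Summary: fails", re.VERBOSE)
print(unescaped)  # prints: None
```

Which of the two forms to use is a matter of taste; [ ] tends to stand out more in a long pattern, while \ is shorter.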

Tom Karzes