
I am trying to match a multiline/block comment that looks like this:

<# This is a multiline comment
This is a multiline comment
This is a multiline comment
This is a multiline comment
This is a multiline comment
#>

The "<#" and "#>" markers delimit the beginning and end of a block comment. I am using PySide in my application, with QRegExp like so:

multiline_comment =    r'<#(.*)?#>'
comment_ml_syntax  = QtCore.QRegExp(Syntax.comment_ml_match)
comment_ml_format  = QtGui.QTextCharFormat()
comment_ml_format.setForeground(Colors.COMMENT_COLOR)

QRegExp doesn't seem to match the multi-line comment. Is there some kind of option or flag that I am missing?

user2444217

2 Answers


Using Python's re module, just pass in the re.DOTALL flag and capture everything between the start and end tags. re.DOTALL ensures that . matches newlines as well. Use the non-greedy quantifier *? so that a match does not span more than one comment:

import re
re.search(r'<#(.*?)#>', comment, re.DOTALL).group(1)
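As a minimal sketch of the same idea applied to several comments at once, assuming the text is already in a Python string (the `source` variable and its contents are made up for illustration), `findall` returns each comment body separately:

```python
import re

# Hypothetical input: two block comments separated by ordinary text.
source = """<# first
comment #>
code = 1
<# second comment #>"""

# Non-greedy .*? stops at the nearest "#>"; re.DOTALL lets . cross newlines.
pattern = re.compile(r'<#(.*?)#>', re.DOTALL)

bodies = pattern.findall(source)
print(bodies)
```

Because the quantifier is non-greedy, the `code = 1` line between the two comments is not swallowed into a single match.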
iruvar

First off, if '(.*)?' is meant to be the non-greedy match-anything quantifier, it isn't; that would be '(.*?)'. It is the greedy match-anything (match as many characters as possible) followed by the optional quantifier, which has no effect here. It matches from the first "<#" to the last "#>" it can reach, and in Python's re, where . does not match a newline unless a flag such as re.DOTALL is set, it cannot get past the first line break. Here's my solution:

r"<#((?!#>)(.|\s))*#>"

It catches the opener, then matches any text inside the pair so long as the closer does not appear to be the next set of characters, and finally matches the closer.
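To illustrate the difference between '(.*)?' and '(.*?)' described in this answer, here is a small sketch using Python's re module. The sample strings are invented for the demonstration, and the behaviour shown is that of Python's re, not necessarily of QRegExp:

```python
import re

# Hypothetical single-line text containing two comments.
text = "<# a #> x <# b #>"

# Greedy .* wrapped in an optional group: runs to the LAST "#>".
greedy = re.search(r'<#(.*)?#>', text).group(1)

# Non-greedy .*?: stops at the FIRST "#>".
lazy = re.search(r'<#(.*?)#>', text).group(1)

# Without re.DOTALL, . cannot cross the newline, so nothing matches.
multiline = re.search(r'<#(.*)?#>', "<# a\nb #>")

print(repr(greedy), repr(lazy), multiline)
```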

DoubleMx2
The correct way to match everything-including-newlines is by passing the `re.DOTALL` flag to the constructor or by prefixing the regex with `(?s)`. Never, **ever** use this: `(.|\s)`; see [this answer](http://stackoverflow.com/a/2408599/20938) for the reasons why. Your regex works okay on the OP's sample string, which matches. But when I force a non-match by removing the trailing `>` it locks up. And that's almost guaranteed to happen whenever you use `(.|\s)` in a regex. – Alan Moore Aug 30 '13 at 04:08
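The two options mentioned in this comment can be compared with a short sketch in Python's re module. The `sample` and `broken` strings are hypothetical; `broken` has its trailing `>` removed to force a non-match, as in the comment:

```python
import re

sample = "<# line one\nline two #>"
broken = "<# line one\nline two #"  # closer deliberately truncated

# Option 1: pass the flag explicitly.
with_flag = re.search(r'<#(.*?)#>', sample, re.DOTALL)

# Option 2: put the inline flag (?s) at the start of the pattern.
inline = re.search(r'(?s)<#(.*?)#>', sample)

# Both forms capture the same comment body.
same = with_flag.group(1) == inline.group(1)

# With a plain non-greedy dot there is no alternation to backtrack
# through, so the non-matching input simply yields None.
no_match = re.search(r'(?s)<#(.*?)#>', broken)

print(same, no_match)
```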