Windows 7 SP1
MSVS 2010
Qt 4.8.4

I'm experimenting with the Qt Syntax Highlighter Example.

I have an application that needs to highlight words that start with a hyphen, so I modified the regular expression in this code fragment:

classFormat.setFontWeight(QFont::Bold);
classFormat.setForeground(Qt::darkMagenta);
rule.pattern = QRegExp("\\bQ[A-Za-z]+\\b");
rule.format = classFormat;
highlightingRules.append(rule);

which highlights words that start with Q. I change it to:

rule.pattern = QRegExp("\\b-[A-Za-z]+\\b");

and nothing happens.

I try

rule.pattern = QRegExp("\\b\\-[A-Za-z]+\\b");

Nothing.

Out of curiosity, I try

rule.pattern = QRegExp("\\b[-A-Za-z]+\\b");

When I type a word that starts with a hyphen, the hyphen stays unhighlighted and all the letters after it are highlighted. According to "How to match hyphens with Regular Expression?" this should be kosher.

Question: How do I write the regular expression to highlight words starting with a hyphen?

Macbeth's Enigma

1 Answer


The problem is that a hyphen (-) is not considered part of a word. This means that the word boundary assertion \b matches between the hyphen and the actual word, not in front of the hyphen (unless a word character comes directly before it). In other words, there is no such thing as a word starting with a hyphen.

To solve this issue, place the hyphen before the \b (that is, -\b[A-Za-z]+\b), meaning you want to match "a hyphen, followed by a word consisting of letters". You can even remove that first \b, because the position between a hyphen and a letter is always a word boundary anyway:

rule.pattern = QRegExp("-[A-Za-z]+\\b");
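To see the boundary behaviour outside the highlighter, here is a minimal sketch that uses std::regex instead of QRegExp. It assumes both engines treat \b the same way for ASCII letters and hyphens, which fits the behaviour described in the question; firstMatch is a hypothetical helper, not part of Qt or the example.

```cpp
#include <regex>
#include <string>

// Returns the first substring of `text` matched by `pattern`,
// or an empty string when nothing matches.
// firstMatch is a hypothetical helper for illustration only.
// Assumption: std::regex (ECMAScript grammar) stands in for QRegExp here.
std::string firstMatch(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);  // ECMAScript grammar is the default
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

Under that assumption, firstMatch(R"(\b-[A-Za-z]+\b)", "ls -la") returns an empty string, because a space and a hyphen are both non-word characters and no boundary lies between them, while firstMatch(R"(-[A-Za-z]+\b)", "ls -la") returns "-la". The third attempt from the question, firstMatch(R"(\b[-A-Za-z]+\b)", "-la"), returns "la": the first boundary sits after the hyphen, which is why only the letters get highlighted.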
leemes
  • Magic, thank you. Interesting. I thought word boundaries were defined by whitespace and any other character would be considered part of the word. When I used an underscore, it worked just fine. What is special about the hyphen? – Macbeth's Enigma Jan 16 '13 at 01:23
  • Almost all special characters are treated as non-word characters, so a word boundary falls right next to them. Double-click on a word which is followed by a period, or quotation mark, or plus sign, ... It's the same. These characters aren't considered part of the word. However, we see underscores as part of names (e.g. variables in source code), so it makes more sense to consider them as word characters. It's an arbitrary definition... – leemes Jan 16 '13 at 01:26
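
The underscore versus hyphen difference from the comments can be checked with a small sketch. It again uses std::regex as a stand-in for QRegExp, on the assumption that both count letters, digits and the underscore as word characters (this agrees with what the asker saw with the underscore); boundaryBefore is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// True when `\b` holds immediately before the first character of `token`
// once the token is placed after a space, as in "x <token>".
// boundaryBefore is a hypothetical helper for illustration only.
// Assumption: std::regex (ECMAScript grammar) stands in for QRegExp here.
bool boundaryBefore(const std::string& token) {
    const std::string text = "x " + token;
    // Anchor with ^ so that \b is tested at one position only: right after "x ".
    const std::regex re(R"(^x \b)");
    return std::regex_search(text, re);
}
```

With these assumptions, boundaryBefore("_tmp") and boundaryBefore("tmp") are true, while boundaryBefore("-tmp") is false: the space and the hyphen are both non-word characters, so there is no boundary for \b to match.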