I'm trying to write a QRegExp that extracts variable names from qmake project files (*.pro).
Variable usage has two syntactic forms:

  • $$VAR
  • $${VAR}

So my regular expression must handle both cases. I tried writing it like this:

\$\$\{?(\w+)\}?

But it does not work as expected: for the string $$VAR I get the match $$V when greedy matching is disabled (QRegExp::setMinimal(true)). As I understand it, greedy mode can lead to wrong results in my case.
So, what am I doing wrong?
Or should I just use greedy mode and not worry about this behavior? :)

P.S. A variable name can't contain spaces or other "special" symbols, only letters.

eraxillan

1 Answer

You do not need to disable greedy matching. If greedy matching is disabled, the minimal match that satisfies your expression is returned. In your example, there's no need to match the AR, because $$V satisfies your expression.
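The difference between the two modes can be reproduced outside Qt. The sketch below uses Python's `re` module as a stand-in for QRegExp, which is an assumption: Python has no global minimal mode, so the lazy quantifiers `??` and `+?` are used to approximate `setMinimal(true)`. The helper name `first_match` is made up for the illustration.

```python
import re

# The question's pattern with ordinary (greedy) quantifiers.
GREEDY = re.compile(r"\$\$\{?(\w+)\}?")

# The same pattern with every quantifier made lazy, approximating
# QRegExp::setMinimal(true). This is an emulation, not Qt itself.
MINIMAL = re.compile(r"\$\$\{??(\w+?)\}??")


def first_match(pattern, text):
    """Return the full text of the first match, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(first_match(GREEDY, "$$VAR"))    # $$VAR
print(first_match(MINIMAL, "$$VAR"))   # $$V
print(first_match(GREEDY, "$${VAR}"))  # $${VAR}
```

The lazy variant stops as soon as the pattern is satisfied, which is why only the first letter of the name is consumed.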

So turn minimal mode back off (that is, go back to greedy matching), and use

\$\$(\w+|\{\w+\})

This matches two dollar signs, followed by either a bunch of word characters, or by a bunch of word characters between braces. If you can trust your data not to contain any non-matching braces, your expression should work just as well.
\w is roughly equal to [A-Za-z0-9_] (in QRegExp it also accepts non-ASCII letters, digits, and marks), so it matches all digits, all uppercase and lowercase letters, and the underscore. If you want to restrict this to just the letters of the alphabet, use [A-Za-z] instead.
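As a rough illustration of how this pattern could be used to pull out names, here is a sketch in Python's `re` module rather than QRegExp (an assumption made so the example is self-contained). It adds a capture group around each `\w+` so the name comes back without braces, and passes `re.ASCII` so that `\w` is limited to `[A-Za-z0-9_]`. The function name `variable_names` and the sample line are hypothetical.

```python
import re

# The answer's alternation, with a capture group around each \w+ so the
# bare name is captured without the braces. re.ASCII limits \w to
# [A-Za-z0-9_].
VAR_REF = re.compile(r"\$\$(?:(\w+)|\{(\w+)\})", re.ASCII)


def variable_names(text):
    """Return the names referenced as $$NAME or $${NAME}, in order."""
    # findall yields one (bare, braced) tuple per match; the group that
    # did not take part in the match is an empty string.
    return [bare or braced for bare, braced in VAR_REF.findall(text)]


# Hypothetical qmake line, used only for illustration.
line = "SOURCES += $$SRC_DIR/main.cpp $${EXTRA}/util.cpp"
print(variable_names(line))  # ['SRC_DIR', 'EXTRA']
```

Because the braces are part of the second alternative, an unclosed reference such as `$${A` is not matched at all, unlike with the optional-brace pattern from the question.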

Since variable names cannot contain any special characters, there's no danger of matching too much, unless a variable can be followed directly by more word characters, in which case the boundary between name and text is ambiguous.
For instance, if the data contains a string like Buy our new $$Varbuster!, where $$Var is supposed to be the variable, there is no regular expression that will separate the variable from the rest of the string.
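The ambiguity is easy to see in a small experiment. This is a sketch in Python's `re` module, used here as a stand-in for QRegExp (an assumption); `first_variable` is a hypothetical helper. With the unbraced form the match runs to the end of the word, while the braced form marks where the name ends.

```python
import re

# Same shape as the pattern above: $$NAME or $${NAME}, name captured
# without braces. re.ASCII limits \w to [A-Za-z0-9_].
VAR_REF = re.compile(r"\$\$(?:(\w+)|\{(\w+)\})", re.ASCII)


def first_variable(text):
    """Return the first referenced variable name, or None."""
    m = VAR_REF.search(text)
    if m is None:
        return None
    # Exactly one of the two groups takes part in a successful match.
    return m.group(1) or m.group(2)


print(first_variable("Buy our new $$Varbuster!"))    # Varbuster
print(first_variable("Buy our new $${Var}buster!"))  # Var
```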

SQB
  • Thanks, that's working fine. I only need to wrap the `\w+` in parentheses to be able to capture it. About letters-only: no, a variable can contain digits in its name, e.g. `VAR11`, but can't begin with one. – eraxillan Feb 20 '14 at 09:38
  • Okay, if it is an issue, you can replace `\w+` with `[A-Za-z]\w*`. – SQB Feb 20 '14 at 09:40
  • And about `$$Varbuster`: I think the impossibility of extracting `Var` is obvious :) how could one letter be distinguished from another, short of black magic... – eraxillan Feb 20 '14 at 09:41
  • `[A-Za-z]\w*` means "a letter followed by any number of letters/digits"? – eraxillan Feb 20 '14 at 09:42
  • @Axilles I think that's what the braces would be for: `Buy our new $${Var}buster!` – SQB Feb 20 '14 at 09:43
  • @Axilles Yes, that's what it means. We change the `+` to a `*`, since a variable name consisting of just a single letter is allowed. Basically it says "a letter, followed by 0 or more word characters". – SQB Feb 20 '14 at 09:45
  • OK, now everything is clear and working correctly. Thanks again! However, I see that parsing with regex is a terrible idea :) – eraxillan Feb 20 '14 at 09:47
  • Not always. In this case, it should be fine. But don't get me started on [html](http://stackoverflow.com/a/1732454/2936460)... – SQB Feb 20 '14 at 09:51
  • Oh, this is my favorite answer ever :D I already parse some XHTML code with regexes, so I feel the benefits of the Dark Side. – eraxillan Feb 20 '14 at 09:54