
I would like a QRegExp that extracts all integers from a line but skips any digits that are inside a comment section.

For example:

    { 20,100,0X0},/*this line contains 2 integers*/

My code:

    QRegExp("(\\d+)\\}");

does the job, but it is not reliable, since comments can also appear inside the curly braces.

For example, my expression will not work for:

    { 20,100/*new comment 2*/,0X0}

So how do I ignore the text inside a comment section using QRegExp and continue searching with my expression?

Alan Moore

2 Answers


I suggest matching all multiline comments with the first alternative of the regex, and matching and capturing the digit sequences with the second (i.e. wrapping the `[0-9]+` pattern in a capturing group):

    QRegExp("/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|\\b([0-9]+)\\b")

Now, the digits you need will be in cap(1).

See the regex demo

It also looks like you need word boundaries around the `[0-9]+` pattern so that only standalone, "whole-word" digit chunks are matched (for example, the digits in `0X0` are then skipped).

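Pattern details:

- `/\*[^*]*\*+(?:[^/*][^*]*\*+)*/` matches a `/* ... */` comment written in the "unrolled loop" form: `/\*` is the opening `/*`, `[^*]*` matches any characters other than `*`, `\*+` one or more asterisks, `(?:[^/*][^*]*\*+)*` zero or more runs of a character other than `/` and `*` followed by non-asterisks and asterisks, and the final `/` closes the comment
- `|` or
- `\b([0-9]+)\b` a whole-word run of one or more digits, captured into Group 1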

Wiktor Stribiżew

You will need to find the comment sections separately to do this reliably, unless the regex engine supports arbitrary patterns in negative lookbehind (which, according to http://www.regular-expressions.info/, only the .NET and JGsoft engines do).

In the first pass, remove or skip the comment sections in your string; then match the numbers as you like (e.g. the way you do now).

To find comments, you can use this pattern:

    /\*((?!\*/).)*\*/

If you need to deal with nested comment sections, remove the comments and repeat until no more comment sections are found.

On the other hand, if nested comments are not a requirement, you can combine the comment and digit matching regexes into one and then check the matched string (or captures) to find out if it was a comment or a digit match.

Lucero
  • The `\/\*.*?\*\/` is very inefficient due to the lazy dot matching pattern. It was unrolled in [*Mastering Regular Expressions* book, *Unrolling-The-Loop Components for C Comments* section](http://ww2.ii.uj.edu.pl/~tabor/prII09-10/perl/master.pdf), see the pattern in my answer. Also, QRegExp does not support lazy quantifiers, and there is no need to escape the `/` slash as it is not a special regex metacharacter. – Wiktor Stribiżew Aug 04 '16 at 13:52
  • @WiktorStribiżew Well, how efficient it is depends on the engine implementation only. This could be implemented using a DFA which has linear runtime. That being said, I have no idea of the performance characteristics of QRegExp, but I have just found out that it does not seem to support individual lazy quantifiers at all, so I'll update my answer to reflect this. – Lucero Aug 04 '16 at 13:56
  • Yes, there are several ways to unroll that pattern. – Wiktor Stribiżew Aug 04 '16 at 13:57
  • Technically it won't, but you can make it behave that way: check the `setMinimal()` method of QRegExp, which can be set to `true` to get lazy (minimal) matching for the whole pattern. – Sivaramakrishna Shriraam Aug 04 '16 at 14:01