
I am trying to parse a source text to extract comments from blocks (marked by ;; at the start and end). However, the regex is greedy: it extracts the longest possible string instead of stopping at the first closing ;;.

The obvious answer is to use a lazy quantifier. In my case the regex should be:

";;.*?;;"

However, if you run the following code with this pattern, no match is returned.

#include <regex.h>
#include <stdio.h>
#include <stdlib.h>   /* exit() */

const size_t number = 1;
const char* regParam = ";;.*?;;";
const char* sampleText = ";;what I want;; greed doesn't pay ;;what I want next time;; unimportant rest";

int main()
{
    regmatch_t matches[number];
    regex_t reegex;

    /* cflags = 0 compiles the pattern as a POSIX basic regular expression (BRE) */
    if(regcomp( &reegex, regParam, 0)!=0)
    {
        printf("error while compiling regex\n");
        exit(1);
    }
    switch(regexec( &reegex, sampleText, number, matches, 0))
    {
    case 0: /* match found */
        for(size_t i=0; i<number; ++i)
        {
            printf("%ld, %ld\n", (long)matches[i].rm_so, (long)matches[i].rm_eo);
            for(regoff_t z=matches[i].rm_so; z<matches[i].rm_eo; z++)
            {
                printf("%c", sampleText[z]);
            }
            printf("\n");
        }
        break;
    case REG_NOMATCH:
        printf("no match found\n");
        break;
    default:
        printf("error occurred\n");
        break;
    }
    regfree(&reegex);
    return 0;
}

The question mark is treated as a literal character, and since the text contains no question marks, there is no match. The correct output should have been:

;;what I want;;

What's the correct syntax for a lazy quantifier? I have tried adding various numbers of backslashes before the question mark, since that helped with scopes and brackets, but to no avail. I've also written a similar piece of code in C#, where ";;.*?;;" did exactly what I needed. However, I would prefer C, since most of my project is written in C and .NET takes quite a bit of setup on a fresh Linux machine.

Voidsay
