I am trying to parse source text to extract comment blocks (delimited by ;; at the start and end). However, the regex is greedy and matches the longest possible string instead of stopping at the first closing delimiter.
The obvious answer is to use a lazy quantifier. In my case the regex should be:
";;.*?;;"
However, if you run the following code with this pattern, no match is returned.
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>   /* exit */

const size_t number = 1;
const char* regParam = ";;.*?;;";
const char* sampleText = ";;what I want;; greed doesn't pay ;;what I want next time;; unimportant rest";

int main()
{
    regmatch_t matches[number];
    regex_t reegex;
    /* cflags = 0 selects POSIX basic regular expressions; 0 means success
       (REG_NOERROR is a glibc extension, so plain 0 is more portable) */
    if (regcomp(&reegex, regParam, 0) != 0)
    {
        printf("error while compiling regex\n");
        exit(1);
    }
    switch (regexec(&reegex, sampleText, number, matches, 0))
    {
        case 0:
            for (size_t i = 0; i < number; ++i)
            {
                /* regoff_t is not guaranteed to be int, so cast for %d */
                printf("%d, %d\n", (int)matches[i].rm_so, (int)matches[i].rm_eo);
                for (int z = (int)matches[i].rm_so; z < (int)matches[i].rm_eo; z++)
                {
                    printf("%c", sampleText[z]);
                }
                printf("\n");
            }
            break;
        case REG_NOMATCH:
            printf("no match found\n");
            break;
        default:
            printf("error occurred\n");
    }
    regfree(&reegex);   /* release the compiled pattern */
    return 0;
}
The question mark is treated as a literal character, and since the text contains no question marks, there is no match. The correct output should have been:
;;what I want;;
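To double-check that the question mark really is taken literally, I put together a small test helper. It is only a sketch: `bre_matches` is a name I made up, and it compiles the pattern without REG_EXTENDED, the same way my program does.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of the program above): compiles `pattern`
 * as a POSIX basic regex (no REG_EXTENDED) and reports whether it matches.
 * Returns 1 on match, 0 on no match, -1 on a compile or runtime error. */
int bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    /* REG_NOSUB: only the yes/no result is needed, not the offsets */
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    if (rc == 0)
        return 1;
    if (rc == REG_NOMATCH)
        return 0;
    return -1;
}
```

If my reading is right, `bre_matches(";;.*?;;", ";;is this literal?;;")` should return 1, because the text contains an actual question mark right before the closing `;;`, while the same pattern against my sample text returns 0.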
What's the correct syntax for a lazy quantifier? I have tried adding a number of backslashes before the question mark, since that helped with braces and parentheses, but to no avail. I've also written a similar piece of code in C#, where ";;.*?;;" did exactly what I needed. However, I would prefer C, since most of my project is written in C and .NET requires quite a bit of setup to work on a fresh Linux machine.
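The only fallback I have found so far avoids the lazy quantifier altogether by using a negated bracket expression. This is a minimal sketch, not a real answer to the question: `first_comment` is a hypothetical name, and it assumes that a comment body never contains a semicolon, which is not guaranteed for my real input.

```c
#include <regex.h>

/* Hypothetical helper: finds the first ";;...;;" block in `text` using a
 * negated bracket expression instead of a lazy quantifier.
 * ASSUMPTION: a comment body never contains ';'.
 * Returns 0 and stores the byte offsets [*start, *end) on success,
 * 1 if there is no match, -1 on error. */
int first_comment(const char *text, long *start, long *end)
{
    regex_t re;
    regmatch_t m[1];
    /* [^;]* cannot run past a semicolon, so the match stops at the
       first closing ";;" even though '*' itself is still greedy */
    if (regcomp(&re, ";;[^;]*;;", 0) != 0)
        return -1;
    int rc = regexec(&re, text, 1, m, 0);
    regfree(&re);
    if (rc == REG_NOMATCH)
        return 1;
    if (rc != 0)
        return -1;
    *start = (long)m[0].rm_so;
    *end = (long)m[0].rm_eo;
    return 0;
}
```

For my sample text this gives the offsets 0 and 15, which is exactly ";;what I want;;". It breaks as soon as a comment contains a single `;`, so I would still like to know whether a proper lazy quantifier is available.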