I'm trying to find URLs within a large block of text:
regex_t reg;
const char *regex="REGEXGOESHERE";
regmatch_t matches[16];
//Read data into variable filecontent
regcomp(&reg, regex, REG_EXTENDED);
int offset=0;
int j;
int found=0;
int start,end;
while( regexec(&reg, filecontent+offset, 16, matches, 0) == 0)
{
printf("\n\n");
start = matches[0].rm_so+offset;
end = matches[0].rm_eo-1+offset;
printf("regex /%s/ at bytes %d-%d\n",
regex, start, end);
for (j=start; j<=end; j++)
{
printf("%c",filecontent[j]);
}
offset += matches[0].rm_eo;
found = 1;
}
close(f);
Now this works for a simple regex in const char *regex, such as regex = "https?.*.png". But if I want a more complex regex for a URL, like (https?:\/\/.*\.(?:png|jpg)), I have to escape the backslashes, and so it becomes:
"(https?:\\/\\/.*\\.(?:png|jpg))";
And then running it gives a segmentation fault.
What might be going wrong?
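One way to narrow this down is to check the return value of regcomp before calling regexec. The code above ignores it, so if compilation fails, regexec is handed a regex_t that was never set up properly. The non-capturing group syntax (?:...) is a Perl-style extension and is not defined for POSIX extended regular expressions, so regcomp may well be rejecting the pattern. The sketch below is only an illustration under those assumptions: compile_checked and count_matches are hypothetical helper names, and the pattern uses a plain group (png|jpg) plus [^ ]* in place of .* as one possible rewrite.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compile `pattern` as a POSIX extended regex and
   print the library's error message if compilation fails.
   Returns 0 on success, non-zero (the regcomp error code) on failure. */
static int compile_checked(regex_t *re, const char *pattern)
{
    int rc = regcomp(re, pattern, REG_EXTENDED);
    if (rc != 0) {
        char msg[256];
        regerror(rc, re, msg, sizeof msg);
        fprintf(stderr, "regcomp failed for /%s/: %s\n", pattern, msg);
    }
    return rc;
}

/* Hypothetical helper: count non-overlapping matches of `pattern` in `text`,
   using the same offset-advancing loop as in the question.
   Returns -1 if the pattern does not compile. */
static int count_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int offset = 0;
    int count = 0;

    if (compile_checked(&re, pattern) != 0)
        return -1;               /* never call regexec on a failed compile */

    while (regexec(&re, text + offset, 1, &m, 0) == 0) {
        count++;
        if (m.rm_eo == 0)
            break;               /* empty match at the start: stop to avoid looping forever */
        offset += (int)m.rm_eo;  /* continue searching after this match */
    }

    regfree(&re);
    return count;
}
```

Calling count_matches("(https?://[^ ]*\\.(png|jpg))", filecontent) would then either return a match count or print the reason regcomp rejected the pattern, instead of crashing inside regexec.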