I am a C beginner and have been working on this problem for weeks now; my colleagues and I can't figure out the solution.
Part 1 of the problem: I use the POSIX regex library (regex.h) that ships with most Linux distributions. As in many examples online, I use a match function that looks like this:
int match(const char *string, char *pattern) {
    int status;
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
        char buffer[100];
        regerror(status, &re, buffer, 100);
        printf("regcomp() failed with '%s'\n", buffer);
        return(0); /* Report error. */
    }
    status = regexec(&re, string, (size_t) 0, NULL, 0);
    regfree(&re);
    if (status != 0) {
        char buffer[100];
        regerror(status, &re, buffer, 100);
        printf("regcomp() failed with '%s'\n", buffer);
        return(0); /* Report error. */
    }
    printf("match: %s<\n",string);
    return(1);
}
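For comparison, here is a minimal sketch of a variant that keeps "no match" apart from real errors, since the function above prints "regcomp() failed" for both. The name match_checked and the 1 / 0 / -1 return convention are assumptions made for illustration only; they are not part of my program.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical variant of match(): returns 1 on a match, 0 on no match,
   and -1 if the pattern does not compile or regexec() reports an error. */
int match_checked(const char *string, const char *pattern) {
    regex_t re;
    char buffer[100];
    int status;

    assert(string != NULL && pattern != NULL);

    /* Keep the return value of regcomp() so regerror() gets a real code. */
    status = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    if (status != 0) {
        regerror(status, &re, buffer, sizeof buffer);
        fprintf(stderr, "regcomp() failed with '%s'\n", buffer);
        return -1;
    }

    status = regexec(&re, string, 0, NULL, 0);
    if (status != 0 && status != REG_NOMATCH) {
        /* Report before regfree(), while re is still valid. */
        regerror(status, &re, buffer, sizeof buffer);
        fprintf(stderr, "regexec() failed with '%s'\n", buffer);
    }
    regfree(&re);

    if (status == 0) {
        return 1;
    }
    return status == REG_NOMATCH ? 0 : -1;
}
```

With this convention, a caller can treat 0 as an ordinary "did not match" result and only report a problem when the function returns -1.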
Then I have a main function that checks a list of regexes against an input (simulated here by values[1]). Only the second entry of values should match; all the others should return 0.
int main() {
    int i = 0;
    char* values[16] = {"ADCICT.A100311.ANTRAG","ADCICT.A100311.ANTRAG.NR","ADDB2P.K004111.PLANxUEB","ADDB2Q.K004111.PLANxUEB","ADDB2P.K004111.PRODxUEB**",
        "ADDB2Q.K004111.PRODxUEB**","ADDB2P.K004111.SQLCODE","ADDB2Q.K004111.SQLCODE","ADDB2P.K004111.VORP#UEB","ADDB2Q.K004111.VORP#UEB",
        "ADEDVT.A347709.DDIO.*.PGM%COB**","AD000T.K001800.CICS.**","A9VIST.K001804.INFOS","ABC4","ABC5"};
    for ( i = 0; values[i] != NULL; i++ ) {
        char *theRegex = (char *) malloc(100);
        memset(theRegex, 0x00, 100);
        theRegex = values[i];
        printf("regexV=%x<", theRegex);
        transformRegex(&theRegex);
        printf("regexN=%s< ", theRegex);
        int reti = match(values[1], theRegex);
        printf("reti=%i\n", reti);
        fflush(stdout);
        //free(theRegex);
    }
}
transformRegex takes a char ** and just adds ^ at the beginning and $ at the end:
int transformRegex(char **regexS){
    char tmpStr[strlen(*regexS)+3];
    memset(tmpStr, 0x00, strlen(*regexS)+3);
    memcpy(tmpStr, "^", 1);
    memcpy(&tmpStr[1], *regexS, strlen(*regexS));
    strcat(tmpStr, "$");
    *regexS = tmpStr;
    return 0;
}
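To make clear what the anchoring step is meant to produce, here is a minimal sketch of the same idea where the result lives on the heap instead of in a local array. The name anchor_pattern and the "caller frees the result" convention are assumptions made for illustration; this is not the function my program uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'ed "^pattern$" string,
   or NULL if the allocation fails. The caller must free() the result. */
char *anchor_pattern(const char *pattern) {
    size_t len;
    char *anchored;

    assert(pattern != NULL);

    len = strlen(pattern);
    anchored = malloc(len + 3);          /* '^' + pattern + '$' + '\0' */
    if (anchored == NULL) {
        return NULL;
    }
    anchored[0] = '^';
    memcpy(anchored + 1, pattern, len);  /* copy the pattern after the '^' */
    anchored[len + 1] = '$';
    anchored[len + 2] = '\0';
    return anchored;
}
```

Because the buffer comes from malloc, it remains valid after the function returns, and the pointer it returns is one that may be passed to free().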
In fact, transformRegex was supposed to do a lot more, but since I couldn't solve this problem, I stripped out as much code as possible. By now I am really exhausted from not being able to figure it out.
If I run this program (using gdb), what I get is this:
regexV=4010dc<regexN=^ADCICT.A100311.ANTRAG$< match: ADCICT.A100311.ANTRAG.NR<
reti=1
regexV=4010f2<regexN=^ADCICT.A100311.ANTRAG.NR$< match: ADCICT.A100311.ANTRAG.NR<
reti=1
regexV=40110b<regexN=^ADDB2P.K004111.PLANxUEB$< regcomp() failed with 'No match'
[...]
reti=0
regexV=401207<regexN=^A9VIST.K001804.INFOS$< regcomp() failed with 'No match'
reti=0
regexV=40121c<regexN=^ABC4$< match: ADCICT.A100311.ANTRAG.NR<
reti=1
regexV=401221<regexN=^ABC5$< match: ADCICT.A100311.ANTRAG.NR<
reti=1
How can the last two patterns possibly match? Not to mention the first one ...
Problem 2: I have noticed this problem several times before, but it always seemed to disappear by itself. If I just take out this line
printf("regexV=%x<", theRegex);
the first lines of my output are
regexN= Üÿÿ< regcomp() failed with 'No match'
reti=0
What in the name of god is this? How can a printf statement affect my code like this?
Problem 3: I want to free the memory I allocated. Because I allocated theRegex with malloc, I want to free it at the end of the loop with
free(theRegex)
But see what happens if I do so:
regexV=4010ec<regexN=^ADCICT.A100311.ANTRAG$< match: ADCICT.A100311.ANTRAG.NR<
reti=1
*** glibc detected *** /home/itgsandbox/KK/a.out: double free or corruption (out): 0x00007fffffffdb70 ***
[...]
Program received signal SIGABRT, Aborted.
0x00007ffff7ab2945 in raise () from /lib64/libc.so.6
I am really at my wit's end (which doesn't mean much, because I just started with C), but these problems seem to involve something really subtle. Please help me, I trust in you, Stack Overflow!