I am a C beginner and have been working on this problem for weeks now; my colleagues and I can't figure out the solution.

Part 1 of the problem: I use the standard C regex library (regex.h) that ships with most Linux distributions. As in many examples online, I use a match function which looks like this:

int match(const char *string, char *pattern) {
int    status;
regex_t    re;

if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
    char    buffer[100];
    regerror(status, &re, buffer, 100);
    printf("regcomp() failed with '%s'\n", buffer);
    return(0);      /* Report error. */
}
status = regexec(&re, string, (size_t) 0, NULL, 0);
regfree(&re);
if (status != 0) {
    char    buffer[100];
    regerror(status, &re, buffer, 100);
    printf("regcomp() failed with '%s'\n", buffer);
    return(0);      /* Report error. */
}
printf("match: %s<\n",string);
return(1); 
}

Then I have a main function with some regexes to be checked against an input (which I simulated in this case with values[1]). In this case, only the second entry of values should match; the rest should return 0.

int main() {
 int i = 0;
 char* values[16] = {"ADCICT.A100311.ANTRAG","ADCICT.A100311.ANTRAG.NR","ADDB2P.K004111.PLANxUEB","ADDB2Q.K004111.PLANxUEB","ADDB2P.K004111.PRODxUEB**",
 "ADDB2Q.K004111.PRODxUEB**","ADDB2P.K004111.SQLCODE","ADDB2Q.K004111.SQLCODE","ADDB2P.K004111.VORP#UEB","ADDB2Q.K004111.VORP#UEB",
 "ADEDVT.A347709.DDIO.*.PGM%COB**","AD000T.K001800.CICS.**","A9VIST.K001804.INFOS","ABC4","ABC5"}; 

 for ( i = 0; values[i] != NULL; i++ ) {
    char *theRegex = (char *) malloc(100);
    memset(theRegex, 0x00, 100);
    theRegex = values[i];
    printf("regexV=%x<", theRegex);
    transformRegex(&theRegex);
    printf("regexN=%s< ", theRegex);
    int reti = match(values[1], theRegex);
    printf("reti=%i\n", reti);
    fflush(stdout);
    //free(theRegex);
 }
}

transformRegex takes a char** and just adds ^ at the beginning and $ at the end:

int transformRegex(char **regexS){
    char tmpStr[strlen(*regexS)+3];
    memset(tmpStr, 0x00, strlen(*regexS)+3);
    memcpy(tmpStr, "^", 1);
    memcpy(&tmpStr[1], *regexS, strlen(*regexS));
    strcat(tmpStr,  "$");
    *regexS = tmpStr;
    return 0;
}

In fact, the transformRegex function was supposed to do a lot more, but since I couldn't figure out the solution for this problem, I had to remove as much code as possible and now I am really, really exhausted because I cannot solve it.

If I run this program (using gdb), what I get is this:

regexV=4010dc<regexN=^ADCICT.A100311.ANTRAG$< match: ADCICT.A100311.ANTRAG.NR<
reti=1
regexV=4010f2<regexN=^ADCICT.A100311.ANTRAG.NR$< match: ADCICT.A100311.ANTRAG.NR<
reti=1
regexV=40110b<regexN=^ADDB2P.K004111.PLANxUEB$< regcomp() failed with 'No match'
  [...]
reti=0
regexV=401207<regexN=^A9VIST.K001804.INFOS$< regcomp() failed with 'No match'
reti=0
regexV=40121c<regexN=^ABC4$< match: ADCICT.A100311.ANTRAG.NR<
reti=1
regexV=401221<regexN=^ABC5$< match: ADCICT.A100311.ANTRAG.NR<
reti=1

How can the last two things possibly match? Not to mention the first one ...

Problem 2: I noticed this problem often before, but it seemed to disappear by itself. If I just take out this line

printf("regexV=%x<", theRegex);

My first lines of output are

regexN= Üÿÿ< regcomp() failed with 'No match'
reti=0

What in the name of god is this? How can a printf statement affect my code like this?

Problem 3: I usually want to free the memory I allocated. Because I allocated theRegex, I want to free it at the end of the loop with

free(theRegex)

But see what happens if I do so:

regexV=4010ec<regexN=^ADCICT.A100311.ANTRAG$< match: ADCICT.A100311.ANTRAG.NR<
reti=1
*** glibc detected *** /home/itgsandbox/KK/a.out: double free or corruption (out): 0x00007fffffffdb70 ***
 [...]
Program received signal SIGABRT, Aborted.
0x00007ffff7ab2945 in raise () from /lib64/libc.so.6

I am really at my wit's end (which doesn't mean much because I just started with C), but these problems seem to involve something really subtle. Please help me, I trust in you, Stackoverflow!

dasLort

1 Answer

Some issues with the code posted (the problems mentioned by the OP might more or less be due to these, so fix the issues and see how the code performs):

Here

int transformRegex(char **regexS){
  char tmpStr[strlen(*regexS)+3];
  ....
  *regexS = tmpStr;
  return 0; 
}

the code returns a pointer to tmpStr, a local array that exists only while execution is inside transformRegex. The pointer becomes invalid as soon as the function returns.

Accessing the memory later on leads to Undefined Behaviour.
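
One possible way around the dangling pointer, sketched under the assumption that the caller is happy to own and free the result: build the anchored pattern in heap memory instead of a local array. The name anchorRegex is hypothetical and not part of the original code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical replacement for transformRegex (name is an assumption).
 * Returns a newly allocated string "^<pattern>$", or NULL if the
 * allocation fails. The caller owns the result and must free() it. */
char *anchorRegex(const char *pattern) {
    assert(pattern != NULL);
    size_t len = strlen(pattern);
    char *out = malloc(len + 3);      /* '^' + pattern + '$' + '\0' */
    if (out == NULL)
        return NULL;
    out[0] = '^';
    memcpy(out + 1, pattern, len);    /* copy the pattern after '^' */
    out[len + 1] = '$';
    out[len + 2] = '\0';
    return out;                       /* heap memory outlives this call */
}
```

The caller would then pass the returned pointer to match() and call free() on it once the pattern is no longer needed.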


These lines

  char *theRegex = (char *) malloc(100);
  memset(theRegex, 0x00, 100);

are pointless and leak memory, because in the next line

  theRegex = values[i];

the value returned by malloc() is overwritten, and therefore lost.


The call to free() fails because the pointer passed to it was not returned by malloc(); it refers to the invalid stack memory left behind by transformRegex (see above).
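
As a rough sketch of how the loop body could keep ownership straight, the allocated buffer can be filled with a copy of the string rather than having its pointer reassigned, so that free() later receives exactly the pointer malloc() returned. The function name buildAndFreeAll and its signature are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: for each entry, builds "^value$" in a heap buffer,
 * prints it, and frees that same buffer at the end of the iteration.
 * Returns the number of patterns built, or -1 if an allocation fails. */
int buildAndFreeAll(const char **values, size_t count) {
    assert(values != NULL);
    int built = 0;
    for (size_t i = 0; i < count; i++) {
        size_t size = strlen(values[i]) + 3;   /* '^', '$' and '\0' */
        char *theRegex = malloc(size);
        if (theRegex == NULL)
            return -1;
        /* Copy INTO the buffer; the pointer itself is never reassigned. */
        snprintf(theRegex, size, "^%s$", values[i]);
        printf("regexN=%s<\n", theRegex);
        built++;
        free(theRegex);   /* the same pointer malloc() returned */
    }
    return built;
}
```

Sizing the buffer from strlen() also avoids the fixed 100-byte guess in the original loop.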


Update:

This array

char* values[16] = {
  "ADCICT.A100311.ANTRAG","ADCICT.A100311.ANTRAG.NR","ADDB2P.K004111.PLANxUEB",
  "ADDB2Q.K004111.PLANxUEB","ADDB2P.K004111.PRODxUEB**","ADDB2Q.K004111.PRODxUEB**",
  "ADDB2P.K004111.SQLCODE","ADDB2Q.K004111.SQLCODE","ADDB2P.K004111.VORP#UEB",
  "ADDB2Q.K004111.VORP#UEB","ADEDVT.A347709.DDIO.*.PGM%COB**","AD000T.K001800.CICS.**",
  "A9VIST.K001804.INFOS","ABC4","ABC5"
}; 

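does not explicitly list an element with a value of NULL. Because the array is declared with 16 elements but only 15 initializers are given, the last element is implicitly initialized to NULL.

So the condition in this line

for ( i = 0; values[i] != NULL; i++ ) {

does end the loop, but only because the declared size happens to be one larger than the number of strings. If a string is added without adjusting the size, no element is NULL and the array subscript runs far past the end of the array. Accessing array elements out of bounds also provokes Undefined Behaviour.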

To make the end of the array explicit, you might like to remove the array size declaration and add a "stopper" element to the array like so:

char* values[] = {
  "ADCICT.A100311.ANTRAG","ADCICT.A100311.ANTRAG.NR","ADDB2P.K004111.PLANxUEB",
  "ADDB2Q.K004111.PLANxUEB","ADDB2P.K004111.PRODxUEB**","ADDB2Q.K004111.PRODxUEB**",
  "ADDB2P.K004111.SQLCODE","ADDB2Q.K004111.SQLCODE","ADDB2P.K004111.VORP#UEB",
  "ADDB2Q.K004111.VORP#UEB","ADEDVT.A347709.DDIO.*.PGM%COB**","AD000T.K001800.CICS.**",
  "A9VIST.K001804.INFOS","ABC4","ABC5",
  NULL
}; 
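
A loop over such a stopper-terminated array can be sketched as follows. The helper name countEntries is hypothetical; it only illustrates that the walk ends at the NULL element and never reads beyond it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the strings in a NULL-terminated array.
 * The array must end with a NULL element, otherwise the loop would
 * read past the end of the array. */
size_t countEntries(const char **values) {
    assert(values != NULL);
    size_t n = 0;
    while (values[n] != NULL)   /* stop at the stopper element */
        n++;
    return n;
}
```

For an array defined in the same scope, sizeof values / sizeof values[0] gives the element count without needing a stopper at all.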
alk
  • @dasLort: To debug memory-management problems I strongly recommend using Valgrind: http://valgrind.org – alk Aug 02 '13 at 09:37
  • I did, but couldn't find the solution. Maybe I am not good enough with it. Now that I fixed the thing alk said, it even gives out correct matches, me happy =) But what about Problem 2? How can this be possible? – dasLort Aug 02 '13 at 10:03
  • @dasLort: Problem 2 still exists exactly as described in your posting, even though you fixed all the issues mentioned in my answer? – alk Aug 02 '13 at 10:14
  • @dasLort: `0x00` equals `0`, the former is just the hexadecimal notation of the latter. For more please see this answer: http://stackoverflow.com/q/1296843/694576 – alk Aug 02 '13 at 10:55