
I'm using a lexer, and after all the flex shenanigans I end up with this text:

ISEP 1252 "ADDRESS"
"Name1" 1253 "INFORMATICA"
"Name2" 1254 "Boilerplate1"
"Name3" 1255 "Boilerplate2"
"Name4" 1256 "Boilerplate3"
"Name5" 1257 "Boilerplate4"
"Name6" 1258 "Boilerplate5"

It is stored in yytext. I then split it into lines, and each line into its fields, using strtok:

// get first line
char* line = strtok(yytext, "\n");
// get school name
char* schoolName = strtok(line, " \t");
// get school num students
int schoolNumStudents = atoi(strtok(NULL, " \t"));
// get school address inside the quotes
char* schoolAddress = strtok(NULL, "\"");

strcpy(schools[schoolCount].name, schoolName);
schools[schoolCount].numStudents = schoolNumStudents;
strcpy(schools[schoolCount].address, schoolAddress);
//print school[schoolCount]
printf("Escola: %s\n", schools[schoolCount].name);
printf("Num alunos: %d\n", schools[schoolCount].numStudents);
printf("Morada: %s\n", schools[schoolCount].address);

// get teachers
line = strtok(NULL, "\n");
while (line != NULL) {
    char* teacherName = strtok(line, "\"");
    int teacherExt = atoi(strtok(NULL, " \t"));
    char* teacherDepartment = strtok(NULL, "\"");

    schools[schoolCount].numTeachers++;
    if(schools[schoolCount].teachers == NULL) {
        schools[schoolCount].teachers = (struct Teacher*) malloc(sizeof(struct Teacher));
    }
    else {
        schools[schoolCount].teachers = (struct Teacher*) realloc(schools[schoolCount].teachers, sizeof(struct Teacher) * (schools[schoolCount].numTeachers));
    }

    printf("Nome: %s\n", teacherName);
    printf("Ext: %d\n", teacherExt);
    printf("Departamento: %s\n", teacherDepartment);
    line = strtok(NULL, "\n");
}

schoolCount++;

The problem is that, with yytext holding the text shown above, the call strtok(NULL, "\n") that should fetch the second line returns NULL instead. Is there something I'm missing?

PS: There is no other C code involved besides what I posted; the code block sits inside a lex rule.

I tried copying the contents of yytext to a different variable, since strtok modifies its argument and yytext belongs to lex, but that didn't help. I also tried clearing strtok's internal state and retrying the call from the second line onwards, which didn't work either.

Vlad from Moscow
  • You can't alternate `strtok()` between different strings, because it keeps static state related to the current string, and it modifies the string in place. Use `strtok_r()` so you can keep separate state for each string you're parsing. – Barmar Apr 21 '23 at 19:33
  • Why are you using a lexer **and** `strtok`? The lexer will have much more detailed and fast string extraction techniques available. – Neil Apr 21 '23 at 20:16

1 Answer

The problem is that after calling strtok for the substring line

char* line = strtok(yytext, "\n");
// get school name
char* schoolName = strtok(line, " \t");
//..

strtok remembers only one position at a time, and that position now lies inside line rather than in the rest of yytext. So the next call of strtok

line = strtok(NULL, "\n");

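resumes from the position saved while tokenizing line instead of moving on through yytext. That position is the end of the first line, so the call returns NULL.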
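
The effect can be reproduced outside flex. The following is a minimal sketch, not the asker's actual code: `second_line_survives` is a hypothetical helper that repeats the same sequence of strtok calls on a writable buffer and reports whether the outer call still finds a second line.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if strtok(NULL, "\n") still finds a
 * second line after the first line has itself been split with strtok().
 * text must be writable, because strtok() overwrites delimiters. */
int second_line_survives(char *text)
{
    assert(text != NULL);

    char *line = strtok(text, "\n");   /* saved position: start of line 2 */
    if (line == NULL)
        return 0;

    /* Splitting the first line replaces the saved position with
     * positions inside that line. */
    strtok(line, " \t");               /* school name         */
    strtok(NULL, " \t");               /* number of students  */
    strtok(NULL, "\"");                /* address in quotes   */

    /* strtok now resumes at the end of line 1, not at line 2. */
    return strtok(NULL, "\n") != NULL;
}
```

Calling it on a buffer holding the first two lines of the question's text returns 0, which matches the NULL the asker sees.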

One approach to avoid the problem is to record the length of the string stored in line before tokenizing it any further, for example

char *pos = yytext;

char* line = strtok( pos, "\n");
size_t n = strlen( line );
//...

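and to use the value as an offset for the next call of strtok on yytext. Note the + 1: strtok overwrote the newline with '\0', so the next line starts one character past the end of line. This is only valid while line + n is still before the end of the text (yytext + yyleng), so check that before advancing.

pos = line + n + 1;   // skip the '\0' that replaced '\n'
line = strtok( pos, "\n");
//...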
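
The `strtok_r()` route suggested in the comments could look roughly like the sketch below. It assumes a POSIX system (`strtok_r` is not part of ISO C) and input in the format shown in the question. `struct TeacherRow`, `parse_teacher_line`, and `parse_teachers` are hypothetical names introduced for illustration, since the question's own `struct Teacher` is not shown.

```c
#define _POSIX_C_SOURCE 200809L  /* strtok_r is POSIX, not ISO C */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type; the question's struct Teacher is not shown. */
struct TeacherRow {
    char name[64];
    int  ext;
    char department[64];
};

/* Parse one line of the form  "Name" 1253 "Department"  in place.
 * Returns 1 on success, 0 if a field is missing. */
static int parse_teacher_line(char *line, struct TeacherRow *out)
{
    char *save = NULL;                        /* state for this line only */
    char *name = strtok_r(line, "\"", &save);
    char *ext  = strtok_r(NULL, " \t", &save);
    char *dept = strtok_r(NULL, "\"", &save);
    if (name == NULL || ext == NULL || dept == NULL)
        return 0;

    snprintf(out->name, sizeof out->name, "%s", name);
    out->ext = atoi(ext);
    snprintf(out->department, sizeof out->department, "%s", dept);
    return 1;
}

/* Skip the school header line, then parse up to max teacher lines.
 * text is modified in place. Returns the number of rows stored. */
int parse_teachers(char *text, struct TeacherRow *rows, int max)
{
    assert(text != NULL && rows != NULL);

    char *line_save = NULL;                   /* state for the line loop */
    int count = 0;

    char *line = strtok_r(text, "\n", &line_save);   /* header line */
    if (line == NULL)
        return 0;

    while (count < max &&
           (line = strtok_r(NULL, "\n", &line_save)) != NULL) {
        /* The inner parser has its own save pointer, so it cannot
         * disturb the position kept in line_save. */
        if (parse_teacher_line(line, &rows[count]))
            count++;
    }
    return count;
}
```

Because each level of splitting has its own save pointer, the line loop keeps its place no matter how the fields inside a line are tokenized. Inside a lex rule the function would be called on yytext, or on a copy of it if the scanner's buffer should stay untouched.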
Jonathan Leffler
Vlad from Moscow