I am trying to write an elementary XML parser in C, without using any non-standard libraries, which will be able to:
- detect several different tags
- detect an empty tag
- detect tag mismatch
The main problem I have is how to distinguish the three parts from one another: the start of a tag, the content, and the end of a tag.
My idea was to implement a finite-state machine that runs while reading the file, so that I always know what I am currently reading.
Please tell me your ideas, and correct me if I am heading in the wrong direction.
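To make the state-machine idea more concrete, here is a rough sketch of what I have in mind. It is only an illustration under my own assumptions: the state names, `next_state`, `scan` and `struct counts` are all made up for this example, it ignores attributes, comments and self-closing tags, and it only counts what it sees instead of storing it.

```c
#include <stdio.h>

/* Hypothetical states for this sketch; the names are illustrative only. */
enum state { IN_TEXT, TAG_OPENED, IN_START_TAG, IN_END_TAG };

/* Advance the machine by one character and return the next state. */
static enum state next_state(enum state s, int c)
{
    switch (s) {
    case IN_TEXT:
        return c == '<' ? TAG_OPENED : IN_TEXT;
    case TAG_OPENED:
        /* The character right after '<' decides what kind of tag this is. */
        if (c == '/') return IN_END_TAG;
        if (c == '>') return IN_TEXT;  /* "<>" has no name; back to text here */
        return IN_START_TAG;
    case IN_START_TAG:
    case IN_END_TAG:
        return c == '>' ? IN_TEXT : s;
    }
    return s;
}

/* Tallies kept by scan(); an assumption made to keep the sketch small. */
struct counts { int start_tags, end_tags, text_chars; };

/* Run the machine over a string and count tags and content characters. */
static struct counts scan(const char *src)
{
    struct counts n = {0, 0, 0};
    enum state s = IN_TEXT;

    for (; *src != '\0'; src++) {
        enum state t = next_state(s, (unsigned char)*src);
        if (s == TAG_OPENED && t == IN_START_TAG) n.start_tags++;
        if (s == TAG_OPENED && t == IN_END_TAG)   n.end_tags++;
        if (s == IN_TEXT && t == IN_TEXT)         n.text_chars++;
        s = t;
    }
    return n;
}
```

Calling `scan("<a>hi</a>")` should report one start tag, one end tag and two content characters. The point of keeping `next_state` separate is that the reading loop never has to remember the previous character; the current state carries that information.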
EDIT: I have added a chunk of code that detects the elements and their content:
    /* int, not char: fgetc() returns an int so that EOF can be told apart from data */
    int tmp, buff = -1;
    char *content = (char*) malloc(sizeof(char) * (size + 1));
    int stage = -1; /* 1 = inside a start tag, 2 = content, 3 = inside an end tag */
    int i = 0;
    while ((tmp = fgetc(file)) != EOF) {
        if (tmp == '<') {
            /* a new tag begins: flush any content collected since the last '>' */
            if (stage == 2 && buff != '>') {
                printf("content: ");
                printCont(content, i);
            }
            stage = 1;
            buff = tmp;
            i = 0;
            continue;
        } else if (tmp == '/' && buff == '<') {
            /* "</" means this is an end tag */
            stage = 3;
            buff = tmp;
            i = 0;
            continue;
        } else if (tmp == '>') {
            if (stage == 1) {
                printf("tag_start: ");
            } else if (stage == 3) {
                printf("tag_end: ");
            } else if (stage == 2) {
                printf("content: ");
            }
            buff = tmp;
            printCont(content, i); /* reads the content */
            stage = 2;
            i = 0;
            continue;
        }
        if (tmp != ' ' && tmp != '\n' && tmp != '\t') { /* simple filter: skip whitespace */
            content[i] = (char) tmp;
            buff = tmp;
            i++;
        }
    }
I would be really grateful if you could comment on the code above and tell me how to improve it. So far it detects the tags and the content, which is what I needed in the first place.
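For the tag-mismatch requirement from the list at the top, one approach I am considering is a stack of open tag names: push on every start tag, and on every end tag compare against the top of the stack. Below is a minimal sketch of that idea, not something wired into the code above yet. `struct tag_stack`, `push_tag`, `pop_tag` and the two size limits are names and values I invented for the example.

```c
#include <string.h>

#define MAX_DEPTH 64 /* assumed nesting limit for this sketch */
#define MAX_NAME  32 /* assumed maximum tag-name length, including '\0' */

/* Fixed-size stack holding the names of the tags that are currently open. */
struct tag_stack {
    char names[MAX_DEPTH][MAX_NAME];
    int depth;
};

/* Record a start tag.
   Returns 0 on success, -1 if the nesting is too deep or the name too long. */
static int push_tag(struct tag_stack *st, const char *name)
{
    if (st->depth >= MAX_DEPTH || strlen(name) >= MAX_NAME)
        return -1;
    strcpy(st->names[st->depth++], name);
    return 0;
}

/* Check an end tag against the innermost open tag.
   Returns 0 and pops on a match, -1 on a mismatch or if nothing is open. */
static int pop_tag(struct tag_stack *st, const char *name)
{
    if (st->depth == 0 || strcmp(st->names[st->depth - 1], name) != 0)
        return -1;
    st->depth--;
    return 0;
}
```

With this, `<a><b></a>` would be reported as a mismatch at `</a>`, because `b` is still on top of the stack. A non-zero `depth` at end of file would mean that some tag was never closed.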