I wrote a C program for a school assignment that reads certain HTML tags from a file given on the command line. Since I am initially searching for the '<' character, I am using fgetc() to read the file one character at a time into "input", which is a character pointer. My program "works" in the sense that it reads the tags correctly, but I have to hit Enter twice to get the desired output. I know this is a common issue and I've read a few threads on it, but most of them involve scanf(), and I don't think fgetc() adds a newline character? I tested reading in the file and I can output it correctly, except for a random character at the end, which I'm guessing is the null character? Here is how I read the file in:
```c
FILE *inputFile = fopen(argv[1], "r");
input = (char*)malloc(sizeof(char)*FILE_SIZE+1);
tags = (char**)malloc(sizeof(char*)*MAX_TAGS);
tagsCount = (int*)malloc(sizeof(int)*MAX_TAGS);
if(inputFile == NULL){
    printf("Empty file\n");
    return EXIT_FAILURE;
}
do{
    if(feof(inputFile)){
        break;
    }
    input[j] = fgetc(inputFile);
    j++;
}
while(1);
```
The program is long, so I won't include all of it, but after this I call a function that searches for the desired tags (the requirements only covered certain ones):
```c
int readHtmlFile(char *input, char **tags, int *tagsCount){
    int length = 0;
    int tagFound = 0;
    char *tag = NULL;
    int tagIndex = 0;
    if((length=read(0, input, FILE_SIZE))){
        for(int i=0; i<FILE_SIZE; i++){
            if(tagFound == 0){
                //we can check the closing tags and find out
                //all the tags available in the html file
                if(input[i] == '<'){
                    tag=(char*)malloc(MAX_TAG_SIZE);
                    memset(tag, '\0', MAX_TAG_SIZE);
                    tagFound=1;
                }
            }
            else{
                //read the tag until '>' comes, which denotes the tag is found
                //some tags contain spaces and some tags don't have an end tag
                //all these cases are taken care of
                if(input[i]=='>' || input[i]==' '|| input[i]=='/' || input[i]=='!'){
                    tagFound=0;
                    tagIndex=0;
                    if(input[i]!='/' && input[i]!='!'){
                        updateTagCount(tag, tags, tagsCount);
                    }
                    else{
                        free(tag);
                    }
                }
                else{
                    //update the tag name in the local heap-allocated variable
                    //and copy it to the tags variable
                    tag[tagIndex++]=input[i];
                }
            }
        }
    }
}
```
The double Enter happens when I run "./htags HelloWorld.html". This is on Linux with gcc and the -std=c99 flag. There are a few other functions I didn't include; they check for duplicates, free memory, and print. Sorry for the lengthy question; any help is appreciated!