I wrote a C program for a school assignment that reads certain HTML tags from a file given on the command line. Since I am initially searching for the '<' character, I am using fgetc() to read the file one character at a time into "input", which is a character pointer. My program "works" in terms of reading the tags correctly, but I have to hit Enter twice to get the desired output. I know this is a common issue and I've read a few threads on it, but most of them have to do with scanf(), and I don't think fgetc() adds a newline character? I tested reading in the file and I am able to output it correctly, except for a random character at the end, which I'm guessing is the null character? Here is how I read the file in:

FILE *inputFile = fopen(argv[1], "r");

  input = (char*)malloc(sizeof(char)*FILE_SIZE+1);
  tags = (char**)malloc(sizeof(char*)*MAX_TAGS);
  tagsCount = (int*)malloc(sizeof(int)*MAX_TAGS);

  if(inputFile == NULL){
    printf("Empty file\n");
    return EXIT_FAILURE;
  }
  do{
    if(feof(inputFile)){
      break ;
    }
    input[j] = fgetc(inputFile);
    j++;
  }
  while(1);

The program is long so I won't include it all, but after this I call a function that searches for the desired tags (the requirements only asked for certain ones):

int readHtmlFile(char *input, char **tags, int *tagsCount){
  int length = 0;
  int tagFound = 0;
  char *tag = NULL;
  int tagIndex = 0;

  if((length=read(0, input, FILE_SIZE))){
    for(int i=0; i<FILE_SIZE; i++){
      if(tagFound == 0){
        //we can check the closing tags and find out
        //all the tags available in html file
        if(input[i] == '<'){
          tag=(char*)malloc(MAX_TAG_SIZE);
          memset(tag, '\0', MAX_TAG_SIZE);
          tagFound=1;
        }
      }
      else{
        //read the tag name until '>' marks the end of the tag
        //some tags contain spaces and some tags don't have an end tag
        //all these cases are taken care of
        if(input[i]=='>' || input[i]==' '|| input[i]=='/' || input[i]=='!'){
          tagFound=0;
          tagIndex=0;
          if(input[i]!='/' && input[i]!='!'){
            updateTagCount(tag, tags, tagsCount);
          }
          else{
            free(tag);
          }
        }
        else{
          //update the tag name in local heap allocated variable
          //and copy to tags variable
          tag[tagIndex++]=input[i];
        }
      }
    }
  }
}

The output (screenshot not preserved) is from running "./htags HelloWorld.html" on Linux, compiled with gcc and the -std=c99 flag. There are a few other functions I didn't include; they check for duplicates, free memory, and print. Sorry for the lengthy question, any help is appreciated!

Jackson
  • `input[j] = fgetc(inputFile);` What if `fgetc` _returns_ `EOF`? [while(!feof) is always wrong](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) – KamilCuk May 31 '20 at 21:47
  • I thought feof() would handle that no? – Jackson May 31 '20 at 21:53
  • Does this answer your question? [Why is “while ( !feof (file) )” always wrong?](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) – KamilCuk May 31 '20 at 22:00
  • `feof` is detected _after_ you read from end of file. So first `fgetc` returns EOF, then `feof` will return true. Generally, forget `feof` exists. Handle return values. – KamilCuk May 31 '20 at 22:01
  • Ok thank you, been stuck on this for a while, will check EOF before I use getc – Jackson May 31 '20 at 22:05
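
The point made in the comments can be sketched as follows: test the return value of `fgetc` against `EOF` before storing it, rather than calling `feof` first. This is only a minimal sketch, not the asker's actual code; `readAll`, `cap`, and `outLen` are hypothetical names introduced for illustration. It also suggests where the stray character at the end of the output probably comes from: the original loop stores the `EOF` value itself into the buffer before `feof` ever reports true.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads at most cap bytes from fp into a freshly
 * allocated, NUL-terminated buffer and stores the byte count in *outLen.
 * Returns NULL if the allocation fails. The caller frees the buffer. */
char *readAll(FILE *fp, size_t cap, size_t *outLen)
{
    char *buf = malloc(cap + 1);
    if (buf == NULL) {
        return NULL;
    }

    size_t n = 0;
    int c;  /* int, not char, so EOF stays distinct from every byte value */

    /* Check the return value of fgetc itself; EOF is never stored. */
    while (n < cap && (c = fgetc(fp)) != EOF) {
        buf[n++] = (char)c;
    }
    buf[n] = '\0';

    *outLen = n;
    return buf;
}
```

Assuming the `FILE_SIZE` constant from the question, a call might look like `input = readAll(inputFile, FILE_SIZE, &length);`, made only after checking that `fopen` did not return `NULL`.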

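
A separate detail in the question's `readHtmlFile` is the call `read(0, input, FILE_SIZE)`. File descriptor 0 is standard input, so on a terminal that call blocks until a line is typed, and whatever is typed is written over the start of the buffer. That is a likely reason the program waits for Enter. Below is a rough sketch of scanning a buffer that has already been loaded, bounded by its real length, without touching standard input. `collectTags`, `names`, and `maxNames` are hypothetical names, the value of `MAX_TAG_SIZE` is assumed, and the tag rules are a simplified variant of the ones in the question.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_TAG_SIZE 32  /* assumed value; the question does not show it */

/* Hypothetical helper: copies the names of opening tags found in
 * input[0..len) into names[] and returns how many were stored. */
size_t collectTags(const char *input, size_t len,
                   char names[][MAX_TAG_SIZE], size_t maxNames)
{
    size_t count = 0;
    size_t i = 0;

    while (i < len && count < maxNames) {
        if (input[i] != '<') {
            i++;
            continue;
        }
        i++;  /* step past '<' */

        /* Skip closing tags ("</p>") and "<!...>" declarations. */
        if (i < len && (input[i] == '/' || input[i] == '!')) {
            continue;
        }

        /* Copy the name up to '>', '/', whitespace, or another '<'. */
        size_t n = 0;
        while (i < len && input[i] != '>' && input[i] != '/' &&
               input[i] != '<' && !isspace((unsigned char)input[i])) {
            if (n < MAX_TAG_SIZE - 1) {
                names[count][n++] = input[i];
            }
            i++;
        }
        names[count][n] = '\0';
        if (n > 0) {
            count++;
        }
    }
    return count;
}
```

Passing the number of bytes actually read as `len`, instead of looping up to `FILE_SIZE`, keeps the scan from running into uninitialized memory past the end of the file contents.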