1

We were supposed to extract strings from a provided file, the output matches the expect, but it reports segmentation fault in the end and I don't know why.

    #include<stdio.h>
    #include<string.h>

    int main(int argc, char *argv[]){
        char str[100];
        char f;
        int len = 0;
        FILE *file;
        file = fopen(argv[1],"r");
        //read only here, so use "r"
        if(file==NULL){
            printf("The file doesn't exist.\n");
            return 1;
        }
        while(feof(file)==0){
            //if feof returns 0 it means it havent reaches the end yet
            fread(&f,sizeof(f),1,file);//read in the file
            //printabel character between 32 and 126
            if(f>=32&&f<=126){
                str[len] = f;
                len++;
                continue;//keep doing it(for ->while)
            }
            if(strlen(str)>3){
                //a string is a run of at least 4
                printf("The output is:%s\n",str);
                len=0;//reset
                memset(str, 0, sizeof(str));
                //reset the str so it wont get too big(overflow)
            }
        }
        //close the file and return
        fclose(file);
        return 0;
    }
Terry Yen
  • 25
  • 2

2 Answers2

1

This is not true

   while(feof(file)==0){
        //if feof returns 0 it means it havent reaches the end yet

And a very common mistake.

This returns 0 if you have Not read past the end of file. Its s subtle but important detail. Your last read may have read up-to the end of file but not past it. This means there is actually no data left to read but feof() will still return 0.

This is why you must test the result of the read operation.

fread(&f,sizeof(f),1,file);

If this returns zero then you failed to read anything.

Which is why you should structure your loop to test the result of the read (not feof()).

while (fread(&f,sizeof(f),1,file) == 1)
{
     // You have successfully read an object from the stream
}
Martin York
  • 257,169
  • 86
  • 333
  • 562
  • In other languages such as php `feof` does what many expect: tells when `EOF` has been reached. In C, it informs an error condition: attempt to read past the end of a file. – Weather Vane Mar 01 '18 at 20:06
  • @WeatherVane: I think PHP is the odd one out then. Most languages (I have used which is a few but not all) don't mark it till you read past the end of file. – Martin York Mar 01 '18 at 20:23
  • @WeatherVane: Even this page https://www.w3schools.com/php/func_filesystem_feof.asp seems to suggest php uses the same method as C. As otherwise the read would be first not last in the loop. So in PHP they are doing the read and then testing if it worked by calling `feof()`. – Martin York Mar 01 '18 at 20:27
  • I am exploring this and w3schools as you wrote. Will report back. – Weather Vane Mar 01 '18 at 20:39
  • You are right. I did some testing of PHP and found it behaves like C. The [w3schools page about php's `feof`](https://www.w3schools.com/php/func_filesystem_feof.asp) says *"The feof() function checks if the "end-of-file" (EOF) has been reached. This function returns TRUE if an error occurs, or if EOF has been reached. Otherwise it returns FALSE."* So the poor wording only perpetuates the pervasive myth about `feof`. – Weather Vane Mar 01 '18 at 21:19
  • I replaced the feof with fread, it now yield a "Abort trap: 6", yet the output is still correct. But if I change the str[100] to str[105] it won't yield that error, I am kinda confused. – Terry Yen Mar 01 '18 at 22:20
  • @TiruoYan: Note: `strlen()` depends on a string being null terminated. Your code does not always guarantee this. – Martin York Mar 01 '18 at 22:36
1

Your code has some fundamental errors:

  • See Why is while ( !feof (file) ) always wrong?
  • You don't check if fread returns 0, meaning that no more character could be read, yet you continue with your algorithm
  • str is not '\0'-terminated, the strlen(str)>3 yields undefined behaviour in the first iteration and will likely be evaluated as true right in the first iteration. Then the printf would also yield undefined behaviour for the same reason.
  • Don't use the ASCII code directly, it's hard to read, you have to look up in the ASCII table to see what 32 is and what 126. Better use the character constants

    if(f>= ' ' && f <= '~'){
        ...
    }
    

    This is easier to read and you get the intention of the code immediately.

So the program can be rewritten like this:

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]){
    char str[100];
    char f;
    int len = 0;
    FILE *file;
    file = fopen(argv[1],"r");
    //read only here, so use "r"
    if(file==NULL){
        printf("The file doesn't exist.\n");
        return 1;
    }

    memset(str, 0, sizeof str);

    while(fread(&f, sizeof f, 1, file) == 1)
    {
        if(f >= ' ' && f <= '~')
        {
            str[len++] = f;
            continue;
        }

        if(strlen(str) > 3) // or if(len > 3)
        {
            printf("The output is: %s\n", str);
            len = 0;
            memset(str, 0, sizeof str);
        }
    }

    fclose(file);
    return 0;
}
Pablo
  • 13,271
  • 4
  • 39
  • 59
  • In `while(fread(&f, sizeof f, 1, file) != 1)` did you mean to use `== 1`? Or more generally `> 0`? It is also important in the general case to capture the number of elements read. – Weather Vane Mar 01 '18 at 19:56
  • @WeatherVane of course, the `!=` came out of habit. – Pablo Mar 01 '18 at 19:59
  • *It is also important in the general case to capture the number of elements read* yes, but in this case the `fread` is reading 1 block of 1 byte, so the number of elements read is either 1 on success, and 0 on failure. – Pablo Mar 01 '18 at 20:02