0

I am using fgets() to read a file line by line:

    char buffer[4096];

    while (fgets(buffer, sizeof(buffer), file__1) != NULL) {
        fprintf(file__2, "%s", buffer);
    }

However, the man page for fgets() says this under the Examples section:

           while (fgets(line, line_max + 1, fp) != NULL) {
               // Verify that a full line has been read ...
               // If not, report an error or prepare to treat the
               // next time through the loop as a read of a
               // continuation of the current line.
               ...
               // Process line ...
               ...
           }

My question is: upon fgets() failing, how could I "Verify that a full line has been read"?

  • See if the last character of the string is a newline. – Shawn Jul 29 '22 at 00:40
  • (And don't forget the case where the file doesn't have a newline as its last character, which unfortunately happens on occasion) – Shawn Jul 29 '22 at 00:40
  • If you can use it, prefer POSIX [`getline(3)`](https://www.man7.org/linux/man-pages/man3/getline.3.html), which doesn't have this issue because it reallocates the line buffer if needed. – Shawn Jul 29 '22 at 00:42
  • Don't forget that if there is a null byte in the middle of a line (which means the file is not actually a text file, because text files don't contain null bytes, assuming a single-byte code set), then you cannot tell whether there was a newline at the end of the line because you do not know how many characters `fgets()` read. The POSIX `getline()` function doesn't have that problem; it reports how many characters it read, and the answer should never be zero. – Jonathan Leffler Jul 29 '22 at 02:00
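
For illustration, the `getline()` approach suggested in these comments might look roughly like the sketch below. It assumes a POSIX system; `copy_lines` is a hypothetical helper name, not something from the question. Because `getline()` reports the number of bytes read, the sketch writes with `fwrite()` so that embedded null bytes are copied too.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getline() in <stdio.h> */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>           /* ssize_t */

/* Hypothetical helper: copy every line from `in` to `out`.
 * Returns the number of lines copied, or -1 on a read or write error. */
long copy_lines(FILE *in, FILE *out)
{
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;      /* current capacity of `line` */
    ssize_t len;         /* bytes read, including any trailing '\n' */
    long count = 0;

    while ((len = getline(&line, &cap, in)) != -1) {
        /* Writing `len` bytes also copies any embedded null bytes. */
        if (fwrite(line, 1, (size_t)len, out) != (size_t)len) {
            free(line);
            return -1;
        }
        count++;
    }

    free(line);
    /* getline() returns -1 both at end-of-file and on error. */
    return ferror(in) ? -1 : count;
}
```

Called as `copy_lines(file__1, file__2)`, this would replace the loop in the question; a final line without a trailing newline is still counted and copied.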

3 Answers

2

This wouldn't be a failure on the part of fgets(); the control flow wouldn't go into the loop if fgets failed and returned a null pointer.

The need for a full-line check arises when the line being read is longer than the size passed to fgets() as its second argument allows. In this situation, fgets() fills the buffer, returns the passed buffer pointer, and counts as a "success", even though only part of the line was read.

You can check for the problem by checking whether the last character in the string is a newline. You can then either abort or handle it somehow.

So something like this would handle the check:


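#include <stdio.h>
#include <string.h>
#define LINE_MAX 5
int main(void) {
  // Holds up to LINE_MAX characters (newline included) plus the terminating null.
  char line[LINE_MAX + 1];
  while (fgets(line, LINE_MAX + 1, stdin) != NULL) {
    size_t length = strlen(line);
    // No trailing newline and end-of-file not yet seen: the buffer filled up first.
    if (length && line[length - 1] != '\n' && !feof(stdin)) {
      printf("line max (%d) reached\n", LINE_MAX);
      return 1;
    }
  }
}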
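
If aborting is not acceptable, the other option from the man page (treating the next read as a continuation of the current line) could be sketched as follows. This is only a minimal sketch: `read_full_line` is a hypothetical helper, and the 16-byte chunk size is an arbitrary assumption chosen to make continuation easy to trigger.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one complete line of any length from `fp`.
 * Returns a malloc'd string without the trailing '\n' (caller frees),
 * or NULL on end-of-file, read error, or allocation failure. */
char *read_full_line(FILE *fp)
{
    char chunk[16];      /* deliberately small to exercise continuation */
    char *line = NULL;
    size_t len = 0;

    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t n = strlen(chunk);
        char *tmp = realloc(line, len + n + 1);
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        memcpy(line + len, chunk, n + 1);   /* append chunk and its '\0' */
        len += n;

        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';           /* full line read: strip newline */
            return line;
        }
        /* No newline yet: the next fgets() call continues the same line. */
    }

    /* fgets() returned NULL: end-of-file or a stream error. */
    if (ferror(fp)) {
        free(line);
        return NULL;
    }
    return line;   /* last line without '\n', or NULL if nothing was read */
}
```

Like fgets() itself, this sketch relies on strlen(), so it cannot cope with null bytes embedded in the input.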

  • Really boring that `fgets` returns the passed pointer on success instead of the number of characters read into it... that'd make the subsequent `strlen` unneeded :') – Marco Bonelli Jul 29 '22 at 01:09
  • Note that when `length && line[length - 1] != '\n' && !feof(stdin)` is true, it does not mean there is more to the line. – chux - Reinstate Monica Jul 29 '22 at 05:34
2

how could i "Verify that a full line has been read"?

In cases where fgets returns NULL, it means that either

  • end-of-file was encountered before reading a single character (not even a newline character), or

  • an error occurred on the stream.

Therefore, when fgets returns NULL, you should always assume that a full line has not been read.

However, when fgets does not return NULL, then it means that the function call to fgets was successful. But this does not necessarily mean that a full line has been read. It is possible that fgets successfully filled the buffer, but that the line was too long to fit in the buffer. Therefore, the easiest way to determine whether a full line was read is to check whether the string returned by fgets contains a newline character, for example by using the function strchr.

Even if a newline character is not found, that does not necessarily mean that a full line has not been read. Although POSIX defines a line to end with a newline character, it is possible that you are reading a text file which does not follow this rule. It is possible that you are reading a text file whose last line does not have a newline character, so that you encounter end-of-file without a newline character immediately beforehand. In that case, it is probably appropriate to also consider that line to be "a full line", even if you did not encounter a newline character before reaching end-of-file.

This scenario of encountering end-of-file without a newline character immediately beforehand is also possible when dealing with user input. For example, on Linux, the user can press CTRL+D to enter end-of-file with the keyboard. (On Microsoft Windows, you can do the same with CTRL+Z, but in contrast to Linux, this will only work at the start of a line.)

For the reasons described above, in cases in which you cannot find a newline character, it is probably appropriate to check the end-of-file indicator of the stream using the function feof, and to ignore the missing newline character when the end-of-file indicator of the stream is set. Only when the end-of-file indicator is not set should an error message be printed that the line is too long to fit in the buffer.

In order to read a text file line-by-line and ensure that a full line was always read, I recommend the following code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

FILE *fp;
char line[512], *p;

int main()
{
    //open file
    fp = fopen( "input.txt", "r" );
    if ( fp == NULL )
    {
        fprintf( stderr, "Error opening file!\n" );
        exit( EXIT_FAILURE );
    }

    //read one line per loop iteration
    for (;;) //infinite loop, equivalent to while(1)
    {
        //attempt to read one line of input
        if ( fgets( line, sizeof line, fp ) == NULL )
        {
            //check for stream error
            if ( ferror(fp) )
            {
                fprintf( stderr, "Stream error!\n" );
                exit( EXIT_FAILURE );
            }

            //we must have encountered end-of-file, so break out
            //of the infinite loop without an error message
            break;
        }

        //attempt to find newline character
        p = strchr( line, '\n' );

        if ( p == NULL )
        {
            //a missing newline character should be ignored on
            //end-of-file
            if ( !feof(fp) )
            {
                fprintf( stderr, "Line too long for buffer!\n" );
                exit( EXIT_FAILURE );
            }
        }
        else
        {
            //remove newline character
            *p = '\0';
        }

        //a full line was read, so print it
        puts( line );
    }

    //cleanup
    fclose( fp );
}
Andreas Wenzel
  • Corner case: Last 511 characters of "input.txt" lack a `'\n'`. This code reports `"Line too long for buffer!\n"`, yet all was successfully read. It's that "Whether the last line requires a terminating new-line character is implementation-defined." conundrum. – chux - Reinstate Monica Jul 29 '22 at 04:56
  • @chux: In my code, I am imposing a 510 character limit (not including the newline character and the null character) on all lines. If the last line does not have a newline character, then you are right that it would be possible for me to accept 511 instead of 510 characters for that line, but this would require an additional check. I do not believe that this additional code complexity is warranted just to support an additional character in a corner case. – Andreas Wenzel Jul 29 '22 at 06:01
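
The additional check discussed in these comments could be sketched by peeking at the next character with getc() and pushing it back with ungetc(). This is a hedged sketch only; `line_is_complete` is a hypothetical helper that is not part of the answer's code, and it is meant to be called right after a successful fgets().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: after a successful fgets() into `buf`, report
 * whether the whole line was read.
 * Returns 1 if the line is complete, 0 if more of it remains unread. */
int line_is_complete(const char *buf, FILE *fp)
{
    if (strchr(buf, '\n') != NULL)
        return 1;       /* newline present: ordinary full line */

    if (feof(fp))
        return 1;       /* end-of-file hit while reading: final line without '\n' */

    /* Buffer is full and holds no newline: peek at the next character. */
    int c = getc(fp);
    if (c == EOF)
        return 1;       /* nothing follows (a read error should be checked with ferror) */

    ungetc(c, fp);      /* put the character back for the next read */
    return 0;           /* more characters belong to this line */
}
```

With a 512-byte buffer, this would accept a final line of 511 characters without a newline, at the cost of one extra read attempt whenever the buffer fills up.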
1

how to verify fgets() read a line and handle errors (?)

Recall how C defines a line from a text file:

A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined.

Easy to handle many cases. Difficult to handle all cases.

// Handles most
char buffer[4096];
while (fgets(buffer, sizeof buffer, file__1)) {
  fprintf(file__2, "%s", buffer);
}
if (feof(file__1)) {
  printf("End-of-file detected\n");
} else if (ferror(file__1)) {
  printf("Input error detected\n");
}

What was not well handled:

  • Line length was sizeof buffer or more.

  • Input contained an embedded null character so fprintf(file__2, "%s", buffer); fails to print the entire line.

  • When the last line of the file lacks a '\n', some later code may have trouble with that. For example, *strchr(buffer, '\n') = 0; to lop off the potential trailing '\n' dereferences a null pointer when no '\n' is present.

  • Reading a text file that does not use the local line-ending character(s) may not translate well.

  • Reading a text file that uses wide characters does not work well with fgets().

  • If ferror() is already true when fgets() is called, fgets() may still work just fine and return non-NULL. It is best not to test ferror() unless fgets() returned NULL, and then to test feof() first.

  • Buffer sizes of 1 or more than INT_MAX.

  • Passed in pathologic size parameters of 0 or negative.

  • Lines as long as 255 or more may violate an environmental limit. See BUFSIZ details.

Additional code can handle some of these. Unfortunately fgets() is just not robust enough.
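
As one example of such additional code, the missing final '\n' case from the list above can be handled with strcspn(), which never returns a null pointer. This is a minimal sketch; `chomp` is a hypothetical helper name.

```c
#include <string.h>

/* Hypothetical helper: strip one trailing newline, if present.
 * Returns 1 if a newline was removed, 0 otherwise. */
int chomp(char *s)
{
    /* Index of the first '\n', or of the terminating '\0' if there is none. */
    size_t i = strcspn(s, "\n");

    if (s[i] == '\n') {
        s[i] = '\0';
        return 1;
    }
    return 0;
}
```

The return value also tells the caller whether the buffer held a newline at all, which is the same information the full-line check needs.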

chux - Reinstate Monica