Let's assume the input is
<LWS>* <first> <LWS>+ <second> <LWS>+ <integer>
where <LWS> is any whitespace character, including newlines; <first> has one to seven non-whitespace characters; <second> has one to five non-whitespace characters; <integer> is an optionally signed integer (in hexadecimal if it begins with 0x or 0X, in octal if it begins with 0, or in decimal otherwise); * indicates zero or more of the preceding element; and + indicates one or more of the preceding element.
Let's say you have a structure,
struct record {
    char first[8];  /* 7 characters + end-of-string '\0' */
    char second[6]; /* 5 characters + end-of-string '\0' */
    int  number;
};
then you can read the next record from the stream in, into the structure pointed to by rec, using e.g.
#include <stdlib.h>
#include <stdio.h>

/* Read a record from stream 'in' into *'rec'.
   Returns:  0 if success
            -1 if invalid parameters
            -2 if read error
            -3 if non-conforming format
            -4 if bug in function
            +1 if end of stream (and no data read)
*/
int read_record(FILE *in, struct record *rec)
{
    int rc;

    /* Invalid parameters? */
    if (!in || !rec)
        return -1;

    /* Try scanning the record. %i accepts decimal, octal (leading 0),
       and hexadecimal (leading 0x or 0X) integers; %d would only
       accept decimal. */
    rc = fscanf(in, " %7s %5s %i", rec->first, rec->second, &(rec->number));

    /* All three fields converted correctly? */
    if (rc == 3)
        return 0; /* Success! */

    /* Only partially converted? */
    if (rc > 0)
        return -3;

    /* Read error? */
    if (ferror(in))
        return -2;

    /* End of input encountered? */
    if (feof(in))
        return +1;

    /* Must be a bug somewhere above. */
    return -4;
}
The conversion specifier %7s converts up to seven non-whitespace characters, and %5s up to five; the array (or char pointer) must have room for an additional end-of-string nul byte, '\0', which the scanf() family of functions adds automatically.
If you do not specify the length limit and use a plain %s, the input can overrun the specified buffer. This is a common cause of buffer overflow bugs.
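To illustrate what the width limit does with over-long input, here is a minimal sketch. The helper name split_overlong() is hypothetical and only for illustration; it applies the first two conversions of the pattern to a string using sscanf().

```c
#include <stdio.h>

/* Hypothetical helper: scan two width-limited tokens from 'src'.
   'first' must have room for 7 characters plus '\0',
   'second' for 5 characters plus '\0'.
   Returns the sscanf() result (number of successful conversions, or EOF). */
static int split_overlong(const char *src, char first[8], char second[6])
{
    return sscanf(src, " %7s %5s", first, second);
}
```

Given the single twelve-character token "abcdefghijkl", this stores "abcdefg" in first and "hijkl" in second, and returns 2. The buffers are never overrun, but an over-long field is silently split rather than rejected, so stricter validation is needed if that matters for your input.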
The return value from the scanf() family of functions is the number of successful conversions (possibly 0), or EOF if a read error or end of input occurs before the first conversion. Above, we need three conversions to fully scan a record. If we scan just 1 or 2, we have a partial record. Otherwise, we check whether a stream error occurred, using ferror(). (Note that you want to check ferror() before feof(), because an error condition may also set feof().) If not, we check whether the scanning function encountered end-of-stream before anything was converted, using feof().
If none of the above cases were met, then the scanning function returned zero or a negative value without either ferror() or feof() returning true. Because the scanning pattern starts with (whitespace and) a conversion specifier, it should never return zero. The only negative return value from the scanf() family of functions is EOF, which, in the absence of a read error, should cause feof() to return true. So, if none of the above cases were met, there must be a bug in the code, triggered by some odd corner case in the input.
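The return values described above can be observed without a stream by applying the same kind of pattern to strings with sscanf(), where reaching the end of the string counts as end of input. This is only a sketch: the helper name count_fields() is hypothetical, and the pattern uses %i on the assumption that hexadecimal and octal integers should be accepted, as in the format description.

```c
#include <stdio.h>

/* Hypothetical helper: return what sscanf() reports for the record
   pattern applied to 'src'. The converted values are discarded. */
static int count_fields(const char *src)
{
    char first[8], second[6];
    int  number;

    return sscanf(src, " %7s %5s %i", first, second, &number);
}
```

With this helper, "alpha beta 42" yields 3; "alpha beta" and "alpha beta x" both yield 2 (a partial record); "alpha" yields 1; and an empty or whitespace-only string yields EOF, because the input ends before the first conversion. No input makes it yield 0, which is why read_record() treats that case as a bug.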
A program that reads structures from some stream into a dynamically allocated buffer typically implements the following pseudocode:
Set ptr = NULL  # Dynamically allocated array
Set num = 0     # Number of entries in array
Set max = 0     # Number of entries allocated for in array
Loop:
    If (num >= max):
        Calculate new max; num + 1 or larger
        Reallocate ptr
        If reallocation failed:
            Report out of memory
            Abort program
        End if
    End if
    rc = read_record(stream, ptr + num)
    If rc == 1:
        Break out of loop
    Else if rc != 0:
        Report error (based on rc)
        Abort program
    End if
    Increment num  # The record was stored successfully
End Loop
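The pseudocode above could be written in C roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: the function name read_all_records(), the extra return code -5 for out of memory, and the growth policy (start at 16 entries, then double) are all illustrative choices. Instead of aborting the program, it returns an error code and leaves reporting to the caller. A compact copy of the structure and of read_record() (with %i for the integer) is included only so that the block compiles on its own.

```c
#include <stdlib.h>
#include <stdio.h>

struct record {
    char first[8];  /* 7 characters + '\0' */
    char second[6]; /* 5 characters + '\0' */
    int  number;
};

/* Compact copy of read_record(), included so this sketch is self-contained. */
static int read_record(FILE *in, struct record *rec)
{
    int rc;

    if (!in || !rec)
        return -1;
    rc = fscanf(in, " %7s %5s %i", rec->first, rec->second, &rec->number);
    if (rc == 3)
        return 0;
    if (rc > 0)
        return -3;
    if (ferror(in))
        return -2;
    if (feof(in))
        return +1;
    return -4;
}

/* Hypothetical helper: read all records from 'in' into a dynamically
   allocated array. On success, returns 0, stores the array in *out and
   the number of records in *count; the caller must free(*out).
   On failure, frees the array and returns the negative read_record()
   code, or -5 (an assumed extra code) if memory runs out. */
static int read_all_records(FILE *in, struct record **out, size_t *count)
{
    struct record *ptr = NULL; /* Dynamically allocated array */
    size_t         num = 0;    /* Number of entries in array */
    size_t         max = 0;    /* Number of entries allocated */

    if (!in || !out || !count)
        return -1;

    for (;;) {
        int rc;

        /* Grow the array when it is full. */
        if (num >= max) {
            size_t         new_max = (max < 16) ? 16 : 2 * max;
            struct record *tmp = realloc(ptr, new_max * sizeof *tmp);
            if (!tmp) {
                free(ptr);
                return -5;
            }
            ptr = tmp;
            max = new_max;
        }

        rc = read_record(in, ptr + num);
        if (rc == 1)
            break;          /* End of input, no more records */
        if (rc != 0) {
            free(ptr);      /* Read error or malformed record */
            return rc;
        }
        num++;              /* Record stored successfully */
    }

    *out = ptr;
    *count = num;
    return 0;
}
```

A caller would pass an open stream plus the addresses of a struct record pointer and a size_t, check for a zero return value, and free() the array when done. Assigning the realloc() result to a temporary first keeps the original array valid (and freeable) if the reallocation fails.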