Some have mentioned that scanf is probably unsuitable for this purpose. I wouldn't suggest using fgets, either. Though it is slightly more suitable, there are problems that seem difficult to avoid, at least at first. Few C programmers manage to use fgets right the first time without reading the fgets manual in full. The parts most people manage to neglect entirely are:
- what happens when the line is too large, and
- what happens when EOF or an error is encountered.
The fgets() function shall read bytes from stream into the array pointed to by s, until n-1 bytes are read, or a <newline> is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte.
Upon successful completion, fgets() shall return s. If the stream is at end-of-file, the end-of-file indicator for the stream shall be set and fgets() shall return a null pointer. If a read error occurs, the error indicator for the stream shall be set, fgets() shall return a null pointer...
I don't feel I need to stress the importance of checking the return value too much, so I won't mention it again. Suffice it to say, if your program doesn't check the return value, it won't know when EOF or an error occurs, and it will probably be caught in an infinite loop.
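For illustration, here is a minimal sketch of a loop that does check the return value and then asks the stream why fgets stopped. The helper name count_fgets_calls and the 64-byte buffer are my own assumptions, not anything the manual prescribes.

```c
#include <stdio.h>

/* Hypothetical helper: counts how many times fgets succeeds on f.
 * Returns that count, or -1 if the loop ended because of a read error.
 * Note that a line longer than the buffer is counted more than once. */
long count_fgets_calls(FILE *f) {
    char buf[64]; /* arbitrary size, chosen only for illustration */
    long calls = 0;

    /* The loop ends as soon as fgets returns a null pointer. */
    while (fgets(buf, sizeof buf, f) != NULL) {
        calls++;
    }

    /* A null pointer means either EOF or an error; ferror tells them apart. */
    if (ferror(f)) {
        return -1;
    }
    return calls;
}
```

Without the NULL test, the loop would keep processing whatever was left in buf after the stream ran dry.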
When no '\n' is present in the array after fgets returns, the remaining bytes of the line have yet to be read. Note that fgets always parses the line once, internally, in order to find the '\n'. When you add extra logic of your own to check for a '\n', you're parsing the data a second time.
This allows you to realloc the storage and call fgets again if you want to dynamically resize the storage, or discard the remainder of the line (warning the user of the truncation is a good idea), perhaps using something like fscanf(file, "%*[^\n]");.
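As a rough sketch of the "discard the remainder" option, the following reads one line into a fixed buffer and throws away whatever didn't fit. The name read_line_truncating and its return convention are hypothetical; I've only assumed that n is at least 2.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (capacity n, n >= 2)
 * and strips the trailing '\n'.
 * Returns  1 if the whole line fit,
 *          0 if the line was truncated (the excess is discarded),
 *         -1 if fgets returned a null pointer (EOF or error). */
int read_line_truncating(FILE *f, char *buf, size_t n) {
    if (fgets(buf, (int)n, f) == NULL) {
        return -1;
    }

    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0'; /* complete line: drop the newline */
        return 1;
    }

    /* No '\n' stored: peek at the next byte to see whether anything is left. */
    int c = fgetc(f);
    if (c == '\n' || c == EOF) {
        return 1; /* the line fit exactly, or it was the last line */
    }

    /* Discard up to (not including) the '\n', then the '\n' itself. */
    if (fscanf(f, "%*[^\n]") != EOF) {
        (void)fgetc(f);
    }
    return 0;
}
```

A caller can use the 0 return to warn the user that input was cut short.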
hugomg mentioned using multiplication in the dynamic resize code to avoid quadratic runtime problems. Along this line, it would be a good idea to avoid parsing the same data over and over each iteration (thus introducing further quadratic runtime problems). This can be achieved by storing the number of bytes you've read (and parsed) somewhere. For example:
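#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *get_dynamic_line(FILE *f) {
    size_t bytes_read = 0;
    char *bytes = NULL, *temp;
    do {
        /* + 2 so that fgets always has room for at least one new byte plus the '\0' */
        size_t alloc_size = bytes_read * 2 + 2;
        temp = realloc(bytes, alloc_size);
        if (temp == NULL) {
            free(bytes);
            return NULL;
        }
        bytes = temp;
        temp = fgets(bytes + bytes_read, alloc_size - bytes_read, f); /* Parsing data the first time */
        if (temp != NULL) { /* the array is only guaranteed to hold a string when fgets succeeds */
            bytes_read += strcspn(bytes + bytes_read, "\n"); /* Parsing data the second time */
        }
    } while (temp && bytes[bytes_read] != '\n');
    bytes[bytes_read] = '\0'; /* strip the '\n', or terminate after EOF or an error */
    return bytes;
}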
Those who do manage to read the manual and come up with something correct (like this) may soon realise the complexity of an fgets solution is at least twice as poor as the same solution using fgetc. We can avoid parsing the data a second time by using fgetc, so using fgetc might seem most appropriate. Alas, most C programmers also manage to use fgetc incorrectly when neglecting the fgetc manual.
The most important detail is to realise that fgetc returns an int, not a char. It typically returns one of 256 distinct values, between 0 and UCHAR_MAX (inclusive). It may otherwise return EOF, meaning there are typically 257 distinct values that fgetc (or consequently, getchar) may return. Trying to store those values into a char or unsigned char results in loss of information, specifically the error modes. (Of course, this typical value of 257 will change if CHAR_BIT is greater than 8, and consequently UCHAR_MAX is greater than 255.)
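To make the int-versus-char point concrete, here is a small sketch that counts the bytes left in a stream. The function name count_bytes is my own invention for the example. Because c is an int, a byte with the value 0xFF stays distinct from EOF. Had c been a signed char, that byte would typically compare equal to EOF and end the loop early; had it been an unsigned char, the comparison with EOF would typically never succeed.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining in f. */
long count_bytes(FILE *f) {
    long n = 0;
    int c; /* int, not char: must hold every unsigned char value plus EOF */

    while ((c = fgetc(f)) != EOF) {
        n++;
    }
    return n;
}
```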
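#include <stdio.h>
#include <stdlib.h>

char *get_dynamic_line(FILE *f) {
    size_t bytes_read = 0;
    char *bytes = NULL;
    do {
        /* Grow only when bytes_read is one less than a power of two (0, 1, 3, 7, ...) */
        if ((bytes_read & (bytes_read + 1)) == 0) {
            void *temp = realloc(bytes, bytes_read * 2 + 1);
            if (temp == NULL) {
                free(bytes);
                return NULL;
            }
            bytes = temp;
        }
        int c = fgetc(f); /* int, so that EOF stays distinct from every byte value */
        /* Store '\0' in place of '\n', EOF or an error; that also ends the loop */
        bytes[bytes_read] = c >= 0 && c != '\n'
                          ? c
                          : '\0';
    } while (bytes[bytes_read++]);
    return bytes;
}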
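The resize test in the fgetc version may look cryptic, so here it is pulled out into a helper of its own (needs_growth is a hypothetical name, used only for this illustration). The expression is zero exactly when bytes_read is one less than a power of two, so the storage is reallocated at 0, 1, 3, 7, ... bytes read, growing to 1, 3, 7, 15, ... bytes. That gives the same geometric growth as the multiplication mentioned earlier, without storing the capacity separately.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the resize test in the fgetc version:
 * nonzero when bytes_read is 0, 1, 3, 7, 15, ... (one less than a power of two). */
int needs_growth(size_t bytes_read) {
    return (bytes_read & (bytes_read + 1)) == 0;
}
```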