
I have problems with my C program when I try to read / parse input.

Help?


This is a FAQ entry.

StackOverflow has many questions related to reading input in C, with answers usually focussed on the specific problem of that particular user without really painting the whole picture.

This is an attempt to cover a number of common mistakes comprehensively, so this specific family of questions can be answered simply by marking them as duplicates of this one:

  • Why does the last line print twice?
  • Why does my scanf("%d", ...) / scanf("%c", ...) fail?
  • Why does gets() crash?
  • ...

The answer is marked as community wiki. Feel free to improve and (cautiously) extend.

DevSolar
  • 4
    Looks like a tutorial to me. Good idea, but imo not on SO. – too honest for this site Feb 03 '16 at 13:46
  • 2
    @Olaf: Born from me giving basically the same answers, with only slight variations, *many* times over the years because there really wasn't a good "cover it all" answer to use for close-as-duplicate. [This one](http://stackoverflow.com/a/35172465/60281), and chux' reaction to it, finally made me write one, in the style of the [c++-faq](http://stackoverflow.com/questions/tagged/c%2b%2b-faq) entries. – DevSolar Feb 03 '16 at 13:52
  • 3
    Sorry, but while I acknowledge your effort and fully understand your being tired about getting the same sh** every day, the subject is too broad. Where e.g. do you set the threshold for using a proper lexxer/parser approach? Or single char read vs. `strtoXXX`, etc. As you write: every question has slight variations. For duplicate questions, there is already a procedure established. Most answer for such are more compact. Note that one major problem askers have is abstraction and adapting general concepts to their problems. That will not change with a generic tutorial(not FAQ) like this. – too honest for this site Feb 03 '16 at 14:02
  • @Olaf: In the end, it's up to the community to decide. Where I draw the line is pretty much what I covered below -- feof(), gets(), fgets() / strtol() instead of scanf(). I wouldn't go into multibyte / wide input, lexxing / parsing etc.. As this apparently was not clear, I added those particular questions above, and re-labelled the answer as "*Beginner's* C Input Primer". At this point, I bow out and won't argue pro / con this approach either way -- if the community says it should go, I won't hold a grudge. – DevSolar Feb 03 '16 at 14:07
  • 5
    If this is an attempt to create a "canonical duplicate" which we can use to close down newbie FAQs regarding scanf and left-over new line characters in stdin, then I fully support it. – Lundin Feb 03 '16 at 14:13
  • @DevSolar Note that the c-faq tag is apparently used for questions regarding the classic comp.lang.c FAQ. And not like the C++-faq tag which is used as a SO on-site FAQ. We could of course change the definition of the tag, since it is barely used. Should we create a similar C FAQ? We could open up a discussion about this on meta. – Lundin Feb 03 '16 at 14:16
  • @rykker: No, this is not a movement toward writing tutorials, at least not intended as such. Lundin got it right: This is meant as a "canonical duplicate" for a certain type of C I/O questions that *keeps* popping up and didn't have a good link-to answer yet. You cannot use any of your links for close-as-duplicate, as they are off-site and too broad to actually be of help. I'm trimming down the answer below as we speak, to be more focussed on addressing the couple of issues I meant it for. – DevSolar Feb 03 '16 at 14:26
  • @ryyker The main problem with all those tutorials are that they tend to be of questionable quality. The power of SO is that we can make a FAQ that isn't the work of one (would-be) guru, but hundreds of them. It is not nearly as easy to go and make blatantly incorrect statements on SO, because your post will hopefully get reviewed by multiple experts. Now if you collect every FAQ on SO (there are lots of excellent ones already, below "frequent" questions), you could probably put together a complete Q&A addressing all possible beginner issues. – Lundin Feb 03 '16 at 14:31
  • 1
    Anyway, this comment field is not the place to have this discussion, it belongs on meta. – Lundin Feb 03 '16 at 15:27
  • 2
    I would definitely say this kind of "question" is too broad here... – Marco Bonelli Feb 04 '16 at 03:00
  • Hardly any professional programs use stdin. It's the mainstay of homework exercises, and it is hard to use robustly. – Malcolm McLean Jun 27 '17 at 09:14
  • @MalcolmMcLean: That doesn't keep users from asking about it. Note that the answer handles the more general case. – DevSolar Jun 27 '17 at 09:16
  • I believe that this question/answer may be useful to link to, but I think it is much too broad to be used as a duplicate target. In my opinion, there should be at least 5 different duplicate targets describing the individual issues. – Andreas Wenzel Aug 10 '22 at 21:14
  • @AndreasWenzel By all means, feel free. I did not write this for the upvotes, but the usefulness as duplicate reference. Perhaps link those individual answers from a short summary? – DevSolar Aug 11 '22 at 06:50

1 Answer


The Beginner's C Input Primer

  • Text mode vs. Binary mode
  • Check fopen() for failure
  • Pitfalls
    • Check any functions you call for success
    • EOF, or "why does the last line print twice"
    • Do not use gets(), ever
    • Do not use fflush() on stdin or any other stream open for reading, ever
    • Do not use *scanf() for potentially malformed input
    • When *scanf() does not work as expected
  • Read, then parse
    • Read (part of) a line of input via fgets()
    • Parse the line in-memory
  • Clean Up

Text mode vs. Binary mode

A "binary mode" stream is read in exactly as it has been written. However, there might (or might not) be an implementation-defined number of null characters ('\0') appended at the end of the stream.

A "text mode" stream may do a number of transformations, including (but not limited to):

  • removal of spaces immediately before a line-end;
  • changing newlines ('\n') to something else on output (e.g. "\r\n" on Windows) and back to '\n' on input;
  • adding, altering, or deleting characters that are not printing characters (isprint(c) is true), horizontal tabs, or new-lines.

It should be obvious that text and binary mode do not mix. Open text files in text mode, and binary files in binary mode.

Check fopen() for failure

The attempt to open a file may fail for various reasons -- lack of permissions, or file not found being the most common ones. In this case, fopen() will return a NULL pointer. Always check whether fopen returned a NULL pointer, before attempting to read or write to the file.

When fopen fails, it usually sets the global errno variable to indicate why it failed. (This is technically not a requirement of the C language, but both POSIX and Windows guarantee to do it.) errno is a code number which can be compared against constants in errno.h, but in simple programs, usually all you need to do is turn it into an error message and print that, using perror() or strerror(). The error message should also include the filename you passed to fopen; if you don't do that, you will be very confused when the problem is that the filename isn't what you thought it was.

#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }

    FILE *fp = fopen(argv[1], "r");
    if (!fp) {
        // alternatively, just `perror(argv[1])`
        fprintf(stderr, "cannot open %s: %s\n", argv[1], strerror(errno));
        return 1;
    }

    // read from fp here

    fclose(fp);
    return 0;
}

Pitfalls

Check any functions you call for success

This should be obvious. But do check the documentation of any function you call for its return value and error handling, and check for those conditions.

These are errors that are easy to deal with when you catch the condition early, but lead to lots of head-scratching if you do not.

EOF, or "why does the last line print twice"

The function feof() returns true if EOF has been reached. A misunderstanding of what "reaching" EOF actually means makes many beginners write something like this:

// BROKEN CODE
while (!feof(fp)) {
    fgets(buffer, BUFFER_SIZE, fp);
    printf("%s", buffer);
}

This makes the last line of the input print twice, because when the last line is read (up to the final newline, the last character in the input stream), EOF is not set.

EOF only gets set when you attempt to read past the last character!

So the code above loops once more, fgets() fails to read another line, sets EOF and leaves the contents of buffer untouched, which then gets printed again.

Instead, check whether fgets failed directly:

// GOOD CODE
while (fgets(buffer, BUFFER_SIZE, fp)) {
    printf("%s", buffer);
}

Do not use gets(), ever

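There is no way to use this function safely: it cannot be told how large your buffer is, so any sufficiently long input line overflows it. Because of this, it has been removed from the standard library with the advent of C11.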
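As a minimal sketch of a replacement, the helper below wraps fgets() and strips the trailing newline, which is what gets() used to do. The name read_line and its signature are made up for this illustration; they are not part of the standard library.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper: read one line into buf (capacity: size bytes)
// and drop the trailing newline, if any.
// Returns buf, or NULL on end-of-file or read error.
char *read_line(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) == NULL)
        return NULL;

    // strcspn() yields the index of the first '\n', or the string
    // length if there is none, so this is safe either way.
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

Unlike gets(), this never writes more than size bytes. A line longer than the buffer is returned in pieces over several calls.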

Do not use fflush() on stdin or any other stream open for reading, ever

Many people expect fflush(stdin) to discard user input that has not yet been read. It does not do that. In plain ISO C, calling fflush() on an input stream has undefined behaviour. It does have well-defined behaviour in POSIX and in MSVC, but neither of those makes it discard user input that has not yet been read.

Usually, the right way to clear pending input is to read and discard characters up to and including a newline, but not beyond:

int c;
do c = getchar(); while (c != EOF && c != '\n');

Do not use *scanf() for potentially malformed input

Many tutorials teach you to use *scanf() for reading any kind of input, because it is so versatile.

But the purpose of *scanf() is really to read bulk data that can be relied upon to be in a predefined format (such as data written by another program).

Even then *scanf() can trip the unobservant:

  • Using a format string that in some way can be influenced by the user is a gaping security hole.
  • If the input does not match the expected format, *scanf() immediately stops parsing, leaving any remaining arguments unassigned (and, unless you initialized them yourself, uninitialized).
  • It will tell you how many assignments it has successfully done -- which is why you should check its return code (see above) -- but not where exactly it stopped parsing the input, making graceful error recovery difficult.
  • It skips any leading whitespace in the input, except when it does not ([, c, and n conversions). (See next paragraph.)
  • It has somewhat peculiar behaviour in some corner cases.
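To illustrate checking the return value, here is a small sketch using sscanf() on a string. The helper name parse_point and the "x,y" input format are assumptions chosen for the example only.

```c
#include <stdio.h>

// Hypothetical helper: parse "x,y" (two integers separated by a comma).
// Returns 1 if both values were assigned, 0 otherwise.
int parse_point(const char *text, int *x, int *y)
{
    // sscanf() returns the number of successful assignments, or EOF
    // if the input ended before the first conversion. Anything other
    // than 2 means at least one of *x, *y was NOT assigned.
    return sscanf(text, "%d,%d", x, y) == 2;
}
```

With the input "3;4" the first conversion succeeds and the second does not, so sscanf() returns 1 and *y keeps whatever value it had before. Only the return value tells you that.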

When *scanf() does not work as expected

A frequent problem with *scanf() is when there is an unread whitespace (' ', '\n', ...) in the input stream that the user did not account for.

Reading a number ("%d" et al.), or a string ("%s"), stops at any whitespace. And while most *scanf() conversion specifiers skip leading whitespace in the input, [, c and n do not. So the newline is still the first pending input character: %c reads that newline instead of the character you wanted, and %[ fails to match unless the newline happens to be in its set.

You can skip over the newline in the input by explicitly reading it, e.g. via fgetc(), or by adding a whitespace character to your *scanf() format string before the conversion. (A single whitespace character in the format string matches any amount of whitespace in the input, including none.)
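The difference can be shown with sscanf() on an in-memory string. This is only a sketch; the helper char_after_number is hypothetical and exists just to contrast the two format strings.

```c
#include <stdio.h>

// Hypothetical helper: parse a number, then one character, from text.
// If skip_ws is nonzero, a space in the format string skips any
// whitespace before %c; otherwise %c takes the very next character.
// Returns the character read, or '\0' if parsing failed.
char char_after_number(const char *text, int skip_ws)
{
    int n;
    char c = '\0';
    int got = skip_ws ? sscanf(text, "%d %c", &n, &c)
                      : sscanf(text, "%d%c", &n, &c);
    return got == 2 ? c : '\0';
}
```

For the input "42\ny", the call without the space yields '\n' (the leftover newline), while the call with the space yields 'y'.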

Read, then parse

We just advised against using *scanf() except when you really, positively, know what you are doing. So, what to use as a replacement?

Instead of reading and parsing the input in one go, as *scanf() attempts to do, separate the steps.

Read (part of) a line of input via fgets()

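fgets() takes the size of your buffer as a parameter and stores at most one character less than that, followed by a terminating null character, so it cannot overflow your buffer. If the input line did fit into your buffer completely, the last character before the terminator will be the newline ('\n'), unless the input ended without one. If it did not all fit, you are looking at a partially-read line, and the next call to fgets() continues where this one stopped.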
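One way to tell these cases apart is sketched below. The enum and the function name get_line are assumptions made for this example, not standard names.

```c
#include <stdio.h>
#include <string.h>

enum line_status { LINE_EOF = -1, LINE_PARTIAL = 0, LINE_COMPLETE = 1 };

// Hypothetical helper: read (part of) a line and report what happened.
// On LINE_COMPLETE the trailing newline has been removed from buf.
enum line_status get_line(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) == NULL)
        return LINE_EOF;            // end-of-file or read error

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';        // whole line fit into the buffer
        return LINE_COMPLETE;
    }

    // No newline stored: either the buffer was too small, or this is
    // the last line of the input and it lacks a final newline.
    return feof(fp) ? LINE_COMPLETE : LINE_PARTIAL;
}
```

On LINE_PARTIAL the caller can keep calling get_line() to collect or discard the rest of the line. Note that this sketch does not distinguish a read error from end-of-file; use ferror() for that if it matters.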

Parse the line in-memory

Especially useful for in-memory parsing are the strtol() and strtod() function families, which provide similar functionality to the *scanf() conversion specifiers d, i, u, o, x, a, e, f, and g.

But they also tell you exactly where they stopped parsing, and have meaningful handling of numbers too large for the target type.

Beyond those, C offers a wide range of string processing functions. Since you have the input in memory, and always know exactly how far you have parsed it already, you can walk back as many times as you like trying to make sense of the input.

And if all else fails, you have the whole line available to print a helpful error message for the user.
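As a rough sketch of in-memory parsing with strtol(), the helper below accepts a string only if it consists of one base-10 integer, optionally followed by whitespace (such as the newline left by fgets()). The name parse_long and the policy of rejecting trailing garbage are choices made for this illustration.

```c
#include <errno.h>
#include <stdlib.h>

// Hypothetical helper: parse all of text as a base-10 long.
// Returns 1 and stores the value in *out on success, 0 on failure.
int parse_long(const char *text, long *out)
{
    char *end;

    errno = 0;                      // strtol() only sets errno on error
    long value = strtol(text, &end, 10);

    if (end == text)
        return 0;                   // no digits found at all
    if (errno == ERANGE)
        return 0;                   // value does not fit into a long

    // end points at the first unparsed character; allow only
    // trailing whitespace after the number.
    while (*end == ' ' || *end == '\t' || *end == '\n')
        end++;
    if (*end != '\0')
        return 0;                   // trailing garbage such as "12abc"

    *out = value;
    return 1;
}
```

Because end tells you where parsing stopped, an error message can point at the exact offending character, which *scanf() cannot do.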

Clean Up

Make sure you explicitly close any stream you have (successfully) opened. This flushes any as-yet unwritten buffers, and avoids resource leaks.

fclose(fp);
DevSolar
  • A file open example in text mode rather than binary mode, for the purpose of a beginner's guide, would be more applicable. `fopen(argv[1], "rb");` --> `fopen(argv[1], "r");` – chux - Reinstate Monica May 10 '18 at 12:35
  • 2
    This would be a good answer, but you didn't mention any of the `fgetc` or `fgets`-related pitfalls, and your "GOOD CODE" is broken in a severe way because of that. – autistic Jul 17 '18 at 22:08