How does scanf know if it should scan a new value?

Question

I'm studying how scanf works.

After scanning a variable of another type, a char variable read with getchar() or scanf("%c") ends up storing a whitespace character ('\n'). To prevent this, the buffer should be cleared, and I did that with rewind(stdin).

Even though stdin has been rewound, the previous input is still kept in the buffer, and I can work with that previous value normally (no runtime errors). But if I call scanf again, scanf scans a new value even though there is still a valid value in the buffer. How does scanf determine whether it should scan a new value?

I found this behavior with the code below.

#include <stdio.h>
#define p stdin

int main() {
    int x;
    char ch;

    void* A, * B, * C, * D, * E;

    A = p->_Placeholder;
    printf("A : %p\n", A); // the first time, it shows 0000
    scanf_s("%d", &x);

    B = p->_Placeholder;
    printf("B : %p\n", B); // after scanning something, I think this is the start of the buffer assigned to this process
    rewind(stdin); // rewind _Placeholder

    C = p->_Placeholder;
    printf("C : %p\n", C); // it outputs the same value as B - length of x

    D = p->_Placeholder;
    printf("D : %c\n", ((char*)D)[0]); // the previous input value is printed successfully without a runtime error, which means the buffer is not cleared by scanf
    scanf_s("%c", &ch, 1); // BUT scanf knows that _Placeholder is not pointing at a new input value, so it scans a new value from the console. How??

    E = p->_Placeholder;
    printf("E : %p\n", E);
    printf("ch : %c\n", ch);
}

You're right that whitespace handling is a huge issue in understanding and using `scanf`, but there's a fundamental problem for you: `rewind()` does *not* clear whitespace. — Steve Summit, Aug 27 '21 at 11:23
What system are you using? Linux? Windows? Where does the assumption that the value inside `stdin->_Placeholder` means anything come from? — KamilCuk, Aug 27 '21 at 11:25
Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. — Community, Aug 27 '21 at 11:26
`_Placeholder` is an internal variable unique to your system's implementation of `<stdio.h>`. We don't know what it does or what it means, and we're unlikely to learn anything useful — we're unlikely to learn anything *accurate* — by studying it. — Steve Summit, Aug 27 '21 at 11:26
@SteveSummit Did you mean that the condition on how scanf decides that it should scan for new values is not white-space? So what is the condition? I want to know that! — tobeprogrammers, Aug 27 '21 at 11:29
@KamilCuk I'm using vs2019 on windows, and I found info about `_Placeholder` in ctrl+clicking type keyword `FILE`. I just thought _Placeholder is a `file position indicator`. isn't it? — tobeprogrammers, Aug 27 '21 at 11:31
I recommend that you never use scanf. Use fgets or fread and sscanf instead. Don't read and interpret what you read at the same time. Read first and then interpret what you read. — GoWiser, Aug 27 '21 at 11:34
Identifiers that start with an underscore and a capital letter are **reserved for the implementation**. Any use of such identifier in a C program is **undefined behaviour**. Hence we cannot tell you what your program is doing or why. It is legally permitted to do absolutely anything. — n. m. could be an AI, Aug 27 '21 at 11:36
Thanks for good answers cool guys! (I can't notify two users T_T) — tobeprogrammers, Aug 27 '21 at 11:37
@n.1.8e9-where's-my-sharem. *Identifiers that start with an underscore and a capital letter are **reserved for the implementation.*** The `_Placeholder` variable in question happens to be part of the **implementation's** `stdio` structure.... — Andrew Henle, Aug 27 '21 at 11:49
@n.1.8e9-where's-my-sharem. [Per C11 **7.21 Input/output `<stdio.h>`** paragraph 3](https://port70.net/~nsz/c/c11/n1570.html#7.21.1p3), the `stdin` identifier is a macro "[expression] of type ''pointer to `FILE`'' that point[s] to the `FILE` objects associated", making the `_Placeholder` identifier in this question **"part of the implementation"**. Your claim that the posted code is undefined behavior is wrong because the implementation is allowed to use such identifiers. — Andrew Henle, Aug 27 '21 at 17:34
@AndrewHenle the **implementation** is allowed to use whatever it wants, but a C program isn't. — n. m. could be an AI, Aug 27 '21 at 17:49
@n.1.8e9-where's-my-sharem. Please post where the user's code is creating a reserved identifier. The user is merely ***using*** the ***implementation's*** `_Placeholder` identifier. Are you ***seriously*** trying to claim code that uses [any standard identifier](https://port70.net/~nsz/c/c11/n1570.html#6.10.8) such as `__FILE__` invokes undefined behavior?!?! — Andrew Henle, Aug 27 '21 at 18:28
@AndrewHenle I'm sorry the reason of UB is different. UB is "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements". The construct is definitely nonportable since the standard says nothing about `_Placeholder`. If it were declared by the program, there would be UB because it is reserved. `__FILE__` does not cause UB because the standard says what it is and how to use it (but you still cannot declare it, because it is reserved). — n. m. could be an AI, Aug 27 '21 at 18:46
@n.1.8e9-where's-my-sharem. Are you ***seriously*** saying that referencing a legally-named identifier invokes UB?!? `_Placeholder` is a ***legal*** and ***portable*** identifier that just happens to be reserved - so undefined behavior is invoked only if user code ***creates*** such an identifier. "nonportable or erroneous program construct" means using POSIX `pthread_mutex_lock()` on Windows, not "This identifier that Microsoft uses internally causes UB if you reference it". — Andrew Henle, Aug 27 '21 at 19:40
And ***YES IT IS PORTABLE***. I can write code on Linux that creates a `_Placeholder` identifier and the only reason it's UB is because my code ***created*** a reserved identifier. — Andrew Henle, Aug 27 '21 at 19:41
@AndrewHenle Yes I am seriously saying this. Do what you want with it. — n. m. could be an AI, Aug 27 '21 at 21:47
@n.1.8e9-where's-my-sharem. Wow. I bet everyone who [uses `pathconf()` to get things like `_POSIX_PATH_MAX`](https://man7.org/linux/man-pages/man3/pathconf.3.html) needs to know they're invoking undefined behavior! My God, you should take it upon yourself to warn them all! — Andrew Henle, Aug 27 '21 at 23:07

Steve Summit · Answer 1 · 2022-04-11T13:44:27.813

You have at least three misunderstandings:

"char variable stores a white-space"
rewind(stdin) clears the buffer
_Placeholder tells you something interesting about how scanf handles whitespace

But, I'm sorry, none of these are true.

Let's review how scanf actually handles whitespace. We start with two important pieces of background information:

The newline character, \n, is in most respects an ordinary whitespace character. It occupies space in the input buffer just like any other character. It arrives in the input buffer when you press the Enter key.
When it's done parsing a %-directive, scanf always leaves unparsed input on the input stream.

Suppose you write

int a, b;
scanf("%d%d", &a, &b);

Suppose you run that code and type, as input

12 34

and then hit the Enter key. What happens?

First, the input stream (stdin) now contains six characters:

"12 34\n"

scanf first processes the first of the two %d directives you gave it. It scans the characters 1 and 2, converting them to the integer 12 and storing it in the variable a. It stops reading at the first non-digit character it sees, which is the space character between 2 and 3. The input stream is now

" 34\n"

Notice that the space character is still on the input stream.

scanf next processes the second %d directive. It doesn't immediately find a digit character, because the space character is still there. But that's okay, because like most (but not quite all) scanf format directives, %d has a secret extra power: it automatically skips whitespace characters before reading and converting an integer. So the second %d reads and discards the space character, then reads the characters 3 and 4 and converts them to the integer 34, which it stores in the variable b.

Now scanf is done. The input stream is left containing just the newline:

"\n"

Next, let's look at a slightly different — although, as we'll see, actually very similar — example. Suppose you write

int x, y;
scanf("%d", &x);
scanf("%d", &y);

Suppose you run that code and type, as input

56
78

(where that's on two lines, meaning that you hit Enter twice). What happens now?

In this case, the input stream will end up containing these six characters:

"56\n78\n"

The first scanf call has a %d directive to process. It scans the characters 5 and 6, converting them to the integer 56 and storing it in the variable x. It stops reading at the first non-digit character it sees, which is the newline after the 6. The input stream is now

"\n78\n"

Notice that the newline character (both newline characters) are still on the input stream.

Now the second scanf call runs. It, too, has a %d directive to process. The first character on the input stream is not a digit: it's a newline. But that's okay, because %d knows how to skip whitespace. So it reads and discards the newline character, then reads the characters 7 and 8 and converts them to the integer 78, which it stores in the variable y.

Now the second scanf is done. The input stream is left containing just the newline:

"\n"

This may all have made sense, may have seemed unsurprising, may have left you feeling, "Okay, so what's the big deal?" The big deal is this: In both examples, the input was left containing that one, last newline character.

Suppose, later in your program, you have some other input to read. We now come to a hugely significant decision point:

If the next input call is another call to scanf, and if it involves one of the (many) format specifiers that has the secret extra power of also skipping whitespace, that format specifier will skip the newline, then do its job of scanning and converting whatever input comes after the newline, and the program will work as you expect.
But if the next input call is not a call to scanf, or if it's a call to scanf that involves one of the few input specifiers that does not have the secret extra power, the newline will not be "skipped", instead it will be read as actual input. If the next input call is getchar, it will read and return the newline character. If the next input call is fgets, it will read and return a blank line. If the next input call is scanf with the %c directive, it will read and return the newline. If the next input call is scanf with the %[^\n] directive, it will read an empty line. (Actually %[^\n] will read nothing in this case, because it leaves the \n still on the input.)

It's in the second case that the "extra" whitespace causes a problem. It's in the second case that you may find yourself wanting to explicitly "flush" or discard the extra whitespace.

But it turns out that the problem of flushing or discarding the extra whitespace left behind by scanf is a remarkably stubborn one. You can't portably do it by calling fflush. You can't portably do it by calling rewind. If you care about correct, portable code, you basically have three choices:

Write your own code to explicitly read and discard "extra" characters (typically, up to and including the next newline).
Don't try to intermix scanf and other calls. Don't call scanf and then, later, try to call getchar or fgets. If you call scanf and then, later, call scanf with one of the directives (such as "%c") that lacks the "secret extra power", insert an extra space before the format specifier to cause whitespace to be skipped. (That is, use " %c" instead of "%c".)
Don't use scanf at all — do all your input in terms of fgets or getchar.

See also What can I use for input conversion instead of scanf?

Addendum: scanf's handling of whitespace can often seem puzzling. If the above explanation isn't sufficient, it may help to look at some actual C code detailing how scanf works inside. (The code I'm going to show obviously isn't the exact code that's behind your system's implementation, but it will be similar.)

When it comes time for scanf to process a %d directive, you might imagine it will do something like this. (Be forewarned: this first piece of code I'm going to show you is incomplete and wrong. It's going to take me three tries to get it right.)

c = getchar();
if(isdigit(c)) {
    int intval;
    intval = c - '0';
    while(isdigit(c = getchar())) {
        intval = 10 * intval + (c - '0');
    }

    *next_pointer_arg = intval;
    n_vals_converted++;
} else {
    /* saw no digit; processing has failed */
    return n_vals_converted;
}

Let's make sure we understand everything that's going on here. We've been told to process a %d directive. We read one character from the input by calling getchar(). If that character is a digit, it's the first of possibly several digits making up an integer. We read characters and, as long as they're digits, we add them to the integer value, intval, we're collecting. The conversion involves subtracting the constant '0', to convert an ASCII character code to a digit value, and successive multiplication by 10. Once we see a character that's not a digit, we're done. We store the converted value into the pointer handed to us by our caller (here schematically but approximately represented by the pointer value next_pointer_arg), and we add one to a variable n_vals_converted keeping count of how many values we've successfully scanned and converted, which will eventually be scanf's return value.

If, on the other hand, we don't even see one digit character, we've failed: we return immediately, and our return value is the number of values we've successfully scanned and converted so far (which may well be 0).

But there is actually a subtle bug here. Suppose the input stream contains

"123x"

This code will successfully scan and convert the digits 1, 2, and 3 to the integer 123, and store this value into *next_pointer_arg. But, it will have read the character x, and after the call to isdigit in the loop while(isdigit(c = getchar())) fails, the x character will have effectively been discarded: it is no longer on the input stream.

The specification for scanf says that it is not supposed to do this. The specification for scanf says that unparsed characters are supposed to be left on the input stream. If the user had actually passed the format specifier "%dx", that would mean that, after reading and parsing an integer, a literal x is expected in the input stream, and scanf is going to have to explicitly read and match that character. So it can't accidentally read and discard the x in the process of parsing a %d directive.

So we need to modify our hypothetical %d code slightly. Whenever we read a character that turns out not to be a digit, we have to literally put it back on the input stream, for somebody else to maybe read later. There's actually a function in <stdio.h> to do this, sort of the opposite of getc, called ungetc. Here is a modified version of the code:

c = getchar();
if(isdigit(c)) {
    int intval;
    intval = c - '0';
    while(isdigit(c = getchar())) {
        intval = 10 * intval + (c - '0');
    }

    ungetc(c, stdin);    /* push non-digit character back onto input stream */

    *next_pointer_arg = intval;
    n_vals_converted++;
} else {
    /* saw no digit; processing has failed */
    ungetc(c, stdin);
    return n_vals_converted;
}

You will notice that I have added two calls to ungetc, in both places in the code where, after calling getchar and then isdigit, the code has just discovered that it has read a character that is not a digit.

It might seem strange to read a character and then change your mind, meaning that you have to "unread" it. It might make more sense to peek at an upcoming character (to determine whether or not it's a digit) without reading it. Or, having read a character and discovered that it's not a digit, if the next piece of code that's going to process that character is right here in scanf, it might make sense to just keep it in the local variable c, rather than calling ungetc to push it back on the input stream, and then later calling getchar to fetch it from the input stream a second time. But, having called out these other two possibilities, I'm just going to say that, for now, I'm going to plough ahead with the example that uses ungetc.

So far I've shown the code that you might have imagined lay behind scanf's processing of %d. But the code I've shown so far is still significantly incomplete, because it does not show the "secret extra power". It starts looking for digit characters right away; it doesn't do any skipping of leading whitespace.

Here, then, is my third and final sample fragment of %d-processing code:

/* skip leading whitespace */
while(isspace(c = getchar())) {
    /* discard */
}

if(isdigit(c)) {
    int intval;
    intval = c - '0';
    while(isdigit(c = getchar())) {
        intval = 10 * intval + (c - '0');
    }

    ungetc(c, stdin);    /* push non-digit character back onto input stream */

    *next_pointer_arg = intval;
    n_vals_converted++;
} else {
    /* saw no digit; processing has failed */
    ungetc(c, stdin);
    return n_vals_converted;
}

That initial loop reads and discards characters as long as they're whitespace. Its form is very similar to the later loop that reads and processes characters as long as they're digits. The initial loop will read one more character than it seems like it should: when the isspace call fails, that means that it has just read a non-whitespace character. But that's okay, because we were just about to read a character to see if it was the first digit.

[Footnotes: This code is still far from perfect. One pretty significant problem is that it doesn't have any checks for an EOF coming along in the middle of its parsing. Another problem is that it doesn't look for - or + before the digits, so it won't handle negative numbers. Yet another, more obscure problem is that, ironically, obvious-looking calls like isdigit(c) are not always correct — strictly speaking they need to be somewhat cumbersomely rendered as isdigit((unsigned char)c).]

If you're still with me, my point in all this is to illustrate these two points in a concrete way:

The reason %d is able to automatically skip leading whitespace is because (a) the specification says it's supposed to and (b) it has explicit code to do so, as my third example illustrates.
The reason scanf always leaves unprocessed input (that is, input that comes after the input it does read and process) on the input stream is because (a) again, the specification says it's supposed to and (b) its code is typically sprinkled with explicit calls to ungetc, or the equivalent, to make sure that every unprocessed character remains on the input, as my second example illustrates.

@SteveSummit: _%d knows how to skip whitespace_ , but why skipped whitespaces doen't available in buffer and we got only last whitespace character ? I mean, how the whitespace before `%d` gets lost ? — Suraj, Aug 27 '21 at 12:02
The [Microsoft version of `rewind`](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/rewind?view=msvc-160) does actually clear the input buffer, but I believe that is platform-specific behavior, which is not covered by the ISO C standard. — Andreas Wenzel, Aug 27 '21 at 12:06
What do you mean "how", @suraj? As Steve explained, it is part of `scanf`'s behavior when processing a `%d` directive to skip (meaning read and discard) all leading whitespace, if any. This is totally standard and very well documented. `scanf` does it, but beyond that, the details are unspecified. — John Bollinger, Aug 27 '21 at 12:09
@JohnBollinger: _but beyond that, the details are unspecified._ Ok Got it. Thanks. — Suraj, Aug 27 '21 at 12:13
@Suraj *`%d` knows how to skip whitespace* Because it's defined (required) to. *why skipped whitespaces doen't available in buffer?* Not sure what you mean. If you typed the whitespace, it's in the buffer. `%d` will consume and discard leading whitespace (before the digits). `%d` does not consume or discard whitespace after the digits. *we got only last whitespace character* Not sure what you mean. *how the whitespace before `%d` gets lost?* Because `scanf` actively reads and discards those characters as part of the task of processing the `%d` directive. — Steve Summit, Aug 27 '21 at 13:09
@SteveSummit Thanks for response. I got it that `%d` directive discards the leading whitespace because it is _the task of processing the `%d` directive._ — Suraj, Aug 27 '21 at 13:52
@SteveSummit Some parts of the code are confusing: `val` should be `intval`, and `n_vals_converted++;` should also be in the 2nd `while`. Anyway, I understood the "addendum" part of your answer after reading it 3-4 times. Thanks a lot for such an amazing answer, which clears up the concept of " _How `scanf()` works, how leading whitespace is discarded and how a trailing character (other than an integer) remains in `stdin`._ " — Suraj, Aug 27 '21 at 16:52
@Suraj Thanks for the tip on `val` vs. `intval` — fixed. Not sure what problem you're seeing with `n_vals_converted`, though. — Steve Summit, Aug 27 '21 at 17:24
"It stops reading at the first non-digit character" is more like "it reads the non-numeric character, stops reading for more numeric text and puts that non-numeric character back into the stream for later reading". — chux - Reinstate Monica, Aug 27 '21 at 17:31
@chux-ReinstateMonica _non-numeric character_ yes that's a good idea for better understanding. — Suraj, Aug 27 '21 at 18:15
@SteveSummit Actually I want to write `n_vals_converted` inside the block of 2nd while loop to count the digits scanned inside the loop because `n_vals_converted` counts the values that have successfully scanned and converted, which will return by `scanf()` (in case of using `scanf()`). — Suraj, Aug 27 '21 at 18:24
@Suraj But `scanf`'s return value is the number of *values* converted, not the number of digits. `scanf("%d")` will return EOF, 0, or 1. It returns 1 if the numeric input is `1`, `123`, or `12345`. `scanf("%d%d")` will return EOF, 0, 1, or 2. — Steve Summit, Aug 27 '21 at 18:27

chqrlie · Accepted Answer · 2021-08-27T21:19:05.963

There are some problems with your approach:

you use an undocumented, implementation specific member of the FILE object _Placeholder which may or may not be available on different platforms and whose contents are implementation specific anyway.
you use scanf_s(), which is a Microsoft specific so-called secure version of scanf(): this function is optional and may not be available on all platforms. Furthermore, Microsoft's implementation does not conform to the C Standard: for example the size argument passed after &ch is documented in VS with a type of UINT whereas the C Standard specifies it as a size_t, which on 64-bit versions of Windows has a different size.

scanf() is quite tricky to use: even experienced C programmers get bitten by its many quirks and pitfalls. In your code, you test %d and %c, which behave very differently:

for %d, scanf() will first read and discard any white space characters, such as space, TAB and newlines, then read an optional sign + or -, it then expects to read at least one digit and stop when it gets a byte that is not a digit and leave this byte in the input stream, pushing it back with ungetc() or equivalent. If no digits can be read, the conversion fails and the first non digit character is left pending in the input stream, but the previous bytes are not necessarily pushed back.
processing %c is much simpler: a single byte is read and stored into the char object or the conversion fails if the stream is at end of file.

Processing %c after %d is tricky if the input stream is bound to a terminal as the user will enter a newline after the number expected for %d and this newline will be read immediately for the %c. The program can ignore white space before the byte expected for %c by inserting a space before %c in the format string: res = scanf(" %c", &ch);

To better understand the behavior of scanf(), you should output the return value of each call and the stream's current position, obtained via ftell(). It is also more reliable to first set the stream to binary mode, so that the return value of ftell() is exactly the number of bytes from the beginning of the file.

Here is a modified version:

#include <stdio.h>

#ifdef _MSC_VER
#include <fcntl.h>
#include <io.h>
#endif

int main() {
    int x, res;
    char ch;
    long A, B, C, D;

#ifdef _MSC_VER
    _setmode(_fileno(stdin), _O_BINARY);
#endif

    A = ftell(stdin);
    printf("A : %ld\n", A);

    x = 0;
    res = scanf_s("%d", &x);

    B = ftell(stdin);
    printf("B : %ld, res=%d, x=%d\n", B, res, x);

    rewind(stdin);
    C = ftell(stdin);
    printf("C : %ld\n", C);

    ch = 0;
    res = scanf_s("%c", &ch, 1);
    D = ftell(stdin);
    printf("D : %ld, res=%d, ch=%d (%c)\n", D, res, ch, ch);

    return 0;
}

John Bode · Answer 3 · 2021-09-02T17:15:30.733

Here's some code that illustrates the behavior of the %d conversion specifier; it may help understand how that aspect of scanf works. This isn't how it's actually implemented anywhere, but it follows the same rules (Updated to handle leading +/- sign, checks for overflow, etc).

#include <stdio.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>

/**
 * Mimics the behavior of the scanf %d conversion specifier.
 * Skips over leading whitespace, then reads and converts
 * decimal digits up to the next non-digit character.
 *
 * Returns EOF if no non-whitespace characters are
 * seen before EOF.
 *
 * Returns 0 if the first non-whitespace character
 * is not a digit.
 *
 * Returns 1 if at least one decimal digit was
 * read and converted.
 *
 * Stops reading on the first non-digit
 * character, pushes that character back
 * on the input stream.
 *
 * In the event of a signed integer overflow,
 * sets errno to ERANGE.
 */
int scan_to_int( FILE *stream, int *value )
{
  int conv = 0;
  int tmp = 0;
  int c;
  int sign = 0;

  /**
   * Skip over leading whitespace
   */
  while( ( c = fgetc( stream ) ) != EOF && isspace( c ) )
    ; // empty loop

  /**
   * If we see end of file before any non-whitespace characters,
   * return EOF.
   */
  if ( c == EOF )
    return c;

  /**
   * Account for a leading sign character.
   */
  if ( c == '-' || c == '+' )
  {
    sign = c;
    c = fgetc( stream );
  }

  /**
   * As long as we see decimal digits, read and convert them
   * to an integer value.  We store the value to a temporary
   * variable until we're done converting - we don't want
   * to update value unless we know the operation was
   * successful
   */
  while( c != EOF && isdigit( c ) )
  {
    /**
     * Check for overflow.  While setting errno on overflow
     * isn't required by the C language definition, I'm adding
     * it anyway.  
     */
    if ( tmp > ( INT_MAX - (c - '0') ) / 10 )
      errno = ERANGE;

    tmp = tmp * 10 + (c - '0');
    conv = 1;
    c = fgetc( stream );
  }

  /**
   * Push the last character read back onto the input
   * stream.
   */
  if ( c != EOF )
    ungetc( c, stream );

  /**
   * If we read a sign character (+ or -) but did not have a
   * successful conversion, then that character was not part
   * of a numeric string and we need to put it back on the
   * input stream in case it's part of a non-numeric input.
   */
  if ( sign && !conv )
    ungetc( sign, stream );

  /**
   * If there was a successful read and conversion,
   * update the output parameter.
   */
  if ( conv )
    *value = tmp * (sign == '-' ? -1 : 1);

  /**
   * Return 1 if the read was successful, 0 if there
   * were no digits in the input. 
   */
  return conv;
}

/**
 * Simple test program - attempts to read 1 integer from
 * standard input and display it.  Display any trailing
 * characters in the input stream up to and including
 * the next newline character.
 */
int main( void )
{
  int val;
  int r;

  errno = 0;

  /**
   * Read the next item from standard input and
   * attempt to convert it to an integer value.
   */
  if ( (r = scan_to_int( stdin, &val )) != 1 )
    printf( "Failed to read input, r = %d\n", r );
  else
    printf( "Read %d%s\n", val, errno == ERANGE ? " (overflow)" : "" );

  /**
   * If we didn't hit EOF, display the remaining
   * contents of the input stream.
   */
  if ( r != EOF )
  {
    fputs( "Remainder of input stream: {", stdout );
    int c;
    do {
      c = fgetc( stdin );
      switch( c )
      {
        case '\a': fputs( "\\a", stdout ); break;
        case '\b': fputs( "\\b", stdout ); break;
        case '\f': fputs( "\\f", stdout ); break;
        case '\n': fputs( "\\n", stdout ); break;
        case '\r': fputs( "\\r", stdout ); break;
        case '\t': fputs( "\\t", stdout ); break;
        case EOF: break; // end of input: nothing more to display
        default: fputc( c, stdout ); break;
      }
    } while( c != '\n' && c != EOF );

    fputs( "}\n", stdout );
  }

  return 0;
}

Some examples - first, we signal EOF (in my case, by typing Ctrl-D):

$ ./convert 
Failed to read input, r = -1

Next, we pass in a non-numeric string:

$ ./convert 
abcd
Failed to read input, r = 0
Remainder of input stream: {abcd\n}

Since nothing was converted, the remainder of the input stream contains everything we typed (including the newline from hitting Enter).

Next, a numeric string with non-numeric trailing characters:

$ ./convert 
12cd45
Read 12
Remainder of input stream: {cd45\n}

We stopped reading at 'c' - only the leading 12 is read and converted.

Several numeric strings separated by whitespace - only the first string is converted:

$ ./convert 
123 456 789
Read 123
Remainder of input stream: {\t456\t789\n}

And a numeric string with leading whitespace:

$ ./convert 
      12345
Read 12345
Remainder of input stream: {\n}

Handle leading signs:

$ ./convert 
-123abd
Read -123
Remainder of input stream: {abd\n}

$ ./convert 
    +456
Read 456
Remainder of input stream: {\n}

$ ./convert 
-abcd
Failed to read input, r = 0
Remainder of input stream: {-abcd\n}

And, finally, we add an overflow check - note that scanf is not required to check for overflow by the C language standard, but I figured it was a useful thing to do:

$ ./convert 
123456789012345678990012345667890
Read -701837006 (overflow)
Remainder of input stream: {\n}

%d, %i, %f, %s, etc., all skip over leading whitespace, since whitespace is not meaningful in those cases except to act as a separator between inputs. %c and %[ do not skip over leading whitespace, because it may be meaningful for those particular conversions (there are times when you want to know whether the character you just read is a space, or a tab, or a newline).

As Steve points out, whitespace handling in C stdio routines is and always has been a thorny problem, and no one solution always works the best, especially since different library routines handle it differently.

Minor: `c != EOF && isdigit( c )` replaceable with `isdigit( c )`. — chux - Reinstate Monica, Aug 27 '21 at 17:33
"Mimics the behavior of the scanf %d" --> except that a leading `sign` character is not allowed. — chux - Reinstate Monica, Aug 27 '21 at 17:36
UV for using good engineering efficiency - not all problems need fixing. — chux - Reinstate Monica, Aug 27 '21 at 22:22
