For once I thought I found a good use for sscanf() but after reading about how it handles integers, it appears not. Having a string that should look like this: 123,456,678
I thought I could safely and concisely parse it with this code:
unsigned int x[3];
if( sscanf( s, "%u,%u,%u", x+0, x+1, x+2 ) == 3 )
…
If conversion fails I'm not really interested in knowing why, nor am I worried about getting incorrect data. If there's something other than numbers in there, scanf() should surely create a matching error and abort, and it knows I'm looking for an unsigned integer, so anything negative should also be a matching error? Nope.
I got suspicious when I read about the conversion specifier %u: Matches an optionally signed decimal integer. Why would this not be a matching error? What happens if it is signed?
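For comparison, the same paragraph of the standard says the input format for %u is the one expected by strtoul with base 10, and strtoul on its own is well defined for a leading minus sign: the converted value is negated in the return type. A minimal sketch of that behavior, where parse_ul is just a name I made up for illustration:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not from any library): convert s with strtoul
   in base 10 and report through range_error whether errno was set
   to ERANGE by the conversion. */
static unsigned long parse_ul(const char *s, int *range_error)
{
    unsigned long v;

    errno = 0;
    v = strtoul(s, NULL, 10);   /* accepts an optional '+' or '-' sign */
    *range_error = (errno == ERANGE);
    return v;
}
```

Calling parse_ul("-1", &err) should return ULONG_MAX and leave err at 0, so strtoul does not treat the sign as an error. Whether scanf() inherits that wrap-around when storing into an unsigned int is exactly what I can't work out from the text.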
Quoting from ISO/IEC 9899:201x 7.21.6.2 ¶ 10, The fscanf function (emphasis mine):
Except in the case of a % specifier, the input item (or, in the case of a %n directive, the count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.
It appears to read as if scanf() treats every integer-looking conversion specifier the same, reads the input as some kind of signed integer of unspecified size, and then writes to the output bypassing all normal conversions.
For example, converting any integer (negative or positive) into an unsigned integer of smaller size is well behaved according to normal implicit conversions, but not with scanf():
unsigned int x;
x = -1; /* Well defined: (-1) + (UINT_MAX+1) = UINT_MAX */
sscanf( "-1", "%u", &x ); /* Undefined behavior? */
Please tell me I'm wrong and that I have missed some part of the standard. One thing that I can't really find a reference to is this part of the section quoted above: "the input item (…) is converted to a type appropriate to the conversion specifier". If the conversion specifier is %u then anything negative is of course not appropriate, nor is anything that does not fit into an unsigned integer. However, I could not find anything in the standard telling me exactly what an "appropriate type" is.
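If it really is undefined, the fallback I have in mind is to avoid scanf() altogether. Here is a rough sketch, assuming that only plain digits and commas are acceptable and that the string must end right after the third number (which is stricter than my sscanf() call, since that one ignores trailing characters). The name parse_three is hypothetical:

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse exactly three comma-separated unsigned
   decimal numbers from s into out. Returns 1 on success, 0 on failure. */
static int parse_three(const char *s, unsigned int out[3])
{
    for (int i = 0; i < 3; i++) {
        char *end;
        unsigned long v;

        /* strtoul skips whitespace and accepts a sign, so insist on a digit. */
        if (!isdigit((unsigned char)*s))
            return 0;

        errno = 0;
        v = strtoul(s, &end, 10);
        if (errno == ERANGE || v > UINT_MAX)
            return 0;               /* does not fit in unsigned int */

        out[i] = (unsigned int)v;
        s = end;

        /* A comma must follow the first two numbers, end of string the third. */
        if (*s != (i < 2 ? ',' : '\0'))
            return 0;
        if (i < 2)
            s++;
    }
    return 1;
}
```

With this, "123,456,678" is accepted, while "-1,2,3", "1,2" and anything too large for unsigned int are rejected. It is a lot more code than the one-line sscanf() call, which is why I would still like to know whether the short version is actually safe.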
I found a handful of questions dealing with this directly or indirectly, but not in much detail. The question most similar to mine is C: How to prevent input using scanf from overflowing? but it's framed in a way that's not as specific. A few answers (1, 2) mention the issue but offer no detail or references.
The goal of my question is to get an answer detailing exactly why this cannot be interpreted in any way other than undefined behavior, and preferably some rationale as to why this makes sense - fully knowing that some things in C are inconsistent and I have to accept it.