16

Is the following code well defined?

#include <stdio.h>

int ScanFirstOrSecond(const char *s, int *dest) {
    return sscanf(s, "%d%d", dest, dest);
}

int main(void) {
    int x = 4;
    ScanFirstOrSecond("5", &x);
    printf("%d\n", x);  // prints 5

    // Here is the tricky bit
    ScanFirstOrSecond("6 7", &x);
    printf("%d\n", x);  // prints 7
    return 0;
}

In other words, do the ... arguments have an implied restrict to them?

The most applicable C spec I found is

"The fscanf function executes each directive of the format in turn. ..." C11dr §7.21.6.2 ¶4

chqrlie
chux - Reinstate Monica
  • 3
    Nothing to do with restrict. It's quite well defined. You're writing a value twice to the same address so you get the latest value. – Guy Sirton Mar 01 '16 at 00:15
  • There was a point a few months ago, working on a UB-detecting implementation of `scanf`, when I had the same question. – Pascal Cuoq Mar 01 '16 at 00:15
  • 1
    @Pascal Cuoq If it's a dupe, please advise. I did look for 15 minutes, yet would not be surprised if it is. – chux - Reinstate Monica Mar 01 '16 at 00:17
  • 2
    @nneonneo Are you sure you understand how `restrict` works? Look for the words “and X is also modified” in C11 6.7.3.1:4. – Pascal Cuoq Mar 01 '16 at 00:19
  • @PascalCuoq: You are correct. Example 10 illustrates explicitly that restricted pointers can point to the same thing if neither is modified. And so I retract my observation. – nneonneo Mar 01 '16 at 00:23
  • @Guy Sirton I agree about the latest value, but is it defined what is latest? Must the write due to the second `"%d"` occur after the write due to the first `"%d"`? It seems reasonable that it would, yet the C spec has surprised me before. The quoted spec seems to imply it. – chux - Reinstate Monica Mar 01 '16 at 00:29
  • @chux you omitted to report the return value from `int ScanFirstOrSecond()` – Weather Vane Mar 01 '16 at 00:30
  • @Weather Vane True, yet given the string literals, reporting those values seemed irrelevant as they are certainly 1 and 2. – chux - Reinstate Monica Mar 01 '16 at 00:31
  • 1
    Consider the equivalent case using `scanf`. If you separate the two entries with `newline`, would you expect them to be reversed? – Weather Vane Mar 01 '16 at 00:35
  • @Weather Vane I'd certainly expect the middle `'\n'` to be consumed after the parsing of the 1st integer. But the consumption of text is not so much the question as whether the result of the first `"%d"` _must_ be written before the 2nd. As I see it now, either that is sufficiently implied by the spec posted above, or it is not defined, in other words UB. Yet maybe something else is to be considered. – chux - Reinstate Monica Mar 01 '16 at 00:52
  • @chux I would expect the middle `'\n'` to be consumed *before* the parsing of the 2nd integer. After all, if the second format spec were `%c`, that would take the middle `newline`. – Weather Vane Mar 01 '16 at 00:55

3 Answers

13

The short answer is: Yes, it is defined:

scanf will attempt to convert a sequence of bytes from stdin as an integer written in base 10 with optional initial spaces and an optional sign. If successful, the number will be stored into x. scanf will then perform these steps a second time. The return value can be EOF, 0, 1 or 2, and for the latter 2, the last number converted will have been stored into x.

The long answer is somewhat more subtle:

It seems the C Standard does specify that the values are stored in the order of the format string. Quoting the C11 Standard:

7.21.6.2 The fscanf function

...

4 The fscanf function executes each directive of the format in turn. When all directives have been executed, or if a directive fails (as detailed below), the function returns.

...

7 A directive that is a conversion specification defines a set of matching input sequences, as described below for each specifier. A conversion specification is executed in the following steps:

...

10 Except in the case of a % specifier, the input item (or, in the case of a %n directive, the count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result.

...

16 The fscanf function returns the value of the macro EOF if an input failure occurs before the first conversion (if any) has completed. Otherwise, the function returns the number of input items assigned, which can be fewer than provided for, or even zero, in the event of an early matching failure.

Nowhere else in this specification are any accesses to the output objects even mentioned.

Yet the wording of the Standard seems to indicate that if two pointers point to the same object, the behavior might be unexpected: "the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result". This phrase is somewhat ambiguous: what does "that has not already received a conversion result" refer to, the object or the argument? Objects receive conversion results, not the pointer arguments. In your contorted example, the object x has already received a conversion result, so it should not receive another one... But as noted by supercat, this interpretation is overly restrictive, as it would imply that all converted values be stored into the first target object.

So it appears fully specified and well defined, but the wording of the specification could be improved to remove a potential ambiguity.

Community
chqrlie
  • That wording is referring to how arguments are iterated as directives are iterated. – autistic Mar 01 '16 at 07:55
  • 2
    The key elements of your answer: **executes each directive of the format in turn** and **result of the conversion is placed in the object pointed to by the first argument ...**. Together with [@Seb](http://stackoverflow.com/questions/35712349/is-scanfdd-x-x-well-defined#comment59110660_35712752)'s comment that it is the argument order (and not the value) that matters, these are the specification needed to conclude that the result is well-defined. – chux - Reinstate Monica Mar 01 '16 at 15:05
  • 1
    Grammatically, the phrase "...the first argument following the format argument that has not already received a conversion result" suggests that, for purposes of identifying the "first" argument, arguments are deemed to receive results. Otherwise, all results would be stored to the first argument following the format specifier since *no* argument would ever receive results. If the Standard said "first argument...whose target object has not received a result", that would be another matter. – supercat Apr 03 '17 at 17:04
  • @supercat: good point. I updated the answer to restrict the value of this picky interpretation. – chqrlie Apr 05 '17 at 04:48
3

scanf() family functions execute the directives you give them in the format string strictly in turn. So the first value is read in, and then the second one, overwriting the first. There is no UB here.

Magisch
2

Yes, well defined. It means "read the first token into *dest, then read the second token into *dest again". It's weird but legal, because sscanf() executes the directives in the format string in strict order.

ddbug