
I ran into this code:

char str[600];
scanf("%s", &str);

Of course, this emits this warning:

a.c:6:17: warning: format specifies type 'char *' but the argument has type
      'char (*)[600]' [-Wformat]
    scanf("%s", &str);
           ~~   ^~~~~~~

I know that the correct way is to remove the & and write scanf("%s", str) instead. But it does work, so my question is whether this could cause any problems. Is it undefined behavior? When I switched str to a pointer instead of an array it (obviously) did not work. But can this cause any problems when using an array?

klutt

3 Answers


Yes, the code has undefined behaviour. The argument corresponding to %s must have the type char *. This is described in C17 7.21.6.2/12 under the s specifier:

[...] the corresponding argument shall be a pointer to the initial element of a character array large enough to accept the sequence and a terminating null character, which will be added automatically.

which says fairly clearly that the pointer should have pointer-to-character type, and not point to the whole array.

Undefined behaviour means that anything can happen. It might behave as if you omitted the &, or it might format your hard drive.

Given that it is extremely easy to avoid undefined behaviour in this case, I don't really see any reason to argue about whether it is OK to rely on what this particular undefined behaviour happens to do.
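A minimal sketch of the easy fix, assuming a 600-byte buffer as in the question. To make it runnable without typing input, it reads from a string with `sscanf` rather than from stdin with `scanf`; the rules for the `%s` argument are the same. The helper name `first_word` is hypothetical, and the field width `599` is an addition the question does not use: it stops the conversion before it can write past the buffer and leaves room for the terminating null character.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the first whitespace-delimited word of
   `input` into `out` and returns its length (0 if there is no word).
   `out` must point to at least 600 bytes, e.g. a char[600] array. */
size_t first_word(const char *input, char *out)
{
    assert(input != NULL && out != NULL);

    /* out already has type char *, exactly what %s requires.
       The width 599 leaves room for the terminating '\0'. */
    if (sscanf(input, "%599s", out) != 1) {
        out[0] = '\0';   /* no word was read: store an empty string */
    }
    return strlen(out);
}
```

Called as `first_word("hello world", str)` with `char str[600]`, the caller passes `str`, which decays to `&str[0]` of type `char *`, so the argument matches `%s` and the `-Wformat` warning goes away.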

M.M
  • 7.19.6.1? Do you mean **7.21.6.1 The fprintf function**? And why doesn't **6.2.7 Compatible type and composite type** "save" the posted code from UB? The address of the array is the address of the first element, which has to be compatible with the type of a pointer to the same type as an element of the array? I tend to agree with your answer that it's UB because 7.21.6.1 says it's UB, but I can see how an argument via compatible types and array decay can be made. And that's way, way, waaay down the language lawyer rabbit hole... – Andrew Henle Nov 25 '18 at 23:18
  • @AndrewHenle Nice catch. I was too quick there. – klutt Nov 25 '18 at 23:30
    @AndrewHenle -- "The address of the array is the address of the first element, which has to be compatible with the type of a pointer to the same type as an element of the array?": I am not sure what you are trying to say here, but it sounds like you are saying that a pointer to an array and a pointer to the first element of that array are compatible types. I can't see any way that the Standard supports this, given that [two types have compatible type if their types are the same](https://port70.net/~nsz/c/c11/n1570.html#6.2.7p1) (plus a few rules that don't appear to apply here). – ad absurdum Nov 25 '18 at 23:57
  • The standard says this about `fscanf`: *If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.* – klutt Nov 26 '18 at 00:01
  • @AndrewHenle this is the scanf function, not the printf function. `char *` and `char (*)[600]` are not compatible types (that term is defined by C17 6.2.7) – M.M Nov 26 '18 at 00:27
  • @Broman that wording is a bit icky, seeing as it does not define what is meant by "appropriate type" and in fact objects do not necessarily even have a type (e.g. an object allocated by `malloc`). I don't think it intended to say that using `scanf` to read into malloc'd space is always UB, so I'm sticking with the "common sense" interpretation that it refers to the type of the pointer, not the type of the object – M.M Nov 26 '18 at 00:36
  • @M.M appropriate type is the one type defined for any given conversion; *is converted to a type appropriate to the conversion specifier.* – Antti Haapala -- Слава Україні Nov 26 '18 at 03:16
  • @AnttiHaapala maybe it could be argued that `char[600]` is an appropriate type for reading `%s` into – M.M Nov 26 '18 at 03:28
  • @M.M it says *If no l length modifier is present, the corresponding argument shall be a pointer to the initial element of a character array large enough to accept the sequence and a terminating null character, which will be added automatically*, which should be sufficient. – Antti Haapala -- Слава Україні Nov 26 '18 at 03:28
  • @AnttiHaapala Oh yeah, it describes that in more detail under the paragraph for the `s` specifier. So we should assume that "appropriate type" is meant to be referring back to what is specified for each specifier. – M.M Nov 26 '18 at 03:30
    @AnttiHaapala that phrasing seems like it would allow `unsigned char *` too, since that fits the description "character". – M.M Nov 26 '18 at 03:33
  • This was fishier than I thought. I'm considering putting a bounty on this. – klutt Nov 26 '18 at 03:47
  • @Broman it's actually not fishy as you can see from the comment chain (I edited my answer to include the relevant standard quote now) – M.M Nov 26 '18 at 03:48
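The type distinction argued over in these comments can be made visible with C11's `_Generic`, which picks a branch based on the type of its controlling expression (with array-to-pointer conversion applied, as C17 specifies and as GCC and Clang implement). This is an illustrative sketch, not something from the thread; the macro `TYPE_TAG` and the function `check_array_types` are hypothetical names. Listing `char *` and `char (*)[600]` as separate associations is only permitted because the two types are not compatible.

```c
#include <assert.h>

/* Hypothetical macro: maps the type of an expression to a small tag.
   Both association types may appear only because they are incompatible. */
#define TYPE_TAG(x) _Generic((x), char *: 1, char (*)[600]: 2, default: 0)

void check_array_types(void)
{
    char str[600] = "";

    assert(TYPE_TAG(str) == 1);      /* str decays to char *              */
    assert(TYPE_TAG(&str[0]) == 1);  /* the pointer that %s expects       */
    assert(TYPE_TAG(&str) == 2);     /* & suppresses decay: char (*)[600] */
}
```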

Using &str instead of str didn't cause any problems in this case because both expressions yield the same address. See this past question for an explanation. But as you note, the type of &str is different, the compiler issues a warning, and the actual behavior will depend on architecture and implementation.
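A small sketch of what "same address, different type" means, assuming the `char str[600]` from the question (`compare_str_and_address` is a hypothetical name). Both pointers refer to the first byte of `str`, so they compare equal once converted to `void *`, but they point to objects of different sizes, which shows up in `sizeof` and in pointer arithmetic. Equal addresses do not make the `scanf` call valid: the argument is still passed as a `char (*)[600]` where `%s` requires a `char *`.

```c
#include <assert.h>

/* Hypothetical demo: same starting address, different pointee types. */
void compare_str_and_address(void)
{
    char str[600];
    char *p = str;            /* pointer to the first char        */
    char (*pa)[600] = &str;   /* pointer to the whole array object */

    /* Both point at the same byte, so they compare equal as void *. */
    assert((void *)p == (void *)pa);

    /* The pointee sizes differ... */
    assert(sizeof *p == 1);
    assert(sizeof *pa == 600);

    /* ...so "+ 1" steps by a different number of bytes. */
    assert((char *)(p + 1) - (char *)p == 1);
    assert((char *)(pa + 1) - (char *)pa == 600);
}
```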

Paul
  • It might not cause problems in common architectures - but strictly speaking, it's not allowed. And one thing that might break it: there's no guarantee that different pointer types all use the same representation for the same address. So here the types in question `char*` and `char (*)[600]` might have different sizes or value representations. – aschepler Nov 25 '18 at 23:11
  • @aschepler Fair enough. I suppose any behavior not defined in the spec is going to be implementation- or architecture-specific. Better to write it the correct way to begin with. – Paul Nov 25 '18 at 23:14
  • @Paul -- "any behavior not defined in the spec": It isn't that the behavior is left undefined, so much as that the Standard says explicitly that the behavior is undefined. Passing the wrong types to functions is not a gray area. – ad absurdum Nov 25 '18 at 23:24
  • Yes, though the difference between a behavior not being defined and being explicitly undefined doesn't seem that significant. I'll update my answer to say that it "didn't" cause any problems in this case rather than it "wouldn't" cause any problems. – Paul Nov 25 '18 at 23:32
    @Paul -- compiler writers may take advantage of the fact that some constructs lead to explicit undefined behavior (and hence should never show up in valid C programs) to make some optimizations. I would say that this is a significant difference compared with undefined-by-omission behaviors, and certainly with the many implementation-defined behaviors given in the Standard. – ad absurdum Nov 25 '18 at 23:38

In C, the name of an array is also its address (points to the beginning of the array).

klutt
XsOuLp
  • This isn't true. Arrays are objects in C (in the C sense), and array identifiers refer to array objects. Arrays do _decay_ to pointers to their first elements in most expressions (but not in all expressions). In particular, with `char arr[] = "abc"; size_t arr_sz = sizeof arr;` the identifier `arr` not only refers to an _array_, it does not decay to a pointer in the `sizeof` expression. – ad absurdum Nov 25 '18 at 23:17
  • You are right. I wanted to say that the "name of an array" can be used as its address, not that an array is (only) the address. – XsOuLp Dec 04 '18 at 18:47
  • @XsOuLp still incorrect. Practical example: `sizeof(arr)` here `arr` is an array type and does not decay. It is **not** an address in any way, shape or form. – bolov Feb 05 '20 at 07:36
  • further reading: https://stackoverflow.com/q/21972465/2805305 – bolov Feb 05 '20 at 07:41
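The `sizeof` point from these comments, as a minimal sketch (the function names are hypothetical): in `sizeof arr` the name `arr` refers to the whole array object, but it decays to a `char *` in most other expressions, including when it is passed to a function whose parameter is written with array syntax.

```c
#include <assert.h>
#include <stddef.h>

/* The parameter is written as an array but is adjusted to char *,
   so sizeof here measures a pointer, not the caller's array. */
size_t size_inside_function(char arr[])
{
    return sizeof arr;
}

void check_decay(void)
{
    char arr[] = "abc";   /* type char[4]: three letters plus '\0' */

    /* Operand of sizeof: no decay, the whole array is measured. */
    assert(sizeof arr == 4);

    /* In arr + 0, arr decays to a pointer to its first element. */
    assert(sizeof(arr + 0) == sizeof(char *));

    /* Passing arr to a function also decays it to char *. */
    assert(size_inside_function(arr) == sizeof(char *));
}
```

GCC and Clang typically warn about `sizeof` applied to an array parameter (`-Wsizeof-array-argument`), which flags exactly this kind of mix-up between an array and a pointer to its first element.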