2

Do I understand the standard correctly that this program causes UB:

#include <stdio.h>

int main(void)
{
    char a = 'A';
    printf("%c\n", a);
    return 0;
}

When it is executed on a system where sizeof(int)==1 && CHAR_MIN==0?

Because if a is unsigned and has the same size (1) as an int, it will be promoted to an unsigned int [1] (2), and not to an int, since an int cannot represent all values of a char. The format specifier "%c" expects an int [2], and using the wrong signedness in printf() causes UB [3].
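As a rough sketch of how one might check which branch of the promotion rule applies on a given implementation, the two helpers below test it in two different ways. The function names are hypothetical and chosen only for illustration; nothing here is taken from the standard beyond the CHAR_MAX and INT_MAX macros from <limits.h>.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if int can represent every value of
   plain char (so char promotes to int), 0 if char promotes to
   unsigned int. Only the upper bound needs checking, because
   CHAR_MIN can never be below INT_MIN. */
int char_promotes_to_int(void)
{
#if CHAR_MAX <= INT_MAX
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: the same question answered at run time.
   c is promoted before the subtraction, so the result is -1 if the
   promoted type is int and UINT_MAX if it is unsigned int. */
int promoted_char_is_signed(void)
{
    char c = 0;
    return (c - 1) < 0;
}
```

On the mainstream implementations I am aware of, both functions return 1; the scenario in the question is the one where both would return 0.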

Relevant quotes from ISO/IEC 9899 for C99

[1] Promotion to int according to C99 6.3.1.1:2:

If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

[2] The format specifier "%c" expects an int argument, C99 7.19.6.1:8 c:

If no l length modifier is present, the int argument is converted to an unsigned char, and the resulting character is written.

[3] Using the wrong type in fprintf() (3), including wrong signedness, causes UB according to C99 7.19.6.1:9:

... If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.

The exception for same type with different signedness is given for the va_arg macro but not for printf() and there is no requirement that printf() uses va_arg (4).

Footnotes: (marked with (n))

  1. This implies INT_MAX==SCHAR_MAX, because char has no padding.

  2. See also this question: Is unsigned char always promoted to int?

  3. The same rules are applied to printf(), see C99 7.19.6.3:2

  4. See also this question: Does printf("%x",1) invoke undefined behavior?

chqrlie
  • @einpoklum Without promotion, it would be UB to give a `char` for `"%c"` when `sizeof(int)>1` (the vast majority of systems), since `%c` expects an `int`. There is no `hh` length modifier for the format specifier `%c`, so you have to cast it to `int` without promotion. – 12431234123412341234123 Sep 15 '20 at 19:28
  • @12431234123412341234123 : Without promotion, `printf()` would have been written so that `"%c"` expects a `char`. – einpoklum Sep 15 '20 at 19:29
  • @12431234123412341234123 you're right. I forgot about that possibility. – Gerhardh Sep 15 '20 at 19:33

3 Answers

3

A program can have undefined behavior or not depending on the characteristics of the implementation.

For example, a program that executes

int x = 32767;
x++;

(and is otherwise well defined) has well defined behavior on an implementation with INT_MAX > 32767, and undefined behavior otherwise.
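A minimal sketch of how such an implementation-dependent overflow could be avoided, assuming one is willing to test against INT_MAX first. The helper name checked_increment is hypothetical and used only for illustration.

```c
#include <limits.h>

/* Hypothetical helper: increments *x only when the result is
   representable in int. Returns 1 on success and 0 if the increment
   would overflow, in which case *x is left unchanged. */
int checked_increment(int *x)
{
    if (*x == INT_MAX)
        return 0;   /* x++ here would be undefined behavior */
    ++*x;
    return 1;
}
```

With this guard the behavior is defined on every implementation, whatever the value of INT_MAX; only the return value differs between implementations.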

Your program:

#include <stdio.h>

int main(void)
{
  char a='A';
  printf("%c\n",a);
  return 0;
}

has well defined behavior for any hosted implementation with INT_MAX >= CHAR_MAX. On any such implementation, the value of 'A' is promoted to int, which is what %c expects.

If INT_MAX < CHAR_MAX (which implies that plain char is unsigned and that CHAR_BIT >= 16), the value of a is promoted to unsigned int. N1570 7.21.6.1p9:

If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.

implies that this has undefined behavior.

In practice, (a) such implementations are rare, likely nonexistent (the only existing C implementations I've heard of with CHAR_BIT > 8 are for DSPs and such implementations are likely to be freestanding), and (b) any such implementation would probably be designed to handle such cases gracefully.
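For code that should not depend on which way plain char promotes, one possible approach is to make the argument an int explicitly. The sketch below assumes the character value fits in int, which holds for 'A' because a character constant already has type int; the helper name print_char is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: prints c followed by a newline and returns
   printf's result. The cast guarantees that the argument has type int,
   which is what %c expects, even where char would otherwise promote to
   unsigned int. For values above INT_MAX the conversion itself is
   implementation-defined rather than undefined. */
int print_char(char c)
{
    return printf("%c\n", (int)c);
}
```

Calling print_char('A') writes "A" and a newline. Passing the character to putchar would be another option, since its parameter is declared as int and the conversion happens through the prototype.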

Keith Thompson
  • Additionally, on the only implementations I've used where `char` was 16 bits, it was signed. I wonder if there would have been any difficulty specifying that `char` may only be unsigned all possible bit patterns represent distinct valid values of `char` (meaning that while `signed char` could have trap or negative-zero representations, `char` could not), and all such values fit within the range of `int`? – supercat Sep 16 '20 at 19:54
  • @supercat What C implementation with 16-bit `char` have you used? – Keith Thompson Sep 16 '20 at 22:21
  • I've written an embedded application that included a TCP/IP stack and flash file system on the TMS 32050 family. – supercat Sep 16 '20 at 22:32
  • The TMS 32050 is a DSP. – Keith Thompson Sep 16 '20 at 23:57
  • Indeed so. Add the right I/O, however, and it can control an embedded system by itself. – supercat Sep 17 '20 at 01:29
  • The first version of the TMS-based device used an 8051 clone as the "primary" CPU, but the next two versions shifted more of the work from the 11MHz (mostly 12 or 24 cycles per instruction) microcontroller to the 25MHz (mostly 1-2 cycles per instruction) DSP. – supercat Sep 17 '20 at 02:40
  • My point was not to contradict your point that such implementations are DSPs, but rather to augment what you said with the fact that, likely without exception, the only non-contrived implementations where `char` is only unsigned are those where it is smaller than `int`, meaning that `CHAR_MAX` would never be greater than `INT_MAX` on non-contrived implementations, no matter how obscure. – supercat Sep 17 '20 at 17:38
  • @supercat Similarly, I didn't mean to imply that you had implied that the TMS 32050 is not a DSP. You hadn't said whether it was or not, so I clarified. Given my zero experience with DSPs, I wouldn't speculate, based on one example, whether implementations for other DSPs might make plain `char` unsigned. – Keith Thompson Sep 17 '20 at 18:13
  • I would expect that DSPs where `char` is smaller than `int` might make `char` unsigned, but an implementation whose `char` type doesn't promote to `int` or can't distinctly represent all possible bit patterns would break a lot of code which should be considered "portable". – supercat Sep 17 '20 at 18:52
  • @supercat: You may well be right, but I have no such expectation. No point in speculating without data from other DSP C implementations. – Keith Thompson Sep 17 '20 at 20:46
  • No point in jumping through hoops to avoid compatibility problems with platforms that might conceivably exist, but upon which one's code would never be executed in any case. If program X is superior to program Y in any way, and the only way in which it is inferior is that it would not work correctly on such obscure platforms, I would regard program X as superior to Y absent some *particular* basis for expecting that someone might want to use it on such platforms. – supercat Sep 17 '20 at 21:51
2

TL;DR there is no UB (in my interpretation at any rate).

6.2.5 types
6. For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.
9. The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same 41)
41) The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.

Furthermore

7.16.1.1 The va_arg macro
2 The va_arg macro expands to an expression that has the specified type and the value of the next argument in the call. [...] If there is no actual next argument, or if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases:

  • one type is a signed integer type, the other type is the corresponding unsigned integer type, and the value is representable in both types;

7.21.6.8 The vfprintf function
288) [...] functions vfprintf, vfscanf, vprintf, vscanf, vsnprintf, vsprintf, and vsscanf invoke the va_arg macro [...]

Thus, it stands to reason that an unsigned type is not "an incorrect type for the corresponding (signed) conversion specification", as long as the value is within the range.

This is corroborated by the fact that major compilers do not warn about signed/unsigned format specification mismatch, even though they do warn about other mismatches, even when the corresponding types have the same representation on a given platform (e.g. long and long long).
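The va_arg exception quoted above can be illustrated with a small variadic function that reads every argument as int while the caller passes a mix of int and unsigned int. This is only a sketch; the name sum_as_int is hypothetical, and it relies on every passed value being representable in both types.

```c
#include <stdarg.h>

/* Hypothetical helper: sums `count` variadic arguments, reading each
   one as int. Per 7.16.1.1p2, the caller may pass either int or
   unsigned int, provided each value is representable in both types. */
int sum_as_int(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

A call such as sum_as_int(3, 1, 2u, 3u) falls under the quoted exception and yields 6. Whether the same latitude extends to the "correct type" wording for printf() is the interpretive step this answer argues for, not something the code can demonstrate.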

n. m. could be an AI
  • Great answer, I did not know about footnote 41 (or 31 in C99). Can you post the same answer here: https://stackoverflow.com/questions/4664100/does-printfx-1-invoke-undefined-behavior?rq=1 ? It applies there too. But I think only the first part is relevant, since we are talking about `printf()`, which uses `...`, and not `vprintf()` with `va_list`. – 12431234123412341234123 Sep 16 '20 at 10:52
  • @12431234123412341234123 printf, fprintf and vfprintf are all specified to be equivalent modulo obvious argument substitutions. – n. m. could be an AI Sep 16 '20 at 12:12
  • @12431234123412341234123 If two questions have the same answer, then one is a duplicate of the other and should be closed as such. – n. m. could be an AI Sep 16 '20 at 12:22
  • Do you think my question is a duplicate of this question https://stackoverflow.com/questions/4664100/does-printfx-1-invoke-undefined-behavior?rq=1 ? If not, do you think your answer does not apply to the other question? I think the argument "If two questions have the same answer, then one is a duplicate" is not true, since the same answer can answer two different questions at once. But if you do not want to post the same answer, I will do it. – 12431234123412341234123 Sep 16 '20 at 12:34
  • @12431234123412341234123 I believe that copying answers to different questions is wrong. Information should be linked to and not copied. Closing one of the question as a duplicate is one way to link to the correct answer. I am entirely unsure which of the two should be closed though. – n. m. could be an AI Sep 16 '20 at 14:34
  • My question is much newer; I even linked to the other one in my question. You also copied the text from the C11 standard instead of only linking to it. – 12431234123412341234123 Sep 16 '20 at 14:57
  • Even on platforms where `long` and `long long` would have the same representation, neither clang nor gcc should be expected to behave meaningfully if a `long long*` is dereferenced to access a value of type `long` or vice versa. Since `va_arg` would presumably use a pointer of the fetched-argument type to retrieve values, I don't think those implementations should be relied upon to handle code that uses a `long long` format specifier to process a `long` argument or vice versa unless or until they document that they extend the language in that fashion. – supercat Sep 16 '20 at 19:58
0

Do I understand the standard correctly that this program causes UB:

#include <stdio.h>

int main(void)
{
  char a='A';
  printf("%c\n",a);
  return 0;
}

When it is executed on a system where sizeof(int)==1 && CHAR_MIN==0?

That would be a plausible interpretation of the standard. However, in the event that an implementation with such a combination of type characteristics were produced for genuine use, I have full confidence that it would provide appropriate support for the %c directive -- as an extension, if one wants to interpret it that way. The example program would then have well-defined behavior with respect to that implementation, whether or not the C standard is interpreted to define that behavior, too. I suppose I account that quality-of-implementation issue as being rolled up in "for genuine use".

John Bollinger
  • I think that example would only cause `char`-related problems on a platform where the character 'A' was represented using a value greater than INT_MAX. While one could contrive an implementation where that was the case, there's no realistic reason to worry about such things, since for any given source text one could contrive a "conforming C implementation" that would process it nonsensically. – supercat Sep 17 '20 at 21:59
  • Thanks, @supercat. I hope that "there's no realistic reason to worry", is what this answer already conveys. Certainly that's what I intended it to convey. – John Bollinger Sep 17 '20 at 23:01