
Consider this code:

#include <stdio.h>

int main(void) 
{
    /* TEST 1 */
    double d = 128;
    char ch = (char)d;
    printf("%d\n", ch);

    /* TEST 2 */
    printf("%d\n", (char)128.0);

    /* TEST 3 */
    char ch1 = (char)128.0;
    printf("%d\n", ch1);
    return 0;
}

Results:

        gcc*  clang*  cl*
TEST 1  -128  -128    -128
TEST 2  127   0       -128
TEST 3  127   -2      -128

* latest version

Questions:

  1. Why do the results differ between tests (excluding cl)?
  2. Why do the results differ between compilers (excluding TEST 1)?
  3. In the case of UB/IB, where exactly is the UB/IB? What does the standard say?
  4. [Extra question] Why does clang show such different behavior? Where do the 0 and -2 come from?
pmor
  • Signed integer overflows are Undefined Behavior. –  Jul 01 '20 at 17:29
  • As far as clang goes, since it's UB, clang may well have optimized this by leaving the register/memory location containing that argument at whatever value it had before the call. –  Jul 01 '20 at 17:37
  • It's interesting that neither gcc nor clang warns about this, even with `-Wall -W` or `-Weverything`. It certainly seems that the compiler "knows" that undefined behavior is being invoked. – Nate Eldredge Jul 01 '20 at 18:02
  • @NateEldredge, yes, the fact that neither gcc nor clang produces any warning is quite surprising. Any idea why? – pmor Jul 01 '20 at 19:20
  • It seems to be the cast that silences the warning. `char c = 128.0` does give a warning. Maybe, as in other cases, the cast is taken as a signal of "I know what I'm doing". – Nate Eldredge Jul 01 '20 at 19:24
  • As this is not _integer overflow_, voting to re-open. – chux - Reinstate Monica Jul 01 '20 at 19:33
  • @NateEldredge, Interesting. Meaning that via an explicit cast the compiler lets the user run into UB. Why such behavior? Are there any practical cases where the user may want to explicitly run into UB? Also: cl (with no extra options given) does not generate a warning for `char c = 128.0`. – pmor Jul 01 '20 at 19:34
  • Well, usually when a user "explicitly runs into UB", it means that they have some knowledge about what the compiler they're using is actually going to do, and they want it to do that. – Nate Eldredge Jul 01 '20 at 19:47
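
A minimal sketch of the warning behaviour discussed in the comments above (the exact diagnostics are compiler- and version-dependent; note that both initializations are UB when `CHAR_MAX == 127`):

#include <stdio.h>

int main(void)
{
    char a = 128.0;        /* gcc and clang typically warn about this out-of-range conversion */
    char b = (char)128.0;  /* the explicit cast silences the warning; the UB remains */

    printf("%d %d\n", a, b);
    return 0;
}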

2 Answers


When CHAR_MAX == 127, (char)128.0 is undefined behavior (UB).

When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined. C17dr § 6.3.1.4 ¶1

This is not UB due to integer overflow. It is UB due to conversion rules.
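
As an illustration of the quoted rule (not part of the original answer; it assumes `CHAR_MAX == 127` and that plain `char` is signed):

#include <stdio.h>

int main(void)
{
    char a = (char)127.9;     /* fractional part discarded; 127 fits in char: well defined */
    char b = (char)130;       /* int -> char, 130 does not fit: implementation-defined result, not UB */
    /* char c = (char)128.0;     double -> char, truncated value 128 does not fit: UB */

    printf("%d %d\n", a, b);
    return 0;
}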

chux - Reinstate Monica
  • Note that overflow on conversion from an integer to a signed integer type doesn't have undefined behavior, but yields an implementation-defined result (or raises an implementation-defined signal). That's **not** being done here, so it's not directly relevant, but it's worth being aware of the difference. – Keith Thompson Jul 01 '20 at 18:11
  • @KeithThompson, thanks! Can you comment/explain § 6.3.1.4, note 61? Citation: «The remaindering operation performed when a value of integer type is converted to unsigned type need not be performed when a value of real floating type is converted to unsigned type.» – pmor Jul 01 '20 at 19:13
  • @pmor Converting a floating-point value to an unsigned integer type is well defined *if* the value is in range. If it's not in range, the behavior is undefined. Signed-to-unsigned conversion involves remaindering; floating-to-unsigned does not. (An implementation *could* do remaindering; that's one of the infinitely many possible ways undefined behavior can manifest.) – Keith Thompson Jul 01 '20 at 21:41
  • @KeithThompson Minor: "... floating-point value to an unsigned integer type is well defined if the _truncated_ value is in range.". – chux - Reinstate Monica Jul 01 '20 at 21:48
  • @KeithThompson, thanks! Why does floating-to-unsigned conversion not _strictly_ involve remaindering (i.e. "need not be performed")? What is the reason? – pmor Jul 02 '20 at 10:59
  • @KeithThompson, Is NaN-to-char conversion well defined? – pmor Jul 02 '20 at 11:50
  • @pmor "When a finite value of real floating type is converted to an integer type other than **`_Bool`**, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined." – Keith Thompson Jul 02 '20 at 18:36
  • I presume that converting some very large floating-point value (say, 1.23e45) to an integer type that can't hold the value wasn't considered useful enough to define the behavior. Requiring some particular algorithm would have imposed a burden even for sensible conversions like `(char)55.0`. Really, how often do you convert floating-point values to `char`? (And remember that the signedness of plain `char` is implementation-defined.) – Keith Thompson Jul 02 '20 at 18:39
  • 1) Yes, the citation from the standard answers the question. Meaning that NaN-to-char is UB. And any -to-char is UB. 2) "how often do you convert floating-point values to char": I'm dealing with some set of tests, which do test the correctness of `float/double`-to-integer conversions. And I've figured out that some of these tests are incorrect (invalid), because they lead to UB. (Hence in each of such tests there is no reference conversion result.) 3) "the signedness of plain `char` is implementation-defined": Wow! Thanks! I didn't know that. – pmor Jul 02 '20 at 18:53
  • "And any -to-char is UB" --> applies to a truncated FP. values. Other types have different rules. – chux - Reinstate Monica Jul 02 '20 at 18:59
  • @chux-ReinstateMonica Yes, correct. For example: an integral conversion never produces undefined behaviour (it can produce implementation-defined behaviour). Source: https://stackoverflow.com/a/19274544/9881330. – pmor Jan 06 '21 at 22:58
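
A minimal sketch of the remaindering distinction discussed in the comments above (assuming an 8-bit `unsigned char`, i.e. `UCHAR_MAX == 255`):

#include <stdio.h>

int main(void)
{
    unsigned char u = (unsigned char)-1;       /* signed -> unsigned: reduced modulo UCHAR_MAX + 1, yields 255 */
    /* unsigned char v = (unsigned char)-1.0;     floating -> unsigned, out-of-range truncated value: UB,
                                                  no modular reduction is required */

    printf("%d\n", u);
    return 0;
}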

As @chux stated, (char)128.0 is UB. Because of the triviality of the example, gcc detects the out-of-range conversion at compile time and instead produces CHAR_MAX, the largest value representable in a signed char.

But if you obscure it a bit, it will not behave like this: the conversion to int is well defined, and the subsequent int-to-char conversion is implementation-defined rather than UB, so gcc simply wraps the value.

#include <stdio.h>

int main(void)
{
    volatile char x = (char)128.0;       /* constant-folded by gcc to CHAR_MAX (127) */
    volatile char y = (char)(int)128.0;  /* int -> char wraps to -128 (implementation-defined) */

    printf("%d %d\n", x, y);
}

and the generated assembly (the interesting part):

        mov     BYTE PTR [rsp+14], 127
        mov     BYTE PTR [rsp+15], -128

https://godbolt.org/z/xG3jUy

BTW, this gcc behaviour was discussed a long time ago and many people (including me) opposed it, but the gcc developers decided to go this way.

0___________