Is Sign Extension in C++ a compiler option, or compiler dependent or target dependent?

Question

The following code has been compiled on 3 different compilers and 3 different processors and gave 2 different results:

typedef unsigned long int u32;
typedef signed long long s64;

int main()
{
    u32 Operand1, Operand2;
    s64 Result;
    Operand1 = 95;
    Operand2 = 100;
    Result = (s64)(Operand1 - Operand2);
}

Result ends up with one of two values: either -5 or 4294967291.

I understand that (Operand1-Operand2) is computed as a 32-bit unsigned calculation. Then, when the result is cast to s64, sign extension was done correctly in the first case but not in the second.

My question is whether sign extension can be controlled via compiler options, or whether it is compiler-dependent or target-dependent.

You are making various incorrect assumptions, the most serious being that `unsigned long` is 32 bits, which is typically not true for most modern operating systems. Use `<cstdint>` and proper fixed width types. — Paul R, Nov 03 '16 at 08:12
*" is done in as 32bit unsigned calculation"* In no way guaranteed. Just because you decided to `typedef` something to a name of `u32` does not mean that is true. If you need fixed-width integers, C++11 provides them [via a header file](http://en.cppreference.com/w/cpp/types/integer) — UnholySheep, Nov 03 '16 at 08:14
`Operand1-Operand2` is unsigned therefore when casting to `s64` it's always zero extension. What's probably wrong is your printing function which you didn't show — phuclv, Nov 03 '16 at 08:21
@PaulR You are correct in a general sense, but I am sure it is 32 bits because I know the processor and have confirmed it using the debugger. — AYZAB, Nov 03 '16 at 08:37
The program doesn't produce any output when I run it. How did you examine `Result`? (It matters, if that involves a narrowing conversion). — Toby Speight, Nov 03 '16 at 08:37
@LưuVĩnhPhúc: been using the debugger to check the internal values. — AYZAB, Nov 03 '16 at 08:37
When you're confused about such conversions, it can be worth assigning an intermediate to an `auto` variable. Then you can force a compilation error to show you the type deduced for it: `auto n = Operand1-Operand2; printf(n);` — Toby Speight, Nov 03 '16 at 08:40
how did you check it? and you still didn't provide which compiler/target are you using and how you print it — phuclv, Nov 03 '16 at 08:46
`because I know the processor and have confirmed it using the debugger` the debugger only shows values of variables, not sizes unless you print out `sizeof(type)` — phuclv, Nov 03 '16 at 08:51
Yes i used sizeof inside the watch of the debugger and printed it. — AYZAB, Nov 03 '16 at 09:14
use fixed-width types and repeat the test, it causes confusion to have people guess what size of `unsigned long` is on the various platforms. — M.M, Nov 03 '16 at 09:18

Daniel Jour · Accepted Answer · 2016-11-03T08:49:53.220

Your issue is that you assume unsigned long int to be 32 bit wide and signed long long to be 64 bit wide. This assumption is wrong.

We can visualize what's going on by using types that have a guaranteed (by the standard) bit width:

#include <cstdint>
#include <iostream>

int main() {
    {
        std::uint32_t large = 100, small = 95;
        std::int64_t result = (small - large);
        std::cout << "32 and 64 bits: " << result << std::endl;
    }  // 4294967291
    {
        std::uint32_t large = 100, small = 95;
        std::int32_t result = (small - large);
        std::cout << "32 and 32 bits: " << result << std::endl;
    }  // -5
    {
        std::uint64_t large = 100, small = 95;
        std::int64_t result = (small - large);
        std::cout << "64 and 64 bits: " << result << std::endl;
    }  // -5
    return 0;
}

In each of these three cases, the expression small - large yields a result of unsigned integer type (of the corresponding width). This result is calculated using modular arithmetic.

In the first case, because that unsigned result can be stored in the wider signed integer, no conversion of the value is performed.

In the other cases the result cannot be represented in the signed integer type. Thus an implementation-defined conversion is performed, which usually means interpreting the bit pattern of the unsigned value as a signed value. Because the result is "large", its highest bit is set, which, when treated as a signed value (under two's complement), corresponds to a "small" negative value.

To highlight the comment from Lưu Vĩnh Phúc:

Operand1-Operand2 is unsigned therefore when casting to s64 it's always zero extension. [..]

An extension is only performed in the first case, because only there is the conversion a widening one. Since the source value is unsigned, it is indeed always a zero extension.

Quotes from the standard, emphasis mine. Regarding small - large:

If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [..]

§ 4.7/2

Regarding the conversion from unsigned to signed:

If the destination type [of the integral conversion] is signed, the value is unchanged if it can be represented in the destination type; otherwise, the value is implementation-defined.

§ 4.7/3

@DanielJour: My assumption is wrong generally speaking, but on my compiler on my target it was 32 bits. My question is mainly regarding the first case: "In some cases it gives -5 and sometimes 4294967291". Is there a way to force the sign extension without explicitly casting either of the 2 operands to int64_t? — AYZAB, Nov 03 '16 at 08:41
@AYZAB real "sign extension" is only done when you convert from a type `X` to `Y` where **both** are signed types and `Y` is wider ("has more bits") than `X`. — Daniel Jour, Nov 03 '16 at 08:52
Thank you for your response. So just to make sure (assuming we are sure of the types): a u32 would always be extended with zeros when cast to s64, and for the above case to work correctly we have no solution other than casting to s32 first (or casting one of the operands to s32) and then casting to s64. If you can provide a Standard excerpt for the above statement "real "sign extension" is only done when you convert from a type X to Y where both are signed types and Y is wider ("has more bits") than X.", I would be really thankful. — AYZAB, Nov 03 '16 at 09:18
@AYZAB "sign extension" is something the standard doesn't even bother speaking about. It's an implementation issue of how to store the value "[..] unchanged if it can be represented in the destination type". What's the behavior you want to achieve? u32 to (in the end) s64? But with a result of `-5` (in this case)?. The correct approach should then be `int64_t result = reinterpret_cast(small - large)`. This performs unsigned modular arithmetic, converts to signed by keeping the bit pattern (thus not implementation defined, but well defined) and finally doing the sign extension. — Daniel Jour, Nov 03 '16 at 23:16

score 2 · Answer 2 · edited May 23 '17 at 12:01

Sign extension is platform dependent, where platform is a combination of a compiler, target hardware architecture and operating system.

Moreover, as Paul R mentioned, width of built-in types (like unsigned long) is platform-dependent too. Use types from <cstdint> to get fixed-width types. Nevertheless, they are just platform-dependent definitions, so their sign extension behavior still depends on the platform.

Here is a good almost-duplicate question about type sizes. And here is a good table about type size relations.

score 1 · Answer 3 · answered Nov 03 '16 at 08:35

Type promotions, and the corresponding sign-extensions are specified by the C++ language.

What's not specified, but is platform-dependent, is the range of integer types provided. It's even Standard-compliant for char, short int, int, long int and long long int all to have the same range, provided that range satisfies the C++ Standard requirements for long long int. On such a platform, no widening or narrowing would ever happen, but signed<->unsigned conversion could still alter values.
