
I tried coming up with a small, simple program to understand how different types of assignments work in C/C++ and what is really happening electronically. Consider the program below:

#include <iostream>

int main() {
    int i = 32769;
    short s = i;

    std::cout << s << std::endl;

    return 0;
}

When the above program is executed, the output is -32767. I am trying to reason about why the program outputs this number. I am not sure if my reasoning is correct, so I am looking for some confirmation here and also a more elaborate understanding of why this happens:

  • An int in C/C++ is allocated 4 bytes of storage in memory (on my system), and we assign the integer variable the value 32769, which in binary is 2^15 + 2^0.

  • A short in C/C++ is allocated 2 bytes of storage in memory, and we assign the value of the integer variable to the short. The interesting thing here is that the 15th bit of the integer representation is set; however, when the assignment happens, the 2 most significant bytes of the integer are dropped, and the runtime system thinks it is trying to store a -1 in the short variable.

  • As the runtime system thinks it is trying to store a -1 (because the most significant bit of the short variable is set), it tries to store the 2's complement notation of the number -1 instead, which is 1111 1111 1111 1111 (the 16th bit being the sign bit); this number evaluates to -(2^15 - 1).

I am just trying to see whether my understanding of what is going on electronically is correct, and to gain a better understanding if I am incorrect!
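
To make the bit patterns visible, here is a variant of the program above (assuming a 32-bit int and a 16-bit two's complement short, as on my machine) that uses std::bitset to print the raw bits involved:

#include <bitset>
#include <iostream>

int main() {
    int i = 32769;
    short s = i;  // only the low 16 bits survive the conversion

    // 00000000000000001000000000000001 -- bit 15 and bit 0 are set
    std::cout << std::bitset<32>(i) << std::endl;

    // 1000000000000001 -- the same low 16 bits, now read as a short
    std::cout << std::bitset<16>(static_cast<unsigned short>(s)) << std::endl;

    // -32767
    std::cout << s << std::endl;

    return 0;
}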

  • I am not sure why you think this is a duplicate question. My question is asking for an understanding of things and also trying to confirm my own understanding. – akc1519 Sep 11 '22 at 11:02
  • You probably want to understand how to distinguish signed and unsigned types in C++. – πάντα ῥεῖ Sep 11 '22 at 11:04
  • I understand that already! I am just trying to understand the underlying details. I am not a noob; I just want to know more! – akc1519 Sep 11 '22 at 11:05
  • There is no "run time system thinks"; the compiler emits instructions and the CPU executes them. Prior to C++20, two's complement is not required and signed integer overflow is undefined behaviour. C++ does not specify exact bit widths for integers, only minimum requirements; `int` itself can be 16 bits wide. No, `32769 = 1000 0000 0000 0001` is interpreted as 16-bit two's complement, which is `-32767`; no one thinks anything about -1. – Quimby Sep 11 '22 at 11:06
  • @Quimby: Thanks a lot for that explanation! But how does 1000 0000 0000 0001 get evaluated to -32767? I mean, I want to understand who interprets what. Also, if the integer is assigned the value -32769, the result is +32767; why and how does that happen? – akc1519 Sep 11 '22 at 11:09
  • Why do you say C/C++? Those are C++ headers and this won't compile as a C program. – user438383 Sep 11 '22 at 11:10
  • For questions like these I like to refer to godbolt.org for a nice interactive visualization of what the result of source-to-machine-code translation is, and how it happens. – datenwolf Sep 11 '22 at 11:11
  • @akc1519 That is the interpretation of 16-bit two's complement. You have to realize that computers always store values in binary. `short` is an interpretation of those 16 bits as a signed 16-bit integer in the range -32768 to 32767, which C++/CPU arithmetic and `std::cout` respect. Two's complement says that if the MSB is 0, it is just the normal binary representation; if it is 1, then you do -32768 + (rest of the bits as ordinary binary). In this case -32768 + 1 = -32767. – Quimby Sep 11 '22 at 11:13
  • I understand it now! Thanks a lot, @Quimby: I think I am thinking in the right direction! – akc1519 Sep 11 '22 at 11:18
  • @akc1519 Because -32769 is stored as the 32-bit number `1111 1111 1111 1111 0111 1111 1111 1111`. Assigning it to `short` (implementation-defined before C++20) just takes the two lowest bytes = `0111 1111 1111 1111`; there is no other conversion work, because just cutting the bits off is fastest. The number is then interpreted as 16-bit two's complement without knowing where it came from, and that interpretation is just +32767. Of course that is vastly different from -32769, which is why you should be careful about such casts (see the sketch after these comments). – Quimby Sep 11 '22 at 11:19
  • The standard does not require that either an `int` or a `short` can represent the value `32769`. Formally, it is implementation-defined whether an `int` or a `short` can represent a value above `32767` (it is not required, but an implementation is required to document what it does). The layout of `int` and `short` (except in relatively recent standards/drafts) is unspecified too, so any behaviour you get will depend on what implementation (compiler/library) you are using. – Peter Sep 11 '22 at 11:20
  • Here is the dupe: [How does casting to "signed int" and back to "signed short" work for values larger than 32,767?](https://stackoverflow.com/questions/11962596/how-does-casting-to-signed-int-and-back-to-signed-short-work-for-values-larg) – Jason Sep 11 '22 at 11:39
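
Following up on Quimby's comments, here is a small sketch (assuming a 32-bit int and a 16-bit two's complement short) that shows the wrap-around in both directions:

#include <iostream>

int main() {
    int a = 32769;   // 32-bit pattern 0x00008001; low 16 bits are 0x8001
    int b = -32769;  // 32-bit pattern 0xFFFF7FFF; low 16 bits are 0x7FFF

    short sa = a;    // 0x8001 as two's complement: -32768 + 1 = -32767
    short sb = b;    // 0x7FFF as two's complement: +32767

    std::cout << sa << " " << sb << std::endl;  // prints: -32767 32767

    return 0;
}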

1 Answer


On my system, `int` is 4 bytes long and `short` is 2 bytes long.

The hexadecimal representation of the integer value 32769 is 0x00008001. This represents a POSITIVE value. Please note that I purposely specified ALL of the bytes here.

The hexadecimal representation of the short value in `s` is 0x8001. As shorts are signed, this is a NEGATIVE value: -32767.

What is happening here is truncation when converting the integer `i` to the short `s`. The conversion drops the most significant bytes, which changes BOTH the value and the sign; it is a typical example of integer overflow. If you drop information (and dropping two bytes out of four is exactly that), you MUST be prepared for the possible consequences.
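
To see those hexadecimal representations directly, here is a small sketch (under the same size assumptions as above) that prints both values with std::hex:

#include <iomanip>
#include <iostream>

int main() {
    int i = 32769;
    short s = i;

    // all 8 hex digits of the int: 0x00008001
    std::cout << "0x" << std::setw(8) << std::setfill('0') << std::hex << i << std::endl;

    // all 4 hex digits of the short: 0x8001 (cast to unsigned to avoid sign extension)
    std::cout << "0x" << std::setw(4) << std::setfill('0') << std::hex
              << static_cast<unsigned short>(s) << std::endl;

    return 0;
}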

Using `unsigned short` may be a solution here, as a two-byte unsigned variable covers values in the 0-65535 range.
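
A quick sketch of that fix, under the same assumptions:

#include <iostream>

int main() {
    int i = 32769;
    unsigned short us = i;  // the same bits, 0x8001, read as unsigned: 32769

    std::cout << us << std::endl;  // prints: 32769

    return 0;
}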

Tomek