43

Can someone clarify what happens when an integer is cast to a short in C? I'm using a Raspberry Pi, so in my case an int is 32 bits and a short is 16 bits.

Let's say I use the following C code for example:

int x = 0x1248642;
short sx = (short)x;
int y = sx;

I get that x would be truncated, but can someone explain how exactly? Are shifts used? How exactly is a number truncated from 32 bits to 16 bits?

hippietrail
buydadip
  • 2
    Note that the cast (like most casts) is unnecessary. You can declare `short sx = x;`, and the value of `x` will be implicitly converted to `short`. – Keith Thompson Jan 19 '16 at 20:12
  • 3
    The actual size of "int" and "short" can and will vary from platform to platform. But yes, let's say "int" is 32 bit and "short" is 16 bit: 1) Yes, the cast will truncate the value from 32 to 16 bits, 2) Yes, the upper 16 bits are "lost", 3) No, there's no "shift". PS: Did you know that your Raspberry Pi probably has a full-fledged copy of [Mathematica](https://www.raspberrypi.org/learning/getting-started-with-mathematica/)? Definitely worth checking out :) – paulsm4 Jan 19 '16 at 20:13
  • Not exactly a duplicate, but closely related: http://stackoverflow.com/q/19273658/4996248 – John Coleman Jan 19 '16 at 20:13
  • 2
    An aside: you can remove bit-width guesswork with `#include <stdint.h>` to bring in `int32_t`, `int16_t`, etc. – rubicks Jan 20 '16 at 01:13

6 Answers

45

According to the ISO C standard, when you convert an integer to a signed type, and the value is outside the range of the target type, the result is implementation-defined. (Or an implementation-defined signal can be raised, but I don't know of any compilers that do this.)

In practice, the most common behavior is that the high-order bits are discarded. So assuming int is 32 bits and short is 16 bits, converting the value 0x1248642 will probably yield a bit pattern that looks like 0x8642. And assuming a two's-complement representation for signed types (which is used on almost all systems), the high-order bit is the sign bit, so the numeric value of the result will be -31166.
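
To see where -31166 comes from, here is the arithmetic for reinterpreting the retained bit pattern as a 16-bit two's-complement value:

0x8642 = 34370                (read as an unsigned 16-bit value)
34370 - 65536 = -31166        (the sign bit is set, so subtract 2^16)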

int y = sx;

This also involves an implicit conversion, from short to int. Since the range of int is guaranteed to cover at least the entire range of short, the value is unchanged. (Since, in your example, the value of sx happens to be negative, this change of representation is likely to involve sign extension, propagating the 1 sign bit to all 16 high-order bits of the result.)
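
Concretely, a minimal sketch (assuming 16-bit short, 32-bit int, and two's complement):

short sx = -31166;   /* bit pattern 0x8642 */
int y = sx;          /* value preserved: -31166; bit pattern 0xFFFF8642 after sign extension */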

As I indicated, none of these details are required by the language standard. If you really want to truncate values to a narrower type, it's probably best to use unsigned types (which have language-specified wraparound behavior) and perhaps explicit masking operations, like this:

unsigned int x = 0x1248642;
unsigned short sx = x & 0xFFFF;

If you have a 32-bit quantity that you want to shove into a 16-bit variable, the first thing you should do is decide how you want your code to behave if the value doesn't fit. Once you've decided that, you can figure out how to write C code that does what you want. Sometimes truncation happens to be what you want, in which case your task is going to be easy, especially if you're using unsigned types. Sometimes an out-of-range value is an error, in which case you need to check for it and decide how to handle the error. Sometimes you might want the value to saturate, rather than truncate, so you'll need to write code to do that.
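
For example, a saturating conversion might look like this (a sketch using the <limits.h> constants; the function name is just for illustration):

#include <limits.h>

short to_short_saturating(int x)
{
    if (x > SHRT_MAX) return SHRT_MAX;   /* clamp values above the short range */
    if (x < SHRT_MIN) return SHRT_MIN;   /* clamp values below it */
    return (short)x;                     /* in range, so the value is preserved */
}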

Knowing how conversions work in C is important, but if you start with that question you just might be approaching your problem from the wrong direction.

Keith Thompson
  • 1
    If your code assumes `x` will fit in a short, instead of masking you can `assert( x <= USHRT_MAX )` to enforce that assumption. – Schwern Jan 19 '16 at 20:13
  • 1
    Notice `x & 0xFFFF` != `(short) x` if `CHAR_BIT != 8`. – edmz Jan 19 '16 at 20:16
  • @black: Or, more precisely, if `CHAR_BIT * sizeof (short) != 16`. (I've worked on systems with `CHAR_BIT==8` where `sizeof (short)` is 4 or even 8.) – Keith Thompson Jan 19 '16 at 20:18
  • Unless you are programming for a very esoteric platform (and then you probably know), you can safely assume that the truncating behaviour takes place. – fuz Jan 19 '16 at 20:20
  • Very useful answer, in particular the last two paragraphs. – CompuChip Jan 20 '16 at 12:59
16

The 32-bit value is truncated to 16 bits the same way a 32 cm long banana bread would be cut if you jammed it into a 16 cm long pan: half of it would fit in and still be a banana bread, and the rest would be "gone".

chqrlie
Amit
  • 5
    Not the best analogy. I can fit a 32-cm banana into a 16-cm pan by smashing it or by cutting it into two side-by-side pieces. Bits in a word have much tighter constraints than banana bits in a pan. And you say nothing about which half you end up with, or why. – Keith Thompson Jan 19 '16 at 20:25
  • @KeithThompson - you could slice and dice 32 bits as well (you need a knife for the bread, or bit manipulation operations for the bits though), but the analogy calls for shoving the cake into the pan, not slicing it. Regarding the part that goes in or the part that goes away: yes, I didn't manage to include that detail. – Amit Jan 19 '16 at 20:32
  • 1
    Gonna have to say this is the best answer on SO. – Hunter Kohler Jan 04 '22 at 02:50
8

In practice, truncation happens in CPU registers, which come in different sizes: 8/16/32/64 bits. You can imagine a register like:

<--rax----------------------------------------------------------------> (64-bit)
                                    <--eax----------------------------> (32-bit)
                                                      <--ax-----------> (16-bit)
                                                      <--ah--> <--al--> (8-bit high & low)
01100011 01100001 01110010 01110010 01111001 00100000 01101111 01101110

x is first given the 32-bit value 0x1248642. In memory*, it'll look like:

-----------------------------
|  01  |  24  |  86  |  42  |
-----------------------------
 31..24 23..16 15..8  7..0       

Now the compiler loads x into a register. From there, it can simply take the least significant 16 bits (namely, ax) and store them into sx.
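
In C terms, that amounts to keeping the low-order 16 bits of the value. A minimal sketch, assuming a 16-bit short:

unsigned int x = 0x1248642;
unsigned short low = (unsigned short)x;  /* keeps bits 15..0: 0x8642 */
/* same result as masking first: (unsigned short)(x & 0xFFFFu) */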


*Endianness is not taken into account for the sake of simplicity

Dan Bechard
edmz
  • I believe the OP wants to know how that discarding happens. What 16 bits of the original 32 are retained? – Schwern Jan 19 '16 at 20:16
  • @Schwern Thanks, added more explanation -- does that clarify? – edmz Jan 19 '16 at 20:25
  • 1
    Yeah. Will it always be the least significant 16 bits? – Schwern Jan 19 '16 at 20:31
  • @black I submitted an edit to fix a typo in the result, but there weren't enough characters so I improved (IMO) the register illustration as well. Feel free to improve it further if you disagree with my interpretation. – Dan Bechard Jan 20 '16 at 14:42
  • @Dan Thanks Dan, I took some of the suggestions you proposed. It should look much better now. – edmz Jan 20 '16 at 16:35
5

Simply put, the high 16 bits are cut off from the integer. Your short will therefore hold the bit pattern 0x8642, which, read as a signed 16-bit value, is the negative number -31166.

Zbynek Vyskovsky - kvr000
  • Although I believe it's implementation specific whether high or low bits are used? – Rivasa Jan 19 '16 at 20:11
  • @Link The result is entirely implementation-defined according to the language standard, but I don't believe there are any compilers that will give you the high-order bits. – Keith Thompson Jan 19 '16 at 20:12
  • @Link: No, it's not implementation specific. Any cast to a narrower type will always cut off the most significant bits. It would be different for unions of types of different widths - here big endian/little endian would make the difference. But for the above it's the same everywhere. – Zbynek Vyskovsky - kvr000 Jan 19 '16 at 20:12
  • 4
    @ZbynekVyskovsky-kvr000: No, the result of converting an out-of-range value to a signed type is implementation-defined. See [N1570](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) section 6.3.1.3. (Discarding the high-order bits is certainly the most common behavior.) – Keith Thompson Jan 19 '16 at 20:20
  • @KeithThompson: Weird, I always assumed the compilers are consistent here for unsigned and signed types. Fortunately all of them do the same as for unsigned values, otherwise lot of software would stop working... – Zbynek Vyskovsky - kvr000 Jan 19 '16 at 20:26
5

Perhaps let the code speak for itself:

#include <stdio.h>

#define BYTETOBINARYPATTERN "%d%d%d%d%d%d%d%d"
#define BYTETOBINARY(byte)  \
   ((byte) & 0x80 ? 1 : 0), \
   ((byte) & 0x40 ? 1 : 0), \
   ((byte) & 0x20 ? 1 : 0), \
   ((byte) & 0x10 ? 1 : 0), \
   ((byte) & 0x08 ? 1 : 0), \
   ((byte) & 0x04 ? 1 : 0), \
   ((byte) & 0x02 ? 1 : 0), \
   ((byte) & 0x01 ? 1 : 0) 

int main()
{
    int x    =   0x1248642;
    short sx = (short) x;
    int y    =   sx;

    printf("%d\n", x);
    printf("%hu\n", sx);
    printf("%d\n", y);

    printf("x: "BYTETOBINARYPATTERN" "BYTETOBINARYPATTERN" "BYTETOBINARYPATTERN" "BYTETOBINARYPATTERN"\n",
        BYTETOBINARY(x>>24), BYTETOBINARY(x>>16), BYTETOBINARY(x>>8), BYTETOBINARY(x));

    printf("sx: "BYTETOBINARYPATTERN" "BYTETOBINARYPATTERN"\n",
        BYTETOBINARY(y>>8), BYTETOBINARY(y));

    printf("y: "BYTETOBINARYPATTERN" "BYTETOBINARYPATTERN" "BYTETOBINARYPATTERN" "BYTETOBINARYPATTERN"\n",
        BYTETOBINARY(y>>24), BYTETOBINARY(y>>16), BYTETOBINARY(y>>8), BYTETOBINARY(y));

    return 0;
}

Output:

19170882
34370
-31166

x: 00000001 00100100 10000110 01000010
sx: 10000110 01000010
y: 11111111 11111111 10000110 01000010

As you can see, int -> short yields the lower 16 bits, as expected.

Casting short to int sets the 16 high bits because the value is sign-extended. Unlike the int -> short direction, this conversion is fully defined: the standard guarantees that a value which can be represented by the new type is unchanged, and int can represent every short value. On a two's-complement machine, preserving a negative value means copying the sign bit into the new high-order bits; no stray memory is being read.

If you want the high bits zeroed rather than sign-extended, you can mask explicitly:

int y = 0x0000FFFF & sx;

Obviously you won't get back the lost bits, but this guarantees that the high bits are zeroed. Note that sx is promoted to int (sign-extended to 0xFFFF8642) before the mask is applied, so y ends up as 0x00008642, i.e. 34370 rather than -31166.

The short -> int behavior is backed by an authoritative reference: N1570 section 6.3.1.3 says a value that can be represented by the new type is unchanged, and int is required to be able to represent every short value.

Note: Binary macro adapted from this answer.

Dan Bechard
    I'd love to know why the high bits are set too, even though that's a separate question – buydadip Jan 19 '16 at 20:20
  • 6
    This shows only what the behavior is for the implementation you used to generate the output. – Keith Thompson Jan 19 '16 at 20:21
  • @KeithThompson Thanks for the insight Keith. I did some further testing and updated my answer. It looks like your answer is more knowledgeable and complete (upvoted), but I'll leave mine in case anyone fancies running the code themselves out of curiosity. – Dan Bechard Jan 19 '16 at 20:41
3

The value of sx will be the same as the two least significant bytes of x; in this case it will be 0x8642 which (if interpreted as a 16-bit signed integer) gives -31166 in decimal.

nsilent22
  • `0x8642` is not `-31166` in decimal. `0x8642` is `34370` in decimal. That value, when *converted* to a 16-bit signed type, typically yields `-31166`, but that's a different value. – Keith Thompson Jan 19 '16 at 20:28
  • @KeithThompson: Thank you, I clarified my answer. – nsilent22 Jan 19 '16 at 20:31