
I am trying to understand the output of the code given at : http://phrack.org/issues/60/10.html

Quoting it here for reference:

#include <stdio.h>

int main(void){
        int l;
        short s;
        char c;

        l = 0xdeadbeef;
        s = l;
        c = l;

        printf("l = 0x%x (%d bits)\n", l, sizeof(l) * 8);
        printf("s = 0x%x (%d bits)\n", s, sizeof(s) * 8);
        printf("c = 0x%x (%d bits)\n", c, sizeof(c) * 8);

        return 0;
}

The output I get on my machine is:

l = 0xdeadbeef (32 bits)
s = 0xffffbeef (16 bits)
c = 0xffffffef (8 bits)

Here is my understanding:-

The assignments s=l, c=l will result in s and c being promoted to ints and they will have the last 16 bits (0xbeef) and last 8 bits (0xef) of l respectively.

printf tries to interpret each of the above values (l, s and c) as unsigned integers (as %x is passed as the format specifier). From the output I see that sign extension has taken place. My question is: since %x represents unsigned int, why has sign extension taken place while printing s and c? Shouldn't the output for s be 0x0000beef and for c be 0x000000ef?

user720694
  • The sign extension took place during the push, not the print. And your assignments are implementation-defined. If a value is not representable in a signed target type, the implementation decides the result. Don't rely on the observed behavior you're getting (though bit truncation is indeed the most common behavior I've witnessed). – WhozCraig Jan 19 '15 at 05:43
  • `printf` is a variadic function, which means that arguments of the types `short` and `char` are promoted to `int` when the function is called. – Frxstrem Jan 19 '15 at 05:44
  • `sizeof` returns `size_t`; to print a `size_t` you must use [`%zu`](http://stackoverflow.com/questions/940087/whats-the-correct-way-to-use-printf-to-print-a-size-t) – phuclv Jan 19 '15 at 07:50
  • and a byte doesn't always have 8 bits, use `CHAR_BIT` instead – phuclv Jan 19 '15 at 07:52
  • "The assignments `s=l` and `c=l` will result in `s` and `c` being promoted to `int`" - well, more like "will result in `l` being truncated to `short` and `char`". – barak manos Jan 19 '15 at 09:22
  • Side note: use `CHAR_BIT` (defined in `limits.h`) instead of `8`. – barak manos Jan 19 '15 at 09:23
  • `char` is intended to hold characters, and the compiler makes assumptions based on this. `signed char` and `unsigned char` exist for one-byte integers. (`char` is also the unit in which the sizes of other primitive types are expressed, commonly an 8-bit byte.) Since C99 (and C++11), more explicit types are available and may be preferred to the ambiguous naming and assumptions of the three char types, e.g. `uint32_t`, `int_least8_t`, `uint_fast16_t`: respectively an unsigned type of exactly 32 bits (if available on the system), the smallest available signed type of at least 8 bits, and the fastest available unsigned type of at least 16 bits. – Max Power Jul 02 '22 at 03:10

2 Answers


why has the sign extension taken place while printing s and c

Let's see the following code:

unsigned char ucr8bit; /* Range is 0 to 255 on my machine */
signed char cr8bit; /* Range is -128 to 127 on my machine */
int i32bit;
cr8bit = -100;       /* bit pattern 0x9C in two's complement */
i32bit = cr8bit;     /* i32bit is -100 or 0xFFFFFF9C */

As you can see, although the value -100 is the same, its representation in the wider type is not obtained by merely prepending 0s. Instead, the MSB (sign bit) of the signed type is replicated into the new upper bits, which is what preserves the value in two's complement and ones' complement systems.

In your example you are printing s and c after they have been widened to int, hence the sign-bit replication.


Also, your code relies on implementation-defined and undefined behavior and thus may give different output on different compilers. (For instance, you should use signed char instead of char, as plain char may behave like unsigned char on some implementations and like signed char on others.)

l = 0xdeadbeef; /* 0xdeadbeef has type unsigned int when int is 32 bits;
                   converting it to int is implementation-defined */
s = l;  /* Implicit conversion from a wider to a narrower signed type;
           the value does not fit, so the result is implementation-defined */
printf("l = 0x%x (%d bits)\n", l, sizeof(l) * 8);  /* Using %x
               to print a signed number and %d to print a size_t */
Mohit Jain
  • If `short` and `char` are always promoted to `int`, then `printf("%x", someShortValue);` must be equivalent in every way to `printf("%x", (int)someShortValue);`, which is defined, right? – user253751 Jan 19 '15 at 06:40
  • @immibis Those both cause undefined behaviour; `%x` can only be used with `unsigned int`. – M.M Jan 19 '15 at 07:12
  • Using `%x` with non-negative values of `int`, or smaller values which have been promoted to non-negative values of `int`, is *technically* undefined behaviour, but most people seem to regard the standard as defective in that area, and in practical terms we treat it as if it did permit that. – M.M Jan 19 '15 at 07:13

You are using a 32-bit signed integer. That means that only 31 bits can be used for positive numbers. 0xdeadbeef uses 32 bits. Therefore, assigning it to a 32-bit signed integer makes it a negative number.

When shown with an unsigned conversion specifier, %x, it looks like the negative number that it is (with the sign extension).

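When copying it into a short or char, only the low 16 or 8 bits are kept (on typical implementations). Here those bits, 0xbeef and 0xef, also have their top bit set, so the narrower values are negative as well.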

To further show this, try setting:

l = 0xef;

The output is now:

l = 0xef (32 bits)
s = 0xef (16 bits)
c = 0xffffffef (8 bits)

0xef fits in 8 bits, so it is positive when placed into a 32-bit or 16-bit variable. When you place it into a signed 8-bit variable (char, on this machine), its top bit (bit 7) becomes the sign bit, so you are creating a negative number.

To see the retention of the negative number, try the reverse:

c = 0xef;
s = c;
l = c;

The output is:

l = 0xffffffef (32 bits)
s = 0xffffffef (16 bits)
c = 0xffffffef (8 bits)
Jumbo