
As the title says, I get a "weird" result when running the following code:

```c
#include <stdio.h>

int main()
{
    char buff[4] = {0x17, 0x89, 0x39, 0x40};
    unsigned int* ptr = (unsigned int*)buff;
    char a = (char)((*ptr << (0*8)) >> (3*8));
    char b = (char)((*ptr << (1*8)) >> (3*8));
    char c = (char)((*ptr << (2*8)) >> (3*8));
    char d = (char)((*ptr << (3*8)) >> (3*8));

    printf("0x%x\n", *ptr);
    printf("0x%x\n", a);
    printf("0x%x\n", b);
    printf("0x%x\n", c);
    printf("0x%x\n", d);

    return 0;
}
```

Output:

```
0x40398917
0x40
0x39
0xffffff89
0x17
```

Why am I not getting 0x89?

Jonas

3 Answers


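It's because your char variables are signed (plain char is signed on your platform) and they undergo sign extension when they are promoted to int as arguments to printf. Sign extension preserves the value during this promotion, so -119 (the bit pattern 0x89 read as a signed 8-bit value) stays -119 whether it's 8-bit, 16-bit or a wider type. Printed with %x, the 32-bit int -119 shows up as 0xffffff89. The other three bytes (0x40, 0x39 and 0x17) are all below 0x80, so they are positive and print as expected.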

You can fix it by explicitly using unsigned char since, in C at least, whether plain char is signed or unsigned is implementation-defined. From C11 6.2.5 Types /15:

The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.

Sign extension does not come into play for unsigned types because they're, ... well, unsigned :-)

paxdiablo

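On your platform char is signed (whether plain char is signed is implementation-defined), so it holds values from -128 to 127. 0x89 is 137, which is outside that range; stored in a signed char, the same bit pattern means -119. If you change char to unsigned char, you will get the numbers you expect.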

AMADANON Inc.
  • isn't char 0x89 equal to -119, which is between -128 and 127? – Jonas Jul 03 '13 at 02:39
  • `char` is not `signed` by default: http://stackoverflow.com/questions/2054939/char-is-signed-or-unsigned-by-default – Shafik Yaghmour Jul 03 '13 at 02:41
  • Yes. 0x89 as a signed 8-bit integer is -119. When -119 is promoted to a 32-bit int and printed in hex, it becomes ffffff89. The difference between 89 hex and ffffff89 is the difference between a char and an int. – AMADANON Inc. Jul 03 '13 at 02:42

Use memcpy, not a cast

```c
char buff[4] = {0x17, 0x89, 0x39, 0x40};
unsigned int* ptr = (unsigned int*)buff;
```

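This is not correct: buff is an array of char, not an unsigned int object, so reading it through (unsigned int*)buff violates the effective-type (strict aliasing) rules of C11 6.5/7, and the conversion itself is undefined if buff is not suitably aligned for an unsigned int (C11 6.3.2.3/7).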

The safe way to reinterpret buff as an unsigned int is with memcpy:

```c
#include <assert.h>
#include <string.h>

char buff[4] = {0x17, 0x89, 0x39, 0x40};
unsigned int ui;
assert(sizeof ui == sizeof buff);
memcpy(&ui, buff, sizeof ui);  /* destination first: copy buff's bytes into ui */
```

When using memcpy, you have to make sure the bit representation you copy is valid for the destination type, of course.

One portable but degenerate way to do that is to check that the representation matches an existing object (beware, the following is silly code):

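```c
char *null_ptr = 0;
char null_bytes[sizeof null_ptr] = {0};
/* Compare the bytes of the pointer object itself, not what it points to. */
if (memcmp(&null_ptr, null_bytes, sizeof null_bytes) == 0) {
    char *ptr2;
    memcpy(&ptr2, null_bytes, sizeof null_bytes);  /* all-zero bytes into ptr2 */
    assert(ptr2 == 0);
}
```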

This code uses memcpy and has fully defined behavior (even if useless). OTOH, the behavior of

```c
int *ptr3 = (int*)null_bytes;
```

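is not guaranteed to be defined, because null_bytes is not the address of an int or unsigned int: the conversion is undefined if null_bytes is not suitably aligned for int, and any read through ptr3 would violate the aliasing rules.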

curiousguy
  • What do you mean by "it breaks typing rules"? – Jonas Jul 03 '13 at 02:52
  • @j_kubik This use of `union` for type punning, in most cases, breaks the **type aliasing rules**. Here it may be OK (but some people would say it is not). – curiousguy Jul 03 '13 at 02:54
  • @Jonas In general, `union` has type aliasing rules issues, but `memcpy` is safe. – curiousguy Jul 03 '13 at 02:55
  • memcpy is not an improvement over type-casting the pointer: undefined behavior at pointer type-casting expresses our lack of knowledge about types memory layout. Whether you change the way memory is interpreted or copy it bit by bit to another variable makes no difference. – j_kubik Jul 03 '13 at 02:57
  • @j_kubik It **is** an improvement. "_undefined behavior at pointer type-casting expresses our lack of knowledge about types memory layout_" No. The type-cast itself does not reinterpret bit representation, only the use of the resulting pointer does. Here the cast itself is incorrect. Of course, `memcpy` does not help you if you do not know what a correct bit representation for the target type is. – curiousguy Jul 03 '13 at 03:00
  • "The type-cast itself does not reinterpret bit representation, only the use of the resulting pointer does." Well, that's true, but only because such type-cast produced invalid pointer - using it will cause an error, but the mistake was made earlier, when getting the pointer in the first place. BTW such a union doesn't break aliasing rules. – j_kubik Jul 03 '13 at 03:15
  • "to check that the representation matches an existing object" - this is not a silly code, it's a useless code. It doesn't do anything. Copying memory into the variable and back is not a proof that the binary representation is valid. There is no way to check for that; you can only mix types whose representations you know to match. – j_kubik Jul 03 '13 at 03:20
  • @j_kubik "_it's a useless code._" Of course. It is just a proof that the behavior can be defined with `memcpy`. "_Copying memory into the variable and back_" is not what the code does. "_There is no way to check for that_" except the way my example does. – curiousguy Jul 03 '13 at 03:28
  • @j_kubik "_but only because such type-cast produced invalid pointer_" not sure what you call an "invalid pointer". Is `(float*)malloc(sizeof (float))` an invalid pointer? – curiousguy Jul 03 '13 at 03:30
  • Invalid in the sense that using it will cause undefined behavior. Could you elaborate on what your "useless" code actually does? I always thought that for `memcpy` (according to http://www.cplusplus.com/reference/cstring/memcpy/) "The underlying type of the objects pointed by both the source and destination pointers are irrelevant for this function; The result is a binary copy of the data." So memcpy has nothing to do with binary representation of a type. Your assert will be always `true` - is that your point? – j_kubik Jul 03 '13 at 03:34