size_t size_int = sizeof(unsigned long int);
size_t size_ptr = sizeof(void*);
printf("sizeof(unsigned long int): %zu\n", size_int);
printf("sizeof(void*): %zu\n", size_ptr);

if(size_int == size_ptr) {
    int a = 0;
    void * ptr_a = &a;
    
    // case 1
    unsigned long int case_1 = *((unsigned long int*)&ptr_a);
    printf("case 1: %lu\n", case_1);

    // case 2
    unsigned long int case_2 = (unsigned long int)ptr_a;
    printf("case 2: %lu\n", case_2);

    // case 3
    unsigned long int case_3 = 0;
    memcpy(&case_3, &ptr_a, sizeof(void*));
    printf("case 3: %lu\n", case_3);
    
    // case 4
    void *ptr_b = NULL;
    memcpy(&ptr_b, &case_3, sizeof(void*));
    int *ptr_c = (int*)ptr_b;
    *ptr_c = 5;
    printf("case 4: %i\n", a);
}

I am aware that C99 provides uintptr_t and intptr_t. However, for educational purposes I want to ask some questions. I also know that this is bad practice and should never be done this way.

Q1. Could case 1 cause undefined behaviour? Is it safe? If it is not, why? If it is safe, is it guaranteed that the "case_1" variable holds exactly the same address, represented as an unsigned long int?
Q2. Same as above for case 2.
Q3. Same as above for case 3.
Q4. Same as above for case 4.

Lundin
ahk

3 Answers


unsigned long int case_1 = *((unsigned long int*)&ptr_a);

Ignoring the pointer vs. integer size concern, this is still undefined behavior because of a strict aliasing violation (see "What is the strict aliasing rule?"). The memory location where a void* object resides cannot be accessed through an lvalue of type unsigned long. This can in turn cause incorrectly generated machine code during optimization, particularly when the code is divided across multiple translation units. So this is a valid concern and not just theoretical "language-lawyering".

There might also, at least theoretically, be undefined behavior due to alignment issues. In practice, I don't really see how alignment would be an issue when the pointers and integers have the same size.

Even more theoretically, there might be trap representations, either in the unsigned long (which would require an exotic implementation with padding bits) or in the pointer type itself. Some hardware might have trap representations for certain addresses, and in theory you could get a hardware exception on such systems, though probably only when going from integer to pointer.
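
As a point of comparison for case 1, C (unlike C++) permits reinterpreting the bytes through a union instead of through a pointer cast. The following is only a sketch: the names ptr_bits and pun_pointer are made up for illustration, and the result may still be a trap representation on exotic implementations, as described above.

```c
/* Hypothetical helper type: overlays a pointer and an integer. */
union ptr_bits {
    void *ptr;
    unsigned long bits;
};

/* Reinterpret the bytes of p as an unsigned long through a union.
 * C11 6.5.2.3 (footnote 95) describes this kind of type punning.
 * Returns 1 and stores the value in *out when the sizes match,
 * otherwise returns 0 and leaves *out untouched. */
int pun_pointer(void *p, unsigned long *out)
{
    union ptr_bits u;

    if (sizeof u.ptr != sizeof u.bits)
        return 0;          /* sizes differ: the bytes would not line up */

    u.ptr = p;             /* store through the pointer member */
    *out = u.bits;         /* read the same bytes as an integer */
    return 1;
}
```

Calling pun_pointer(&a, &value) gives the same bit pattern that the memcpy in case 3 would produce, without the aliasing violation of case 1.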

unsigned long int case_2 = (unsigned long int)ptr_a;

This is well-defined: we can always convert from pointers to integers and back. But again there is the issue of object sizes and potentially also of alignment, especially when going from integers to pointers.
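
For reference, the round trip that the standard does guarantee goes through uintptr_t, which the question already mentions. A minimal sketch, assuming the implementation provides the optional uintptr_t type (the function name round_trip_via_uintptr is made up here):

```c
#include <stdint.h>

/* Round-trip a pointer through uintptr_t. C11 7.20.1.4 guarantees that a
 * valid void* converted to uintptr_t and back compares equal to the
 * original pointer. */
int *round_trip_via_uintptr(int *p)
{
    uintptr_t bits = (uintptr_t)(void *)p;  /* pointer -> integer */
    void *back = (void *)bits;              /* integer -> pointer */
    return (int *)back;
}
```

The returned pointer compares equal to the argument, so writing through it modifies the original object.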

memcpy(&case_3, &ptr_a, sizeof(void*));

Apart from the same size and alignment concerns, this is valid code. C places no requirements on the binary representation of pointers; that is beyond the scope of the standard.

memcpy(&ptr_b, &case_3, sizeof(void*));

Same concerns as in 3).

Lundin

While undefined behavior may happen when casting pointers between different types, you are allowed to cast a void * pointer to/from any other object pointer type.

Is case # undefined behavior? Is it safe? If it is not, why?

unsigned long int case_1 = *((unsigned long int*)&ptr_a);

It's undefined behavior. You are accessing a void * value through an unsigned long int type. Because unsigned long int is not compatible with void*, you are breaking strict aliasing. See C11 6.5p7.

unsigned long int case_2 = (unsigned long int)ptr_a;

It could be undefined behavior. See C11 6.3.2.3p6. While it says that "Any pointer type may be converted to an integer type", it also states "If the result cannot be represented in the integer type, the behavior is undefined." So, on an architecture where unsigned long has 32 bits but void * has 64 bits, this could be undefined behavior. The result is implementation-defined in any case.
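
One way to guard against the unrepresentable case is to compare the ranges at compile time. This is a sketch only: pointer_to_ulong is a hypothetical helper, and it assumes the implementation provides uintptr_t and UINTPTR_MAX.

```c
#include <limits.h>
#include <stdint.h>

/* Convert p to unsigned long only when every uintptr_t value fits.
 * Returns 1 and stores the value in *out on success, 0 otherwise. */
int pointer_to_ulong(void *p, unsigned long *out)
{
#if UINTPTR_MAX <= ULONG_MAX
    /* pointer -> uintptr_t is the supported conversion; widening the
     * integer to unsigned long afterwards preserves its value. */
    *out = (unsigned long)(uintptr_t)p;
    return 1;
#else
    /* e.g. 64-bit pointers with a 32-bit unsigned long */
    (void)p;
    (void)out;
    return 0;
#endif
}
```

On a platform with 64-bit pointers and a 32-bit unsigned long the function simply reports failure instead of performing the conversion.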

 unsigned long int case_3 = 0;
 memcpy(&case_3, &ptr_a, sizeof(void*));

This is obviously undefined behavior when sizeof(void*) > sizeof(unsigned long), because memcpy then writes past the end of case_3.

 printf("case 3: %lu\n", case_3);

And this could be undefined behavior when case_3 holds a trap representation, i.e. when the content of case_3 does not represent a valid unsigned long int value, so that reading the object traps. But on today's architectures any bit pattern is valid for an unsigned long, so it will print some implementation-defined value.

 memcpy(&ptr_b, &case_3, sizeof(void*));

This is equivalent to ptr_b = (void*)case_3 and is valid.
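
Cases 3 and 4 together can be sketched as one guarded round trip. The function name memcpy_round_trip is made up for this illustration, and the size check is an added assumption about how one might avoid the overflow described above.

```c
#include <string.h>

/* Copy a pointer's bytes into an unsigned long and back with memcpy.
 * Returns the reconstructed pointer, or NULL when the two types differ
 * in size (the copy would then overflow or leave bytes uncopied). */
void *memcpy_round_trip(void *p)
{
    unsigned long bits = 0;
    void *back = NULL;

    if (sizeof bits != sizeof p)
        return NULL;

    memcpy(&bits, &p, sizeof p);        /* pointer bytes -> integer object */
    memcpy(&back, &bits, sizeof back);  /* integer bytes -> pointer object */
    return back;
}
```

Because the bytes are copied unchanged in both directions, the result holds the same object representation as the original pointer.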

KamilCuk

C language reference (or more exactly draft n1570 for C11) says at 6.3.2.3 Conversions / Pointers §6:

Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

So even if sizeof(void*)==sizeof(unsigned long int) it could be undefined behaviour if for any reason the result could not be represented. Causes could be:

  • padding bits in the unsigned long type, resulting in fewer representable values
  • a pathological conversion resulting in a non-representable value

For common architectures (in fact all that I know of), the conversion of a pointer to an unsigned long gives a memory address with the exact same bits, and there are no padding bits in any integer type, so no undefined behaviour should occur. But the standard is highly conservative...
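
Whether unsigned long has padding bits on a given implementation can be checked by counting its value bits. A small sketch, with the hypothetical helper names value_bits_ulong and ulong_has_padding:

```c
#include <limits.h>

/* Count the value bits of unsigned long by shifting ULONG_MAX down to 0. */
unsigned value_bits_ulong(void)
{
    unsigned n = 0;
    for (unsigned long v = ULONG_MAX; v != 0; v >>= 1)
        n++;
    return n;
}

/* Nonzero when unsigned long has padding bits, i.e. when it has fewer
 * value bits than storage bits. */
int ulong_has_padding(void)
{
    return value_bits_ulong() < sizeof(unsigned long) * CHAR_BIT;
}
```

On the common architectures mentioned above, ulong_has_padding() is expected to return 0.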

Serge Ballesta