
I compiled this code using Visual Studio 2010 (cl.exe /W4) as a C file:

#include <stdio.h>

int main( int argc, char *argv[] )
{
    unsigned __int64 a = 0x00000000FFFFFFFF;
    void *orig = (void *)0xFFFFFFFF;
    unsigned __int64 b = (unsigned __int64)orig;
    if( a != b )
        printf( " problem\ta: %016I64X\tb: %016I64X\n", a, b );
    return 0;
}

There are no warnings and the result is:

problem a: 00000000FFFFFFFF b: FFFFFFFFFFFFFFFF

I suppose int orig = (int)0xFFFFFFFF would be less controversial as I'm not assigning a pointer to an integer. However the result would be the same.

Can someone explain to me where in the C standard it is covered that orig is sign extended from 0xFFFFFFFF to 0xFFFFFFFFFFFFFFFF?

I had assumed that (unsigned __int64)orig would become 0x00000000FFFFFFFF. It appears that the conversion is first to the signed __int64 type and then it becomes unsigned?

EDIT: This question has been answered in that pointers are sign extended, which is why I see this behavior in gcc and msvc. However, I don't understand why something like (unsigned __int64)(int)0xF0000000 sign-extends to 0xFFFFFFFFF0000000, but (unsigned __int64)0xF0000000 does not, instead giving what I want: 0x00000000F0000000.

EDIT: An answer to the above edit. The reason that (unsigned __int64)(int)0xF0000000 is sign extended is because, as noted by user R:

Conversion of a signed type (or any type) to an unsigned type always takes place via reduction modulo one plus the max value of the destination type.

And in (unsigned __int64)0xF0000000, the constant 0xF0000000 starts off with an unsigned integer type because its value cannot fit in a signed int. That already-unsigned value is then converted to unsigned __int64.

So the takeaway for me: given a function that returns a 32-bit or 64-bit pointer as an unsigned __int64 for comparison, I must first convert the 32-bit pointer in my 32-bit application to an unsigned type before widening it to unsigned __int64. The resulting code looks like this (but, you know, better):

#include <stdint.h>  /* for uintptr_t */

unsigned __int64 functionidontcontrol( char * );
unsigned __int64 x;
void *y = thisisa32bitaddress;
x = functionidontcontrol(str);
if( x != (uintptr_t)y )



EDIT again: Here is what I found in the C99 standard: 6.3.1.3 Signed and unsigned integers

  • 1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
  • 2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.49)
  • 3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
  • 49) The rules describe arithmetic on the mathematical value, not the value of a given type of expression.
loop
  • Are your pointers 64bit? – sth Oct 06 '11 at 17:09
  • No, 32-bit pointers. This is an oversimplification of a function I am dealing with and other things I really cannot change, to demonstrate the problem I am having. – loop Oct 06 '11 at 17:13
  • Your 32-bit pointer is converted to a 32-bit int before being converted to your unsigned int64 – Cthutu Oct 06 '11 at 17:36
  • Also see [GCC Manual, 4.7 Arrays and Pointers](http://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-implementation.html): *"... if the pointer representation is larger than the integer type, [then GCC] sign-extends the pointer ... Future versions of GCC may zero-extend, or use a target-defined ptr_extend pattern. Do not rely on sign extension"*. – jww Aug 11 '16 at 18:27

4 Answers


Converting a pointer to/from an integer is implementation defined.

Here is how gcc does it: it sign-extends if the integer type is larger than the pointer type (this happens regardless of whether the integer type is signed or unsigned, simply because that is how gcc chose to implement it).

Presumably msvc behaves similarly. Edit: the closest thing I can find on MSDN is this/this, suggesting that converting 32-bit pointers to 64-bit also sign-extends.

nos
  • This answer is correct. The links are helpful. Let me ask you, putting aside pointers for the moment, how come something like `((unsigned __int64)(int)0xFFFFFFFF)` has a value of 0xFFFFFFFFFFFFFFFF in gcc and msvc? – loop Oct 06 '11 at 17:43
  • @test: Conversion of a signed type (or any type) to an unsigned type **always** takes place via reduction modulo one plus the max value of the destination type. – R.. GitHub STOP HELPING ICE Oct 06 '11 at 17:48
  • @R: So to properly convert to for example a signed type like `int` to an unsigned type of larger precision like `unsigned __int64` I must first cast the signed type as its unsigned type? Are you saying that `(unsigned __int64)(unsigned int)0xFFFFFFFF` is the proper way to convert? – loop Oct 06 '11 at 17:57
  • `0xFFFFFFFF` already has type `unsigned int`; the cast is a no-op. The only thing "improper" in the whole thing you're doing is converting a value outside the range of `int` to `int`. – R.. GitHub STOP HELPING ICE Oct 06 '11 at 20:12
  • @R Thanks. I was trying to demonstrate with constants but the reality is a little more complicated. As it is I probably confused things further. I'll add another edit to my original post. – loop Oct 06 '11 at 22:36

From the C99 standard (§6.3.2.3/6):

Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

So you'll need to consult your compiler's documentation for the details of that conversion.

Mat

Integer constants (e.g., 0x00000000FFFFFFFF) are signed integers by default, and hence may experience sign extension when assigned to a 64-bit variable. Try replacing the value on line 3 with:

0x00000000FFFFFFFFULL
  • You won't need to use the U as it will be properly interpreted with the two Ls – Cthutu Oct 06 '11 at 17:26
  • `0xffffffff` is not a signed integer. It's an unsigned integer. – R.. GitHub STOP HELPING ICE Oct 06 '11 at 17:53
  • R: Unless the value is specified as unsigned (with a `U` prefix), it's signed by default. Even if it can't fit into a signed integer! –  Oct 06 '11 at 19:51
  • @duskwuff: Not true. Per 6.4.4.1 Integer constants, "The type of an integer constant is the first of the corresponding list in which its value can be represented." For "no suffix" hex constants, the list is "int, unsigned int, long, unsigned long, long long, unsigned long long". – R.. GitHub STOP HELPING ICE Oct 06 '11 at 20:26

Use this to avoid the sign extension:

unsigned __int64 a = 0x00000000FFFFFFFFLL;

Note the LL on the end. Without this it is interpreted as a 32-bit signed number (-1) and then cast.

Cthutu