16

I know that the C standard guarantees that (unsigned)-1 yields 2^n-1, i.e. an unsigned integer with all its bits set. The same goes for (uint64_t)-1ll. However, I cannot find anything in the C11 standard that specifies how (uint64_t)-1 is interpreted.

So, the question is: does the C standard guarantee which of the following holds true?

(uint64_t)-1 == (uint64_t)(unsigned)-1   //0x00000000ffffffff
(uint64_t)-1 == (uint64_t)(int64_t)-1    //0xffffffffffffffff
cmaster - reinstate monica

4 Answers

15

Yes. See C11 6.3.1.3 Signed and unsigned integers:

1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.60)

3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

60) The rules describe arithmetic on the mathematical value, not the value of a given type of expression.

Case 2 applies, so -1 is reduced modulo 0x10000000000000000 to yield 0xffffffffffffffff.

R.. GitHub STOP HELPING ICE
  • I think your comment applies to the second expression in the question; that must be true. It does not apply to the RHS of the first expression; the `(unsigned)-1` is 0xFFFF_FFFF_FFFF_FFFF if `sizeof(unsigned) == 8` and is 0xFFFF_FFFF if `sizeof(unsigned) == 4`, and the value is representable in `uint64_t`. So, the standard does not guarantee that the first expression evaluates to true. – Jonathan Leffler Aug 18 '13 at 21:45
  • 2
    I was answering the question in the subject line. I did not touch on the `(uint64_t)(unsigned)-1` issue, which is obviously different unless `unsigned` happens to be 64-bit. – R.. GitHub STOP HELPING ICE Aug 18 '13 at 23:50
6

The expressions 1 and -1 have type int. When converted to uint64_t, the principle that 2^n is added or subtracted until the value is in range applies, so the result is always 2^n-1, in this case with n=64. Therefore, (uint64_t)-1 is always 2^64-1.

The expression (int64_t)-1 evaluates to -1, so the same reasoning applies to the expression (uint64_t)(int64_t)-1, which also always evaluates to 2^64-1.

On the other hand, (unsigned)-1 is a positive value of type unsigned int that may be 2^16-1, 2^32-1, 2^64-1 or various other values depending on the compilation platform. These values may not yield 2^64-1 when converted to uint64_t.

Pascal Cuoq
  • isn't a better choice to just pick `UINT64_MAX` or I'm missing something from the OP question ? why casting a signed number to an unsigned one anyway ? – user2485710 Aug 18 '13 at 21:15
  • 1
    @user2485710 Shouldn't you ask the OP? Yes, it seems less error-prone to use `UINT64_MAX`, but that's not the question. – Pascal Cuoq Aug 18 '13 at 21:18
  • 1
    The question applies to any signed integer that is simultaneously casted to a) a larger type and b) an unsigned type. As in `(unsigned)(signed char)-123`. – cmaster - reinstate monica Aug 18 '13 at 21:23
  • 2
    @cmaster: your example is two sequential casts -- there's no such thing as two 'simultaneous' casts. There's just a single cast to a type that is both larger and (un)signed, which is well defined, or there's a series of casts in a specific order, which is equally well defined (but may be different) – Chris Dodd Aug 18 '13 at 21:27
  • @ChrisDodd +1 A tricky version of the question would be `(uint64_t)-1U`, though. Same result as `(uint64_t)(unsigned)-1`, for basically the same reason, but well hidden. I should make a note of this for the next IOCCC. – Pascal Cuoq Aug 18 '13 at 21:29
  • @ChrisDodd It's one cast to get a `signed char` and another cast to simultaneously produce 32 bits and convert to unsigned. Yes, it's a confusing example... – cmaster - reinstate monica Aug 18 '13 at 21:31
  • 1
    @PascalCuoq: `(uint64_t)-1U` is the same as `(uint64_t)(-((unsigned)1))`, since constant literals never include signs in C (or C++), except in the exponent part of a floating point constant. So the `-` will be a unary prefix operation on the constant. – Chris Dodd Aug 18 '13 at 21:33
  • @ChrisDodd I know, but this doesn't change anything here: `-1U` is an expression of type `unsigned int` that evaluates to `UINT_MAX`, and the conversion of that expression to `uint64_t` may not be 2^64-1 depending on the compilation platform. – Pascal Cuoq Aug 18 '13 at 21:37
  • @cmaster What part of **Same result as (uint64_t)(unsigned)-1, for basically the same reason** is confusing? – Pascal Cuoq Aug 18 '13 at 21:43
  • 1
    @cmaster As an aside, you ask about the C standard but you tagged your question “C++” and you test with `g++`? C and C++ are different languages. – Pascal Cuoq Aug 18 '13 at 22:02
  • @PascalCuoq Aiii, I mixed up what you and ChrisDodd said; of course g++ supports your version. I'll delete the wrong comment. As to your aside: It would also be interesting to know if there are any differences between the languages in this respect, but I guess there are none, so I don't care which compiler I use. – cmaster - reinstate monica Aug 19 '13 at 06:37
1

I'm guessing you're writing (uint64_t)-1 instead of -1ULL because you don't want to make assumptions about the size of unsigned long long? If so, that's good. However, there is an alternative which hasn't been mentioned yet (and doesn't actually answer your question) but can save a lot of worry by side-stepping the question:

An alternative

A good habit to be in is to always use UINT64_C(x) instead of (uint64_t)x. This is a macro defined in <stdint.h> which automatically appends U, UL, or ULL as needed. Thus, UINT64_C(-1) resolves to either -1U, -1UL, or -1ULL, depending on your target. This is guaranteed to always work correctly.

Perils of type-casting

Note that (uint64_t)x actually does not even work correctly in general. For example,

(uint64_t)2147483648                // RISKY

generates a warning on some compilers because the value 2147483648 (2^31) is too big to fit into a 32-bit integer, and the following does not even remotely work:

(uint64_t)1000000000000000000       // RISKY

However, if you use UINT64_C() instead, then everything is golden:

UINT64_C(2147483648)                // GOOD

UINT64_C(1000000000000000000)       // GOOD

UINT64_C(-1)                        // GOOD

Notes:

  • The _C suffix stands for “constant.”
  • In <stdint.h> there are also 8-, 16-, and 32-bit versions for both signed and unsigned values.
  • For the special case of -1, you could also just write UINT64_MAX.
Todd Lehman
  • Well, your guess was wrong; this question was really about how the C/C++ standards define a cast to work that changes both sign and size of an integer in a single operation. Nevertheless, the `UINT64_C()` macro is an interesting sidenote, even though it really doesn't answer the question. Btw: `long long int` is guaranteed to be at least 64 bits, so `(uint64_t)2147483648ull` is guaranteed to yield the correct result. – cmaster - reinstate monica Aug 08 '15 at 09:34
  • 1
    Though it's good to keep in mind that -1u is not -1 converted to unsigned, but operator- applied to 1u and hence other rules apply to make it work: http://brnz.org/hbr/?p=1433 – Trass3r Mar 14 '18 at 11:59
  • IIRC, C++ integer literals will have a type wide enough to represent the value. So if `int` is a 32-bit type, `2147483648` will have type `long` or `long long`. It's *not* actually risky. (Unless I'm misremembering, or there's a C/C++ difference). However, `1024*1024*1024*1024` is risky because it will overflow a 32-bit `int` in the last multiply! – Peter Cordes Mar 30 '20 at 10:22
  • 1
    `UINT64_C(-1)` is undefined (no diagnostic required); as per C11 7.20.4/2 the argument must be an unsuffixed integer constant with a value in range for the type. `-1` is not a constant, it's a unary minus expression. `-UINT64_C(1)` would be a correct alternative – M.M Jun 04 '20 at 20:38
  • The two lines you've marked as `RISKY` are required by the standard to work (if `uint64_t` exists of course) – M.M Jun 04 '20 at 20:39
-6

This is a question that can be answered with a few lines of code.

#include <stdio.h>

int main(void)
{
    int          x;
    unsigned int y;

    x = -1;
    /* %x expects an unsigned int, so convert explicitly. */
    printf("\n0x%08x", (unsigned int) x);

    y = (unsigned int) -1;
    printf("\n0x%08x", y);

    return 0;
}

Running this code on Eclipse/Microsoft C compiler produces:

0xffffffff
0xffffffff

A similar program can show you what uint64_t produces.

Finally, if you understand how computers use 2's complement numbers to add numbers, then you will understand that -1 for words of any number of bits (8, 32, 64, etc.) is always all ff's for every byte in the word/double word, etc.

JackCColeman
  • 7
    This is not the type of question that can be answered by "test and see". – R.. GitHub STOP HELPING ICE Aug 18 '13 at 21:25
  • 2
    For the record, the standard does not guarantee 2's complement for signed integer types and it does not guarantee that converting to a signed type preserves the least significant bits. These are both implementation-defined. The 2's complement representation and the conversion by dismissal of the most significant bits are common implementation choices. – Pascal Cuoq Aug 18 '13 at 21:32
  • 1
    Since when do the computers that we use NOT use 2's complement for doing arithmetic? You are reading too much into the language spec. – JackCColeman Aug 18 '13 at 21:40
  • 5
    I think the question is about “guarantee[s] in the C standard”, but I may be reading too much into it. – Pascal Cuoq Aug 18 '13 at 22:01
  • @PascalCuoq, I would suppose that a well written compiler implements the standard and that this is the "guarantee" being alluded to. – JackCColeman Aug 18 '13 at 22:08
  • 1
    @R.. — Hmm. If the results are positive (no ambiguity), then "test and see" certainly won't help prove the case. But all it takes is *one* negative result from a single compiler in order to prove that `(uint64)-1` isn't guaranteed to yield `0xFFFFFFFFFFFFFFFF`... so in theory it's worth trying, on the off chance that it fails. This of course is completely different from whether or not the standard specifies whether it should work. – Todd Lehman Aug 08 '15 at 04:31
  • @Jack: Yes, compilers typically implement the standard, but the standard leaves several things implementation-defined, and many other things are Undefined Behaviour. C++ is not Java. **Usually something will happen, but there's no guarantee the same thing will happen on other implementations.** e.g. [unsigned conversion in C works as expected on x86 but not ARM](https://stackoverflow.com/q/60925860). See also [What Every C Programmer Should Know About Undefined Behavior](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html). SO has a whole tag for [tag:undefined-behavior] – Peter Cordes Mar 30 '20 at 10:28
  • @PeterCordes You make it sound like inconsistent behavior is unavoidable. The reality is that the differences are because people are imperfect and inconsistent. There is an old adage that it takes 20 years to "have" or "make" a good operating system; that is, it takes that long to resolve inconsistent undefined behaviors. Having said all that, why accept less than completely consistent, well-defined programs? – JackCColeman Apr 01 '20 at 03:49
  • I think you still haven't understood the point that C *intentionally* allows implementations to make different choices on some things, and *intentionally* leaves behaviour undefined for some cases so compilers can make faster code that only has to work for cases that are well-defined. So basically the answer to your "why accept this?" is performance, and portability to different CPUs where different things are efficient. One obvious example is that writing outside array bounds is Undefined Behaviour; ISO C doesn't say anything about layout of local vars next to each other. – Peter Cordes Apr 01 '20 at 04:08