9

I fear I may be missing something trivial, but it appears there is no actual safe way to convert to/from a signed type if you wish to retain the original unsigned value.

On reinterpret_cast, 5.2.10 does not list an integer to integer conversion, thus it is not defined (and static_cast defines no additional conversion). On integral conversions 4.7.3 basically says conversion of a large unsigned will be implementation defined (thus not portable).

This seems limiting since we know, for example, that a uint64_t should, on any hardware, be safely convertible to an int64_t and back without change in value. Plus the rules on standard layout types actually guarantee safe conversion if we were to memcpy between the two types instead of assigning.

Am I correct? Is there a legitimate reason why one cannot reinterpret_cast between integral types of sufficient size?


Clarification: The signed value produced from the unsigned one is certainly not guaranteed; it is only the round trip (unsigned => signed => unsigned) that I am considering.


UPDATE: Looking closely at the answers and cross-checking the standard, I believe the memcpy is not actually guaranteed to work, as nowhere does it state that the two types are layout compatible, and neither is a char type. Further update: digging into the C standard, this memcpy should work, as the size of the target is large enough and it copies the bytes.


ANSWER: There appears to be no technical reason why reinterpret_cast was not allowed to perform this conversion. For these fixed-size integer types a memcpy is guaranteed to work, and indeed any intermediate type can be used so long as it can represent all bit patterns (floats can be dangerous as there may be trap patterns). In general you can't memcpy between arbitrary standard layout types; they must be layout compatible, or one must be a char type. Here the ints are special since they have additional guarantees.
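To make the round trip described above concrete, here is a minimal sketch. The helper names `round_trip_via_bytes` and `round_trip_via_signed` are made up for illustration; the first goes through an `unsigned char` buffer, the second through an `int64_t` that is only ever written and read with `memcpy`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy the object representation into a raw byte
// buffer and back. The bytes are never interpreted as another type.
std::uint64_t round_trip_via_bytes(std::uint64_t original) {
    unsigned char bytes[sizeof original];
    std::memcpy(bytes, &original, sizeof original);
    std::uint64_t restored;
    std::memcpy(&restored, bytes, sizeof restored);
    return restored;
}

// Hypothetical helper: same idea with int64_t as the intermediate.
// This relies on int64_t having no padding bits, as discussed in the thread.
std::uint64_t round_trip_via_signed(std::uint64_t original) {
    static_assert(sizeof(std::int64_t) == sizeof(std::uint64_t),
                  "intermediate must be the same size");
    std::int64_t intermediate;
    std::memcpy(&intermediate, &original, sizeof original);
    std::uint64_t restored;
    std::memcpy(&restored, &intermediate, sizeof restored);
    return restored;
}
```

Calling either helper with a value such as `1ull << 63`, which does not fit in `int64_t`, should return the same value.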

edA-qa mort-ora-y
  • I think you can always do that, after all with reinterpret_cast you just tell the compiler how to interpret a memory location without changing location value. – Adriano Repetti Feb 27 '12 at 15:27
  • "on any hardware". That is exactly the point. Maybe on any hardware that you could work with, but C++ supports representations other than two's complement not for academic reasons but because such hardware actually exists. And the (u)intXX_t types are only required to behave in computations "as-if" they were two's complement; there is no requirement that the hardware be. – PlasmaHH Feb 27 '12 at 15:30
  • @PlasmaHH, my point is that `uint64_t`, `int64_t` are exact size (if supported on the hardware) and thus `memcpy` is guaranteed to convert (via the rules of standard layout types). I don't care about 2's complement here, I want a back-and-forth conversion. – edA-qa mort-ora-y Feb 27 '12 at 15:33
  • For the sake of the exercise, consider a round trip such as `float` - `int` - `float`. We know about the IEEE 754 representation, and that's why such a round trip doesn't work. On the other hand, given an `int64_t a`, you could look at `&a` as if it held a `uint64_t`, or a `double`, or whatever for that matter, with no guarantees about correctness. – foxx1337 Feb 27 '12 at 16:13
  • @foxx1337, I understand the logical value in the intermediate is implementation defined, but the round-trip does work with `memcpy`, so why isn't it allowed with reinterpret_cast? – edA-qa mort-ora-y Feb 27 '12 at 16:16

6 Answers

3

We know that you can't cast an arbitrary bit sequence to floating-point, because it might be a trap representation.

Is there any rule that says there can't be trap representations in the signed integral types? (Unsigned types can't, because of the way the range is defined, all representations are needed for valid values)

Signed representations can also include equivalence classes (such as +0 == -0) and may coerce values in such a class to a canonical representation, thus breaking the roundtrip.

Here are the relevant rules from the Standard (section 4.7, [conv.integral]):

If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note ]

If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
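As a small worked example of the two rules just quoted (the helper names `to_unsigned` and `fits_in_signed` are mine, not from the Standard): the signed-to-unsigned direction is fully specified, while the unsigned-to-signed direction keeps the value only when it is in range.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Signed -> unsigned is fully specified: the result is the source value
// reduced modulo 2^64, whatever the signed representation is.
std::uint64_t to_unsigned(std::int64_t value) {
    return static_cast<std::uint64_t>(value);
}

// Unsigned -> signed keeps the value only when it fits in the signed type;
// otherwise the result is implementation-defined under the rule quoted above.
bool fits_in_signed(std::uint64_t value) {
    return value <=
           static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
}
```

For instance, `to_unsigned(-1)` is 2^64 - 1 on every conforming implementation, whereas `1ull << 63` does not fit in `int64_t`, so converting it by assignment falls under the implementation-defined case.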

If you mean using reinterpret_cast on a pointer or reference, rather than the value, you have to deal with the strict-aliasing rule. And what you find is that this case is expressly allowed.
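A sketch of the reference form, assuming (as is usual, though it is an assumption here) that `int64_t` and `uint64_t` are the signed and unsigned variants of the same underlying type, so the aliasing rule permits the access. The function name `round_trip_via_lvalue` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: view an unsigned object through a signed lvalue,
// copy the signed value out, then convert it back to unsigned.
std::uint64_t round_trip_via_lvalue(std::uint64_t original) {
    // Accessing an object through the signed/unsigned counterpart of its
    // type is one of the cases the aliasing rule allows.
    std::int64_t& as_signed = reinterpret_cast<std::int64_t&>(original);
    std::int64_t copy = as_signed;
    // Signed -> unsigned conversion is defined modulo 2^64; with a two's
    // complement int64_t this should restore the original bit pattern.
    return static_cast<std::uint64_t>(copy);
}
```

Whether the copied signed value converts back to the original is exactly the part that depends on the representation, so treat this as an illustration rather than a portable guarantee for arbitrary signed types.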

Ben Voigt
  • Hmm, by that logic you shouldn't be able to `memcpy` to the floating-point value either, since it could trap. Is perhaps the memcpy between signed/unsigned integral types actually not guaranteed? – edA-qa mort-ora-y Feb 27 '12 at 17:08
  • @edA-qamort-ora-y: `memcpy` will never trap, because it doesn't interpret its arguments. But using the resulting value might be a problem. – Ben Voigt Feb 27 '12 at 17:11
  • Do you agree though that doing a round-trip via `memcpy` is guaranteed to retain the original value? – edA-qa mort-ora-y Feb 27 '12 at 17:39
  • @edA-qamort-ora-y: If the representation isn't legal in the intermediate type, then I don't think it's guaranteed. I'm not sure whether signed integral types are allowed to exclude certain representations. Bo's comment on another answer indicates that whatever might be allowed for `signed int`, `int64_t` is guaranteed to use two's-complement representation and all representations are legal. – Ben Voigt Feb 27 '12 at 17:43
  • Yes, I noticed in the C-standard, as well as C++, back-forth conversion is only strictly defined with a `char` array as intermediary. so while it may be legal for these two types, it isn't a general guarantee, even if the size is sufficient. – edA-qa mort-ora-y Feb 27 '12 at 17:48
  • In C++, even if the implementation supports negative zeroes, `-0` isn't one. `+0 == -0` is always true not because normal zero and negative zero compare equal, but because normal zero and another normal zero compare equal. I understand that you probably merely use `-0` as descriptive, but this is not clear from your answer. –  Feb 28 '12 at 17:58
2

As you point out, memcpy is safe:

uint64_t a = 1ull<<63;
int64_t b;
memcpy(&b,&a,sizeof a);

The value in b is still implementation-defined since C++ does not require a two's complement representation, but converting it back will give you the original value.

As Bo Persson points out, int64_t will be two's complement. Therefore the memcpy should result in a signed value for which the simple integral conversion back to the unsigned type is well defined to be the original unsigned value.

uint64_t c = b;
assert( a == c );

Also, you can implement your own 'signed_cast' to make conversions easy (I don't take advantage of the two's complement thing since these aren't limited to the intN_t types):

template<typename T>
typename std::enable_if<std::is_integral<T>::value && std::is_signed<T>::value,T>::type
signed_cast(typename std::make_unsigned<T>::type v) {
    T s;
    std::memcpy(&s,&v,sizeof v);
    return s;
}

template<typename T>
typename std::enable_if<std::is_integral<T>::value && std::is_unsigned<T>::value,T>::type
signed_cast(typename std::make_signed<T>::type v) {
    T s;
    std::memcpy(&s,&v,sizeof v);
    return s;
}
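A self-contained usage sketch for the two overloads above, with the headers they need. Note that `T` cannot be deduced from the argument, so the target type has to be spelled out at the call site. The wrapper `demo_round_trip` is a hypothetical name added only for the demonstration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Unsigned -> signed overload, as in the answer.
template<typename T>
typename std::enable_if<std::is_integral<T>::value && std::is_signed<T>::value, T>::type
signed_cast(typename std::make_unsigned<T>::type v) {
    T s;
    std::memcpy(&s, &v, sizeof v);
    return s;
}

// Signed -> unsigned overload, as in the answer.
template<typename T>
typename std::enable_if<std::is_integral<T>::value && std::is_unsigned<T>::value, T>::type
signed_cast(typename std::make_signed<T>::type v) {
    T s;
    std::memcpy(&s, &v, sizeof v);
    return s;
}

// Hypothetical wrapper: unsigned -> signed -> unsigned using only byte copies.
std::uint64_t demo_round_trip(std::uint64_t original) {
    std::int64_t as_signed = signed_cast<std::int64_t>(original);
    return signed_cast<std::uint64_t>(as_signed);
}
```

Since both directions copy bytes, the round trip does not depend on what value the intermediate signed object represents.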
bames53
  • Any insights as to why `reinterpret_cast` would not allow this conversion? – edA-qa mort-ora-y Feb 27 '12 at 15:55
  • 1
    If `int64_t` is defined, it must be a two's complement 64 bit integer, with no padding bits. So if `int64_t` is defined, the `memcpy` should work, and I've never heard of a case where a simple cast wouldn't work (although it is formally implementation defined). For the standard integral types, like `int` or `long`, it's less sure, since the number of bits and the signed representation are implementation defined (and if the implementation isn't two's complement, strange things can happen). – James Kanze Feb 27 '12 at 16:00
  • @JamesKanze Are you sure `int64_t` must be two's complement? – Christian Rau Feb 27 '12 at 16:19
  • @edA-qamort-ora-y No, I'm not sure why they didn't allow that conversion. You can use reinterpret_cast to convert a uint64_t lvalue to an int64_t lvalue, and then convert that int64_t lvalue back to the original uint64_t lvalue, however. – bames53 Feb 27 '12 at 16:34
  • @James Kanze Can you provide a standards reference stating that `int64_t` must be twos complement? – Mark B Feb 27 '12 at 16:35
  • @bames53, actually you can't. This is not a defined conversion for reinterpret_cast and the compiler can reject it (gcc does at least). I was expecting this conversion to be allowed, and was surprised when it wasn't (thus the question). – edA-qa mort-ora-y Feb 27 '12 at 16:37
  • @edA-qamort-ora-y see [n3337 5.2.10 p11] you have to make sure you're requesting an lvalue: http://ideone.com/Zqvd1 - also note that copying the value of the intermediate int64_t and casting that to a uint64_t lvalue does not necessarily yield a uint64_t equal to the original. This conversion is probably useless for your purposes. – bames53 Feb 27 '12 at 16:47
  • 4
    The C++ standard refers to 7.18 of the C99 standard, which says "The typedef name `intN_t` designates a signed integer type with width N, no padding bits, and a two's complement representation." (7.18.1.1). – Bo Persson Feb 27 '12 at 17:01
  • @BoPersson Wow, I didn't know this. So we finally got two's complement into the standard. – Christian Rau Feb 28 '12 at 13:51
  • 1
    @Christian - Not really. The typedefs are optional, so if you don't have two's complement `int64_t` will just be missing. – Bo Persson Feb 28 '12 at 14:17
  • @BoPersson What, the desperately needed fixed-width types are optional? An optional `int64_t` is even worse than an unspecified signedness implementation. – Christian Rau Feb 28 '12 at 15:21
  • @ChristianRau: That `uint32_t` is optional isn't a problem, IMHO, since some compilers could not implement such a type. A bigger problem is that `uint32_t` effectively represents a different type on 32-bit systems than on 64-bit systems. Given `uint32_t new_reading, prev_reading; int64_t total;`, the meaning of `total += new_reading - prev_reading;` is totally different on 32-bit systems than on 64-bit systems; there's no way for such code to specify a type whose behavior will work the same on both systems without additional typecasts. – supercat Sep 21 '15 at 19:46
  • @supercat assuming both the 64-bit system and 32-bit system support `uint32_t` and `int64_t`, and also assuming that `uint32_t` is large enough that it does not get promoted to `int`, `total += new_reading - prev_reading;` will be the same on both the 64-bit and the 32-bit system. What difference did you have in mind? – bames53 Sep 21 '15 at 23:06
  • @bames53: I should have clarified "64-bit system" as "system where `int` is 64 bits". There's no reason a good language for a 64-bit processor shouldn't have a 64-bit integer as its default numeric type, but unfortunately C has not yet defined any type which could on such a system play the role that `uint32_t` plays on 32-bit compilers. – supercat Sep 21 '15 at 23:23
  • @supercat Well on a system with 64-bit `int`s the `uint32_t` would simply be an `unsigned short` or even an extended integer type. The 'role' that `uint32_t` plays is that of an unsigned integral type of exactly 32-bits. And of course existing 64-bit systems use it just fine. – bames53 Sep 22 '15 at 01:47
  • @bames53: A lot of code needs a type that behaves as a member of the ring of integers congruent mod 4294967296. I see no reason C shouldn't define a type which, when it exists, would behave in such fashion, nor allow compilers to provide such types even when `int` is larger (with a rule that operations between a `uwrap32_t` and a signed type which would be an `int` on a system where `int` was 32 bits, should yield a `uwrap32_t` result). The present rules for balancing promotions between signed and unsigned types are worse than useless from the standpoint of writing strictly conforming... – supercat Sep 22 '15 at 14:45
  • ...programs, since they mean that a lot of typecasts are "optional", but omitting them will yield code that means totally different things on different systems, often in ways not guarded by even elevated warning levels (e.g. do you know of any compilers that will squawk at `u64 += u32a - u32b;`?) – supercat Sep 22 '15 at 14:46
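The promotion issue raised in the last few comments can be sketched as follows. The function names are hypothetical. The first version forces the difference back to 32 bits so that it wraps modulo 2^32 regardless of the width of `int`; the second leaves the expression as written in the comment, so its result depends on whether `uint32_t` is promoted to `int`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the explicit cast makes the difference wrap
// modulo 2^32 on every platform before it is widened to 64 bits.
std::int64_t add_wrapped_delta(std::int64_t total,
                               std::uint32_t new_reading,
                               std::uint32_t prev_reading) {
    std::uint32_t delta = static_cast<std::uint32_t>(new_reading - prev_reading);
    total += delta;
    return total;
}

// Hypothetical helper: no cast. If int is 32 bits the subtraction is done
// in unsigned 32-bit arithmetic and wraps; if int is wider than 32 bits the
// operands are promoted to int and the difference can be negative.
std::int64_t add_raw_delta(std::int64_t total,
                           std::uint32_t new_reading,
                           std::uint32_t prev_reading) {
    total += new_reading - prev_reading;
    return total;
}
```

With readings 1 and 2, `add_wrapped_delta` adds 4294967295 everywhere, while `add_raw_delta` would add either 4294967295 or -1 depending on the platform.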
1

Presumably it's not allowed because for machines with sign-magnitude representations it would violate the principle of least surprise that signed 0 maps to unsigned 0 while a signed -0 would map to some other (probably very large) number.

Given that the memcpy solution exists I assume the standards body decided to not support such an unintuitive mapping, probably because unsigned->signed->unsigned isn't as useful a sequence as pointer->integer->pointer.

Mark B
  • 1
    Ignore these sized types then for a moment. I would say the mapping from `intptr_t` to/from `uintptr_t` would be a useful sequence. Just consider if you are trying to combine two APIs which just happened to choose different signedness. – edA-qa mort-ora-y Feb 27 '12 at 16:47
0

The issue is basically that an n-bit unsigned value may not have a representation in the n-bit signed type. For example, an 8-bit unsigned has a maximum value of 256, while necessarily an 8-bit signed value can have no value greater than 128 (and note that this is regardless of hardware implementation: any representation will require a bit for the sign).
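The exact limits can be checked with `std::numeric_limits`; as a comment below notes, they are 255 and 127 for the 8-bit types. A minimal sketch, with `fits_in_int8` as a made-up helper name:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

static_assert(std::numeric_limits<std::uint8_t>::max() == 255,
              "largest 8-bit unsigned value");
static_assert(std::numeric_limits<std::int8_t>::max() == 127,
              "largest 8-bit signed value");

// Hypothetical helper: true when an 8-bit unsigned value can be stored in
// int8_t without changing its value.
bool fits_in_int8(std::uint8_t value) {
    return value <= std::numeric_limits<std::int8_t>::max();
}
```

Values from 128 through 255 are the ones that have no same-valued counterpart in the signed type.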

Charlie Martin
  • Ah, but that's just it, the rules for standard layout types _guarantee_ that you can convert back and forth without any loss in value. That is, it doesn't matter what logical value is stored in the signed variable, using memcpy guarantees you can get the original unsigned value back (since the types are both standard layout and of sufficient size). – edA-qa mort-ora-y Feb 27 '12 at 16:02
  • 3
    The maxes are actually 255 and 127. Just sayin'. :P – TheBuzzSaw Feb 27 '12 at 16:19
-2

Just run

#include <cstdio>
#include <stdint.h>
using namespace std;

int main()
{
    int64_t a = 5;
    int64_t aa;
    uint64_t b;
    double c;

    b = *reinterpret_cast<uint64_t *>(&a);
    aa = *reinterpret_cast<int64_t *>(&b);

    if (a == aa) {
        printf("as expected, a == aa\n");
    }

    c = *reinterpret_cast<double *>(&a);
    aa = *reinterpret_cast<int64_t *>(&c);

    if (a == aa) {
        printf("again, as expected, a == aa\n");
    }

    printf("what is this I don't even %f.\n", c); // this one should give some undefined behavior here

    return 0;
}

Couldn't fit it in a comment.

foxx1337
  • This is not guaranteed to work, and could segfault and still be standards compliant. `reinterpret_cast` can be used for round-trip, but you can't use the intermediate as a real pointer, as it is implementation defined. _I do not know of any hardware/compiler where this would not work however_ – edA-qa mort-ora-y Feb 27 '12 at 16:35
  • Most probably, variable c doesn't hold a valid IEEE 754 form. From that point on, nothing is guaranteed to work. The same goes for signed / unsigned though, as C++ doesn't say anything about how integers should be packed. – foxx1337 Feb 27 '12 at 16:43
  • Besides, this violates the strict aliasing rule (aliasing an integer as a `double` and vice versa is not allowed). – Ben Voigt Feb 27 '12 at 17:06
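For comparison with the pointer casts criticised in the comments above, here is a sketch that moves the bytes with `memcpy` instead, so no object is accessed through an lvalue of the wrong type. It assumes a 64-bit `double` (checked with a `static_assert`), and `round_trip_via_double_storage` is a hypothetical name. The `double` is used purely as storage and its value is never read as a floating-point number.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::int64_t),
              "sketch assumes a 64-bit double");

// Hypothetical helper: park the bytes of an int64_t in a double object and
// copy them back out. Only memcpy touches the double, so it is never
// evaluated as a floating-point value.
std::int64_t round_trip_via_double_storage(std::int64_t original) {
    double storage;
    std::memcpy(&storage, &original, sizeof original);
    std::int64_t restored;
    std::memcpy(&restored, &storage, sizeof restored);
    return restored;
}
```

Printing or computing with `storage` would be a different matter, since the bit pattern may not be a meaningful floating-point value.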
-2

Unless I misunderstand the question, just put the signed type into an unsigned type and vice versa to go back again:

#include <iostream>

int main()
{
    signed char s = -128;
    unsigned char u = s;
    signed char back = u;

    std::cout << (int)u << std::endl;
    std::cout << (int)back << std::endl;

    return 0;
}

./a.out 
128
-128
01100110
  • 1
    The question is about guarantees made by the standard. Everybody fully expects the round-trip to work on common hardware, but the standard doesn't actually guarantee it. This is surprising since if you do it with `memcpy` the round-trip *is* guaranteed to work. – edA-qa mort-ora-y Feb 27 '12 at 16:52
  • Ah yes, I see now. Sorry for the noise. Interesting question, I was under the impression this was guaranteed. – 01100110 Feb 27 '12 at 16:56