/* [1] */
int i = -1;
unsigned u = (unsigned)i;

/* [2] */
int i = -1;
unsigned u;
memcpy(&u, &i, sizeof i);

/* [3] */
int i = -1;
unsigned u = *(unsigned *)&i;

In order to bit-copy a signed integer into its unsigned counterpart, [1] should work on most machines, but as far as I know it is not guaranteed behaviour.

[2] should do exactly what I want, but I want to avoid the overhead of calling a library function.

So how about [3]? Does it efficiently achieve what I intend?

sleske
  • Serious question - what makes this question worthy of a downvote? – mathematician1975 May 05 '15 at 08:49
  • [1] is well defined. Whether it satisfies your requirement of a "bit copy" is a different matter. – juanchopanza May 05 '15 at 08:49
  • I would expect, although it's of course not guaranteed, that many compilers will strength-reduce the `memcpy()` call. It would be my first choice, I think, since it very clearly communicates intent (although I would feel safer with `sizeof u`; the destination's size is more important). – unwind May 05 '15 at 08:52
  • How about creating an intermediate `void` pointer variable? That not only forces the program to reinterpret it, but also shows your intentions quite clearly (in my humble opinion :) ) – laurisvr May 05 '15 at 09:44
  • Just out of curiosity: Why do you need this? Is there some practical use, or are you just asking out of interest? – sleske May 05 '15 at 12:41
  • @xiver: It's not a good idea to ask about multiple languages at once. At least if you do, make that very clear. Thank you. – Cheers and hth. - Alf May 05 '15 at 14:05
  • @sleske From a random number generator function that returns a random signed integer spanning its whole range, I want to copy the bits directly into a variable of the corresponding unsigned type. –  May 05 '15 at 15:32
  • @xiver77 If you don't know the format of a signed number, then you can't be sure the pmf of your RNG will be uniform (or whichever pmf you prescribe), since different bit patterns can yield the same integer value (specifically for ones' complement and sign-magnitude). – Steve Cox May 05 '15 at 16:44

4 Answers

/* [4] */
union unsigned_integer
{
  int i;
  unsigned u;
};

union unsigned_integer ui;
ui.i = -1;
// You now have access to ui.u

Warning: As discussed in the comments, this seems to be okay in C but Undefined Behaviour in C++. Since your question has both tags, I'll leave this here. For more info, check this SO question:

Accessing inactive union member and undefined behavior?

In C++ I would instead recommend reinterpret_cast:

/* [5] */
int i = -1;
unsigned u = reinterpret_cast<unsigned&>(i);
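For the C side of the question, here is a minimal self-contained sketch of the union approach [4]. The helper name `pun_via_union` is hypothetical, and it is only meant as C: in C++, reading the inactive member is exactly the undefined behaviour the warning above refers to.

```c
#include <assert.h>

/* Same union as [4]. In C (C99 after TC3, C11) reading the member that was
   not written last reinterprets the stored bytes; in C++ this is UB. */
union unsigned_integer
{
    int i;
    unsigned u;
};

/* int and unsigned are required to occupy the same storage; this documents it. */
static_assert(sizeof(int) == sizeof(unsigned), "int and unsigned must have the same size");

/* Hypothetical helper: returns the bits of i viewed as an unsigned. */
unsigned pun_via_union(int i)
{
    union unsigned_integer ui;
    ui.i = i;      /* write the int member      */
    return ui.u;   /* read the unsigned member  */
}
```

On a two's complement machine, `pun_via_union(-1)` yields an unsigned with all bits set, i.e. `UINT_MAX`.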
Baum mit Augen
Drax
  • @Cheersandhth.-Alf I didn't mention it because I wasn't aware of that. AFAIK this is exactly equivalent to a bit copy, no? What exactly is undefined? – Drax May 05 '15 at 08:58
  • The access of a union member that was not (part of) the last one given a value. – Cheers and hth. - Alf May 05 '15 at 09:00
  • @Cheersandhth.-Alf If I recall correctly, this was well defined in newer standards (C99 onwards?). Doesn't help with portability with older compilers though. – user694733 May 05 '15 at 09:01
  • That would mean the whole `union` concept is undefined =/ `reinterpret_cast` is the way to go then. If that's not undefined either :) – Drax May 05 '15 at 09:03
  • @Drax: `reinterpret_cast` is the only one that's guaranteed to work per its definition, and without any overhead at all. – Michael Foukarakis May 05 '15 at 09:07
  • Also, this usage is not undefined behavior. It may, however, lead to UB if the value read is a trap representation. Summing up: Type-punning is actually legal in C89, C99 (after TC3), and C11. – Michael Foukarakis May 05 '15 at 09:12
  • **0** Removing my downvote since the answer has been updated. Still, I think the upvotes are misplaced, because there's no advantage in going through such contortions with a `union`. – Cheers and hth. - Alf May 05 '15 at 09:22
  • @MichaelFoukarakis: When the result can blow up, it's not well defined. A well defined result doesn't blow up. And when it is not well defined, and is not formally "unspecified" (to be defined by each compiler), then it is Undefined. It's that simple. – Cheers and hth. - Alf May 05 '15 at 09:48
  • Type punning is well specified. Usage of trap representations is a different issue. – Michael Foukarakis May 05 '15 at 09:51
  • `reinterpret_cast`'ing to a reference - what does that do? I usually see it done with pointers. – David G May 05 '15 at 10:07
  • @0x499602D2 [cppreference](http://en.cppreference.com/w/cpp/language/reinterpret_cast) says `An lvalue expression of type T1 can be converted to reference to another type T2. The result is an lvalue or xvalue referring to the same object as the original lvalue, but with a different type. No temporary is created, no copy is made, no constructors or conversion functions are called. The resulting reference can only be accessed safely if allowed by the type aliasing rules`. One of the rules is : `T2 is the (possibly cv-qualified) signed or unsigned variant of the dynamic type of the object` – Drax May 05 '15 at 10:15
  • @0x499602D2 I think overall it is roughly equivalent to `reinterpret_cast`ing a pointer then dereferencing the result. – Drax May 05 '15 at 10:16
  • @Drax: worth noting that "The resulting reference can only be accessed safely if allowed by the type aliasing rules" is not there in the C++11 standard's paragraph about this, §5.2.10. It is entirely the *interpretation* that the author who contributed to that passage subscribed to. It is a common interpretation in the gcc/g++ sphere, and it *applies to that compiler*, but it is not specified by the standard. – Cheers and hth. - Alf May 05 '15 at 10:40
  • @Cheersandhth.-Alf Man, you definitely know too much for your own good; worth upvoting that comment to make it stand out, thanks ;) – Drax May 05 '15 at 11:13
  • @Michael: Actually, for C89, this is *not* defined behavior. "Casting" through union isn't covered by the type punning rules for regular casts. According to §3.3.2.3: `if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined.` But there's one exception: `If a union contains several structures that share a common initial sequence, and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them`. – Erlend Graff May 10 '15 at 10:02
  • To make it well-defined in C89, the struct members `i` and `u` would have to be wrapped in different structs, however odd it may seem. Then, because `i` and `u` form the initial sequence of each corresponding struct, and they have compatible types (they only differ in sign), `u` could be read after writing to `i`. – Erlend Graff May 10 '15 at 10:06
  • I suggest you read http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm , which addresses your concerns. The behaviour is the same in C89 through C11. – Michael Foukarakis May 10 '15 at 11:35
  • @Michael: I'm aware of DR #283 and #257, and I still believe (IMHO) that this behavior changed with C99 and is therefore not the same for C11 and C89, as per the wording in DR #257 (this has also been discussed in a [previous question on SO](http://stackoverflow.com/a/11640603/2646573)). Although the intention might have been the same for all C standards, I still believe the wording in C89 to specify type punning through unions as implementation-defined. In any case, the DR addresses C99 and cannot be applied to C89 retrospectively. – Erlend Graff May 11 '15 at 14:55
  • Yes, C89 3.3.2.3 explicitly allows it in the first sentence, therefore it is defined behaviour, and therefore legal as I said in my first comment. I don't see why we're continuing this discussion, really. – Michael Foukarakis May 11 '15 at 17:06
  • @ErlendGraff: The relationship between the pointer yielded by the unary "&" operator and the lvalue identified thereby is stronger in C89 than in later standards. A C89 implementation might be allowed specify that "direct" accesses to signed and unsigned members of a union object have looser semantics than accesses via pointers, but it would be rather weird. For the most part, C89's semantics are stronger than those of C99 (or at least gcc's interpretation thereof) since it requires implementations to document any cases where writing an unknown-provenance pointer of one type... – supercat Jan 04 '18 at 17:54
  • ...and reading another might have an effect other than reinterpreting the bytes in the data (if the pointers could identify members of the same union object, C89 required implementations to behave as though they might, thus making behavior at worst Implementation-Defined). C99, by contrast, specifies some impractical cases where behavior is defined as byte-based reinterpretation, but then relegates all other cases to UB. – supercat Jan 04 '18 at 17:58
/* [1] */
int i = -1;
unsigned u = (unsigned)i;

↑ For negative values, this is guaranteed not to be a bit copy on a sign-and-magnitude or ones' complement machine, because conversion to unsigned is guaranteed to yield the signed value modulo 2^n, where n is the number of value representation bits in the unsigned type. I.e. the conversion is guaranteed to yield the same result as if the signed type used two's complement representation.
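The modulo 2^n guarantee can be spelled out as assertions. This sketch (the function name is made up for illustration) checks only results that hold on any conforming implementation, since they follow from the conversion rule and not from how `int` is laid out in memory.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical check: (unsigned)i is a value conversion. A negative value is
   brought into range by adding UINT_MAX + 1 (that is, 2^n); a value that is
   already representable is unchanged. Nothing here depends on the bit layout. */
void check_modular_conversion(void)
{
    assert((unsigned)-1 == UINT_MAX);        /* -1 + 2^n           */
    assert((unsigned)-5 == UINT_MAX - 4u);   /* -5 + 2^n           */
    assert((unsigned)7 == 7u);               /* in range: unchanged */
}
```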


/* [2] */
int i = -1;
unsigned u;
memcpy(&u, &i, sizeof i);

↑ This would work nicely, because the types are guaranteed to have the same size.
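Wrapped in a function, [2] might look like the sketch below (the name `bits_of` is hypothetical). It copies `sizeof u` bytes, following the suggestion in the comments to size the copy by the destination. With optimization enabled, GCC and Clang typically turn a small fixed-size `memcpy` like this into a plain register move, so the library-call overhead the question worries about usually does not appear; inspect the generated code if it matters for your compiler.

```c
#include <assert.h>
#include <string.h>

/* int and unsigned are required to have the same size; this documents it. */
static_assert(sizeof(unsigned) == sizeof(int), "int and unsigned must have the same size");

/* Hypothetical helper: copies the object representation of i into an
   unsigned, byte for byte. No value conversion takes place. */
unsigned bits_of(int i)
{
    unsigned u;
    memcpy(&u, &i, sizeof u);   /* size taken from the destination */
    return u;
}
```

On a two's complement machine, `bits_of(-1)` returns an unsigned with every bit set.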


/* [3] */
int i = -1;
unsigned u = *(unsigned *)&i;

↑ This is formally Undefined Behavior in C++11 and earlier, but it's one of the cases listed in the "strict aliasing" clause of the standard, so it's probably supported by every extant compiler. It's also an example of what reinterpret_cast is there for. And in C++14 and later, the wording about undefined behavior has been removed from the section on lvalue-to-rvalue conversion(1).

If I did this I would use the named C++ cast for clarity.

I would, however, try out what the sometimes look-the-standard-allows-me-to-do-the-impractical-thing compilers have to say about it: in particular g++ with its strict aliasing option (-fstrict-aliasing, which is on by default at -O2 and above), but also clang, since it's designed as a drop-in replacement for g++.

At least if I planned on the code being used with those compilers and options.


1) [conv.lval], §4.1/1 in both C++11 and C++14.

Cheers and hth. - Alf
  • The third option is not undefined behavior, as it is listed as an exception. – Siyuan Ren May 05 '15 at 11:18
  • At least in C, the third example is well-defined. According to C89 §3.3.16.1: "If the value being stored in an object is accessed from another object that overlaps in any way the storage of the first object, then the overlap shall be exact and the two objects shall have qualified or unqualified versions of a compatible type; otherwise the behavior is undefined". And the standard is quite clear that types which only differ in qualifier or sign are compatible types. – Erlend Graff May 05 '15 at 11:22
  • @ErlendGraff: Thank you. Sorry, I answered only for C++. It's annoying with these multiple-language questions. Fixed. – Cheers and hth. - Alf May 05 '15 at 14:00
  • @SiyuanRen: Please specify the language and relevant paragraph of the standard. For C++, note that the so called strict aliasing clause (which aims to enumerate cases of guaranteed UB) is not referenced by the section on `reinterpret_cast`. – Cheers and hth. - Alf May 05 '15 at 14:03
  • @anonymous downvoter: Please do explain your downvote so that the answer can be improved, or so that others can ignore the vote. Thank you. – Cheers and hth. - Alf May 05 '15 at 14:04

This is from Paragraph 4.7 "Integral Conversions" of document N3797, the latest working draft of the C++14 standard:

If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note ]

To a first approximation, all computers in the world use two's complement representation. So [1] is the way to go (unless you are porting C++ to the IBM 7090).
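The note in the quoted paragraph (on two's complement the conversion leaves the bit pattern alone) can be checked by comparing the converted value with a byte copy. The helper name below is hypothetical; on a two's complement machine it should return 1 for every input. For what it's worth, C23 and C++20 have since made two's complement mandatory for signed integers.

```c
#include <assert.h>
#include <string.h>

static_assert(sizeof(unsigned) == sizeof(int), "int and unsigned must have the same size");

/* Hypothetical check: returns 1 if the value conversion (unsigned)i produces
   the same bit pattern as a byte-for-byte copy of i, and 0 otherwise. */
int conversion_keeps_bits(int i)
{
    unsigned converted = (unsigned)i;      /* [1]: value conversion */
    unsigned copied;
    memcpy(&copied, &i, sizeof copied);    /* [2]: bit copy         */
    return converted == copied;
}
```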

Cheers and hth. - Alf
TonyK

[3] is correct in both C and C++ (as of C++14 but not previously); there is no need to use memcpy in this case. (That said, there's no reason not to use memcpy, as it communicates your intent effectively, is obviously safe, and has zero overhead.)

C, 6.5 Expressions:

7 - An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [...]

  • a type that is the signed or unsigned type corresponding to the effective type of the object, [...]

C++, [basic.lval]:

10 - If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined: [...]

  • a type that is the signed or unsigned type corresponding to the dynamic type of the object, [...]

As you can see, the wording in the two standards is very similar and so can be relied upon across the two languages.
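Under the C wording quoted above, [3] can be used as-is. The sketch below (hypothetical helper name) reads an `int` object through an lvalue of type `const unsigned`, which falls under the "signed or unsigned type corresponding to (a qualified version of) the effective type" entries of C 6.5p7.

```c
#include <assert.h>

/* Hypothetical helper: reads the int object *p through an lvalue of the
   corresponding unsigned type, an access the C aliasing rule permits. */
unsigned read_as_unsigned(const int *p)
{
    assert(p);                      /* precondition: p points to an int object */
    return *(const unsigned *)p;    /* same object, viewed as unsigned          */
}
```

Called as `int x = -1; unsigned u = read_as_unsigned(&x);`, this performs the same access as [3]; no copy through `memcpy` is involved.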

ecatmur
  • C++11 §4.1/1 " If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has **undefined behavior**" And I see you referred to this in [your own answer](http://stackoverflow.com/questions/11373203/accessing-inactive-union-member-undefined) to another question. With the union technique we have an object of the right type but no initialization, with the `reinterpret_cast` we have initialization but no object of the right type. – Cheers and hth. - Alf May 05 '15 at 14:34
  • @Cheersandhth.-Alf sure, that's why [conv.lval] was fixed for C++14; in C++11 that section is inconsistent with [basic.lval]. – ecatmur May 05 '15 at 14:41
  • Thanks, I did not know. I think it's worth pointing out (valid in C++14 and onwards) in the answer. I'll fix mine. – Cheers and hth. - Alf May 05 '15 at 14:52
  • @Cheersandhth.-Alf [DR 616](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3714.html#616) was where this was changed, btw. Thanks for pointing me to my old answer - it's certainly interesting to see how things have changed even in a "minor" revision of the standard! – ecatmur May 05 '15 at 14:55