How conversion from int32 to int64 works

Question

My question is: How does the conversion from int32 to int64 work? I'm not interested in functions that do that, I'm interested in HOW they do that. What happens at the bit level?

I'm particularly interested in how a negative int32 (e.g. -23) is converted into an int64.

By sign-extension. Read about integer representation (2's complement for fixed-width types). — too honest for this site, Feb 13 '16 at 19:54
"The usual way", that is, the same way as extending a signed `short` or `char` to a larger type. There are not that many ways to do so - I can actually only think of one, which @Olaf mentions. — Jongware, Feb 13 '16 at 19:56
@Jongware: not necessarily. The basic types are not necessarily 2's complement. — too honest for this site, Feb 13 '16 at 19:57
Why is this "too broad"? Surely, the answer is pretty straight-forward, even if we have to accept that there COULD BE machines that do things differently, the general principle of "convert the short type to a longer one"? (Although I expect someone has covered this in a different answer, I'd like to have a go at writing an answer) — Mats Petersson, Feb 13 '16 at 20:13
@MatsPetersson Although I voted to close, I did consider this offtopic. and broad in the sense that in theory it could be handled differently depending on architecture. But given that the bulk of architectures handle it generally the same way, I'll vote to reopen. I couldn't find a specific question like this, although sign extension is usually part of a broader question. I'll give you a chance to answer. — Michael Petch, Feb 13 '16 at 20:44
@MatsPetersson: voted to reopen, and also downvoted. It's not too broad, it's just too trivial, IMO. Although maybe if you didn't know the phrase "sign extend", you'd have a hard time finding the right answer. In x86 asm, the relevant instruction is `movsxd r64, r/m32`. The phrase "sign extension" comes up all over the place in asm documentation. — Peter Cordes, Feb 13 '16 at 21:37
@PeterCordes: the OP doesn't care about the instructions, just *how* the bits are manipulated at the lower level. And the answer to that question is potentially architecture dependent, although in most cases architectures generally handle it about the same (but that isn't to say that you can't have an architecture with a completely different mechanism for dealing with integers). The latter point is why I consider the question on the broad side. If the OP had stipulated a particular architecture then that wouldn't be an issue. — Michael Petch, Feb 13 '16 at 22:11
@MichaelPetch: Hmm, I could have phrased that better. It did sound like I was suggesting that was an answer to the OP's question. I only mentioned `movsxd` as a specific thing to search on that would lead the OP to an explanation of what happens to the bits in x86. i.e. copying the sign bit of the 32bit integer into all positions in the upper 32, because x86 uses two's complement integers like everything else. As far as "too broad", C99 requires one's complement, two's complement, or sign+magnitude. Conformant C implementations would be slow on crazy architectures with weird hardware. — Peter Cordes, Feb 13 '16 at 22:15
I'm not convinced yet it's worthy of reopening. At least remove the language tags C and C++ - unless there is something in their (respective!) specifications on what "happens on the bits level". — Jongware, Feb 13 '16 at 22:57
Whilst I agree that "what happens on bits level" can be tricky to cover for EVERY architecture in existence, one could of course explain that by saying something like "If we have an architecture of this type, this is what happens, and whilst the exact bit pattern is different, a similar type of operation is done on other representations". I'm not sure how many 1s complement machines are out there, and I still really think this question is worth answering. In fact, one could explain four of the number representations in https://en.wikipedia.org/wiki/Signed_number_representations — Mats Petersson, Feb 13 '16 at 23:12

Mats Petersson · Answer 1 · 2016-02-14T08:37:26.180

The answer as to what happens in a sign extension depends on the way that signed integers are represented. There are several ways that can be used to represent negative numbers in binary form. The most common variants are:

1. Two's complement. By far the most common.
2. One's complement. Sometimes used in DSP systems, I believe.
3. Sign and magnitude.
4. Excess K. Used in floating point exponents.
5. Base -2.

There are most likely a few more variants, but the first entry in that list (two's complement) probably covers at least 99% of the computer systems that readers here are going to encounter, and the other four account for the vast majority of any remaining systems. (In fact, of those four I have only ever encountered Excess K in "real life"; the others are, in my 30+ years of working with computers, still theoretical.) The last two forms are not allowed by the C99 standard for signed integers, so you are unlikely to meet them in C [it is not absolutely forbidden to write a C compiler for a target system that is non-conformant - you just can't claim that it supports the full C99 standard].

I will now explain how sign extension works in each of these number systems. I do NOT intend to explain how each system works in other respects, why one would pick a particular system, or where they are used. That probably belongs in a computer architecture book. For further details on the different number systems, see the Wikipedia article on signed number representations.

I will give examples of how two 4-bit numbers convert to their respective 8-bit versions. This saves typing lots of ones and zeros; the principle is the same for 32 to 64 bits, just with more digits.

Two's complement:

Take bit 31 (the sign bit) and copy it into the bits 32..63. Most 64-bit processors have a specific instruction to perform this step, so that the conversion from 32- to 64-bit is automatic.

The value 5 is 0101 as a four bit number, copy bit 3 -> 00000101. -6 in 4 bits is 1010, copy bit 3 -> 11111010.
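As a rough sketch of the rule above in C, working on raw bit patterns held in unsigned integers. The function name `sign_extend` and its `width` parameter are made up for illustration; in real code a plain cast such as `(int64_t)x` on an `int32_t` does this for you on two's complement hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `width` bits of `bits` (a two's complement
   pattern) to a full 64-bit pattern by replicating the sign bit. */
uint64_t sign_extend(uint64_t bits, unsigned width)
{
    assert(width >= 1 && width <= 64);
    if (width == 64)
        return bits;                          /* nothing to extend */

    uint64_t sign  = UINT64_C(1) << (width - 1); /* e.g. bit 31 for width 32 */
    uint64_t upper = ~UINT64_C(0) << width;      /* mask of bits width..63 */

    bits &= ~upper;                           /* keep only the low `width` bits */
    return (bits & sign) ? (bits | upper)     /* negative: fill with ones */
                         : bits;              /* non-negative: fill with zeros */
}
```

For the -23 from the question, the 32-bit pattern is `0xFFFFFFE9`, and `sign_extend(0xFFFFFFE9, 32)` gives `0xFFFFFFFFFFFFFFE9`. For the 4-bit example, the low 8 bits of `sign_extend(0xA, 4)` are `0xFA`, i.e. 11111010.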

One's complement:

The sign extension is identical to two's complement: copy the sign bit into all of the new upper bits. The only real difference is that there is a negative zero, formed by 64 "ones".

The value 5 is 0101 as a four bit number, copy bit 3 to bits 4..7 -> 00000101. -6 in 4 bits is 1001, copy bit 3 to bits 4..7 -> 11111001.
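A small C sketch to check the one's complement example on a two's complement machine. Both function names (`ones_extend`, `ones_decode`) are hypothetical helpers written for this answer, and patterns are kept in unsigned integers so the host's own representation does not interfere.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a `from`-bit one's complement pattern to `to` bits by
   replicating the sign bit into the new upper bits. */
uint64_t ones_extend(uint64_t bits, unsigned from, unsigned to)
{
    assert(from >= 1 && from <= to && to < 64);
    uint64_t low  = (UINT64_C(1) << from) - 1;  /* mask of the original bits */
    uint64_t full = (UINT64_C(1) << to) - 1;    /* mask of the widened bits */

    bits &= low;
    if (bits >> (from - 1))                     /* sign bit set? */
        bits |= full & ~low;                    /* fill new bits with ones */
    return bits;
}

/* Decode a `width`-bit one's complement pattern: a negative value is
   the bitwise complement of its magnitude. Negative zero decodes to 0. */
int64_t ones_decode(uint64_t bits, unsigned width)
{
    assert(width >= 1 && width < 64);
    uint64_t mask = (UINT64_C(1) << width) - 1;

    bits &= mask;
    if (bits >> (width - 1))
        return -(int64_t)(~bits & mask);
    return (int64_t)bits;
}
```

Here `ones_extend(0x9, 4, 8)` yields `0xF9` (11111001), and both `ones_decode(0x9, 4)` and `ones_decode(0xF9, 8)` come out as -6, so the value survives the widening.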

Sign and magnitude:

Move bit 31 into bit 63 and clear bit 31. The rest of the extended digits remain zero.

The value 5 is 0101 as a four bit number, move bit 3 to bit 7 and leave a zero in bit 3 -> 00000101. -6 in 4 bits is 1110, move bit 3 to bit 7 and leave a zero in bit 3 -> 10000110.
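In C, that move of the sign bit could be sketched like this. `sm_extend` is a made-up name, and the pattern is held in an unsigned integer since the host machine is assumed to be two's complement.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a `from`-bit sign-and-magnitude pattern to `to` bits:
   the magnitude is kept as-is, the sign bit moves to the new top bit. */
uint64_t sm_extend(uint64_t bits, unsigned from, unsigned to)
{
    assert(from >= 1 && from <= to && to <= 64);
    uint64_t sign_in   = UINT64_C(1) << (from - 1); /* old sign position */
    uint64_t sign_out  = UINT64_C(1) << (to - 1);   /* new sign position */
    uint64_t magnitude = bits & (sign_in - 1);      /* bits below the old sign */

    return (bits & sign_in) ? (magnitude | sign_out) : magnitude;
}
```

`sm_extend(0xE, 4, 8)` gives `0x86` (10000110), and for -23 as a 32-bit sign-and-magnitude pattern, `sm_extend(0x80000017, 32, 64)` gives `0x8000000000000017`.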

Excess K:

Since excess K is a "biased" representation, sign extension means normalizing and then re-biasing the number with the larger constant. In other words, subtract 2³¹ and add 2⁶³ [or "add 2⁶³-2³¹"].

The value 5 is 1101 as a four bit number, subtract 8, then add 128 -> 10000101. -6 in 4 bits is 0010, subtract 8, then add 128 -> 01111010.
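The re-biasing step can be sketched in C as below. This assumes the bias is K = 2^(n-1) for an n-bit field, as in the example above (IEEE floating point exponents actually use 2^(n-1) - 1), and `excess_extend` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a `from`-bit excess-K pattern to `to` bits, assuming the bias
   is 2^(width-1) for each width: remove the old bias, add the new one.
   `bits` must contain only the low `from` bits. */
uint64_t excess_extend(uint64_t bits, unsigned from, unsigned to)
{
    assert(from >= 1 && from <= to && to <= 64);
    uint64_t old_bias = UINT64_C(1) << (from - 1);
    uint64_t new_bias = UINT64_C(1) << (to - 1);

    /* Unsigned arithmetic wraps modulo 2^64, so the intermediate
       "negative" result of the subtraction is harmless. */
    return bits - old_bias + new_bias;
}
```

`excess_extend(0xD, 4, 8)` gives `0x85` (10000101) and `excess_extend(0x2, 4, 8)` gives `0x7A` (01111010), matching the two examples.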

(This is often used in the exponent of floating point numbers - I believe that, together with having the sign in the top-most bit, allows all "regular" floating point numbers to be compared as a 32- or 64-bit integer and still behave as expected - but do not use this in your programs!).

Base -2:

This is a really weird one, as bit n has the value (-2)^n, which, if you follow the article mentioned above, you'll see is not at all trivial to follow. Odd-numbered bits have negative values, even-numbered bits have positive values. However, the sign extension is trivial - just add zeros up to the required width, since whether a bit counts as negative or positive depends only on its position being odd or even.

The value 5 is 0101 as a four bit number, fill with zeros to extend: 00000101. -6 in 4 bits is 1110, fill with zeros gives 00001110.
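Since extending is just zero-filling, the only interesting code is a decoder to confirm that the value is unchanged. A minimal sketch in C, where `negabinary_decode` is a name invented for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Decode the low `width` bits of a base -2 pattern:
   bit i contributes (-2)^i when it is set. */
int64_t negabinary_decode(uint64_t bits, unsigned width)
{
    assert(width <= 62);            /* keep `weight` within int64_t range */
    int64_t value  = 0;
    int64_t weight = 1;             /* (-2)^0 */

    for (unsigned i = 0; i < width; i++) {
        if ((bits >> i) & 1)
            value += weight;
        weight *= -2;               /* next power of -2 */
    }
    return value;
}
```

`negabinary_decode(0xE, 4)` and `negabinary_decode(0x0E, 8)` both give -6, and `0x5` decodes to 5 at either width, so padding with zeros leaves the value alone.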

Note that this question is still tagged C / C++. You might want to mention that [ISO C99 requires two's complement, one's complement, or sign/magnitude](http://stackoverflow.com/questions/13652556/what-happens-when-i-assign-long-int-to-int-in-c#comment18743414_13652796). From that comment: [see section 6.2.6.2](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf). Anyway, +1 especially for the other two representations. — Peter Cordes, Feb 14 '16 at 01:48
[ARM load/store instructions](http://www.peter-cockerell.net/aalp/html/ch-3.html) apparently encode the displacement with a sign/magnitude representation, because the sign bit is still the sign bit when it's a register offset rather than an immediate. RE: comparing floats as integers: yes, IEEE floats sort in correct order as signed integers. I'm not sure about denormals, and you're right that this is somewhat brittle. Don't depend on IEEE FP representation if you don't gain a big speedup and can be sure it's safe. Plus it's weird for humans reading your program. — Peter Cordes, Feb 14 '16 at 01:57
Update on comparing float bit patterns as integers: IEEE floats compare correctly as sign+magnitude integers, thanks to their biased exponents. [If using two's complement hardware comparisons, everything works except when both floats are negative: in that case, the ordering is reversed](https://en.wikipedia.org/wiki/IEEE_754-1985#Representation_of_numbers). Also, +NaN values compare > +Inf. -NaNs sort between 0 and -Inf (or below -Inf, farthest from 0, if you do reverse the order when both integers are negative). Still, this design apparently allows sharing some HW between int and FP cmp. — Peter Cordes, Feb 16 '16 at 01:55
