2

Let's examine Add and Multiply as examples. Which one can be categorized as widening?

Assume inputs are signed characters (i.e 8 bits in length) in 2's complement, unless declared otherwise.

Positive number addition

1+2 = 3

This operation doesn't seem to be widening. 1, 2 and 3 all fit in a character.

However, 250 + 6 overflows an unsigned char. So is this widening?

Same could be done with signed type, 125 + 5 overflows signed char into the sign bit. Is this widening?

Negative number addition

-2-3 = -5

This overflows a character in binary by 1 bit:

   1 1 1 1 1 1 1 0
+  1 1 1 1 1 1 0 1
------------------ 
 1 1 1 1 1 1 0 1 1 

The overflow is normally discarded, however, is this considered a widening operation?

Positive number multiplication

1 * 2 = 2

Are all multiplications widening, even if the result still fits in the original data type?

The above example, 2 still fits in an 8-bit character however, if I do the math by hand in binary, extra 0s get appended to left of the result which get discarded.

Negative number multiplication

-2 * -3 = 6

For simplicity, let's assume our inputs are 4 bits. This overflows by twice its size as expected, i.e the result becomes 8 bits:

1110 * 1101 = 10110110

The upper 4 bits are discarded and the lower 4 bits equal 6 as expected.

Is this considered a widening operation?

Division

I haven't gone into detail about division but I assume it's considered a narrowing operation.

Sep Roland
  • 33,889
  • 7
  • 43
  • 76
Dan
  • 2,694
  • 1
  • 6
  • 19

2 Answers2

6

Widening or not is also a property of an assembly instruction that does a math operation. The choice of what instructions to provide is informed by the inherent properties of the math operations, though, e.g. that multiplication can produce wider results.


Mathematically, the sum or difference of two n-bit numbers needs at most n+1 bits to represent exactly. e.g. unsigned 3-bit or two's complement 4-bit 7+7 = 14, i.e. 0b1110 unsigned or 0b01110 two's complement.

Most CPUs only provide a non-widening implementation of add/sub, with the extra bit going into the carry flag if there is one. (Some ISAs like MIPS don't have FLAGS at all, so if you want carry-out you have to recover it from carry = (a+b) < a (unsigned compare)).


Mathematically, the product of two n-bit numbers takes up to 2n bits to represent exactly. e.g. signed 8-bit -128 * -128 = 16384 SCHAR_MIN squared (i.e. 0x80 * 0x80 = 0x4000, note the sign bit being 0 in the result, but the bit right below that being set). Or the unsigned 8-bit 255 * 255 = 65025 i.e. 0xff ^ 2 = 0xfe01.

Many CPUs do provide a widening form of multiplication, e.g. x86's mul r/m32 and imul r/m32 that write the product to EDX:EAX.

x86 also has non-widening imul r, r/m32 in case you don't care about the high half, e.g. implementing C semantics for unsigned * unsigned which produces an unsigned result, truncating the full result. (Note that the low half is the same regardless of interpreting the inputs as signed or unsigned, only the high half differs. That's why x86 doesn't bother to provide a similar mul r, r/m32 instruction)

Some ISAs, instead of providing one widening multiply, provide separate low-half and signed-high-half / unsigned-high-half instructions. (Running them back-to-back may let some implementations fuse them into a single widening multiply internally, but simpler implementations could just run them separately, and software may just use one or the other.)

Some ISAs may not even provide a way to get the high half at all, e.g. ARM Cortex-M0 microcontrollers only have 32b x 32b => 32b multiply, no umull. Using that as a building block for wider multiplies (like 64x64 => 128-bit) may mean breaking down the inputs into 16-bit chunks to apply the usual extended-precision techniques in terms of 16b x 16b => 32b multiplies.


Mathematically, integer division can produce a quotient at most the width of the dividend. For extended precision purposes, it's useful to support a wider dividend (see Why should EDX be 0 before using the DIV instruction?), but this capability isn't normally used by compilers. (Especially not on ISAs like x86 where 64b / 32b => 32b division will fault if the quotient doesn't fit in 32 bits.)

But yes, if an ISA has a division instruction with a wide dividend, like x86's implicit EDX:EAX for div and idiv, that's called narrowing division.

Sep Roland
  • 33,889
  • 7
  • 43
  • 76
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • Thank you, so an overflow (i.e int addition overflow) is not widening, it's just an overflow. – Dan Jul 20 '21 at 18:40
  • 2
    @Dan: If you do it with an instruction like `add eax, ecx` then yes that's true. The mathematical full non-truncated result is wider than a register in the signed-overflow or unsigned-carry-out case though. But usually you'd handle that in asm by sign- or zero-extending the input *before* addition, to avoid wrapping. – Peter Cordes Jul 20 '21 at 19:39
  • 1
    @Dan: If you're doing single-bit operations separately to implement binary addition, then you need an extra output bit to hold the result without losing anything. This is actually a useful idea if you're doing 256 separate bits in parallel with SIMD, for example the Harley-Seal technique for popcount (https://github.com/WojciechMula/sse-popcount/blob/6feb3dba32c526b17de01e931c116900e3a23104/popcnt-harley-seal.cpp#L21) where you add bits vertically for a few steps before doing `8 * popcnt(eights) + 4 * popcnt(fours) + ...`. How many steps you can do without overflow depends on how many vars. – Peter Cordes Jul 20 '21 at 19:45
  • Hi Peter, I wanted to ask, does the above rules on add/sub/mul/div also apply to Floating points or only apply Integers? I know that FP representation is fully different from regular integers so was wondering if rules such as `the sum or difference of two n-bit numbers needs at most n+1 bits to represent exactly` or `widening rules` apply to FP as well? – Dan Jul 28 '21 at 19:46
  • I'm guessing same rules may apply to the Mantissa but not the Exponent? – Dan Jul 28 '21 at 20:45
  • @Dan: The FP mantissa is an integer, but during FP addition you don't just add the mantissas. You shift each incoming mantissa according to the exponent so they have the right relative place values. e.g. the leading (most significant) mantissa bit of `2^-100` is about 76 bits to the right of the LSB of the mantissa of `1.0`, if doing `1.0 + 2^-100` in single-precision float. That's why the rounded FP result is still exactly 1.0. Consider other cases like float epsilon or 2.0+1.0. – Peter Cordes Jul 28 '21 at 21:07
  • @Dan: FP *multiplication* does mulitply the mantissas (and take the high half) while adding the exponents. A hypothetical float x float => double could keep all the mantissa product bits (and still have some low zeros because 53 > 2x24), instead of rounding to 24 bits for float x float => double. In real life, if you want that operation you have to do float->double conversion on both inputs and a normal double multiply, because CPUs don't provide float x float => double multiply. – Peter Cordes Jul 28 '21 at 21:10
  • Thanks a lot, so we could say the rules may not apply in total, and may apply partially based on the operation? Does division behave similar to multiplication in this case? – Dan Jul 28 '21 at 21:22
  • @Dan: Understanding how basic integer operations work is a useful/necessary *part* of understanding FP operations. e.g. widening addition is why 1.5+1.5 has to get renormalized with a higher exponent. Re: division, I think so. Of course the result of any FP operation gets rounded to a fixed number of mantissa bits. (Subtraction of two nearby numbers can have "catastrophic cancellation", leaving few significant bits, i.e. the low bits being zeros. Remember that FP uses sign/magnitude, not 2's complement mantissa) – Peter Cordes Jul 28 '21 at 21:29
  • by `widening addition` you are referring to our discussion above, the sign bit overflow correct? While we are at it I wanted to ask this, when doing subtraction in integers you can get the 2s complement of the input and do an `add` instead, does the same apply FPs? – Dan Jul 28 '21 at 21:37
  • @Dan: The carry-out, yes. It's not a sign bit, FP mantissas are unsigned. Subtraction: yes, you can transform addition from `x-y` into `x + (-y)`, but that's not how real-world hardware ALUs work. (Especially for integer subtraction that has to update FLAGS, the borrow output (in the carry flag) would be wrong if you actually did it this way. [Arithmetic identities and EFLAGS](https://stackoverflow.com/q/62218921)) – Peter Cordes Jul 28 '21 at 21:41
  • Ah I see so I guess they do it with the `borrowing` method explained here: https://stackoverflow.com/questions/46570941/subtracting-1-from-0-in-8-bit-binary. When you say `Most CPUs only provide a non-widening implementation of add/sub, with the extra bit going into the carry flag` isn't that technically widening? the extra bit overflowing into the sign bit is wider than the inputs by 1 bit? or would it be widening only if it grows bigger than the data type? – Dan Jul 28 '21 at 21:46
  • 1
    @Dan: Yes, it captures the full result in CF:reg, for CPUs that have a carry flag. But the full result isn't all in a single register so I wouldn't call it "widening" in the same way that multiply instructions are widening. – Peter Cordes Jul 28 '21 at 21:59
  • For some architectures tho, the full result does go in a single register, it just overwrites the sign bit and sets a flag in a condition register at the same time, but i guess that still isn't `widening` the same way multiplication is? – Dan Jul 28 '21 at 22:09
  • @Dan: What ISA are you talking about where something like `add dst, src,src` works with the destination reg being at least 1 bit wider than the sources, to hold the full result as well as updating FLAGS? AArch64 can take one narrow source and sign-extend it (`add x0, x0, w1, sxtw` - https://godbolt.org/z/rnPM3GMKb), but can't do `add x0, w1, w2, sxtw` – Peter Cordes Jul 28 '21 at 22:17
  • I will have to look for it again but it may have been PPC. But I do remember reading the overflow behaviour where the overflow bit goes into the sign bit and any other overflow would have been ignored, a condition code was set as well. – Dan Jul 28 '21 at 22:22
  • @Dan: How can the "overflow bit" go into the "sign bit"? That doesn't even make sense. The sign **bit** is the MSB of an integer register (and the name really only applies when talking about 2's complement, not unsigned). The sign **flag** is a flag (like x86 SF) set = MSB of the integer result (for normal operations like add). "Overflow bit" isn't a standard term, but it could mean either the signed-overflow output of an add (x86 OF), or the carry-out (x86 CF). But either way it's set according to addition results carrying out past the top of the register width, that's why it's oflow – Peter Cordes Jul 28 '21 at 22:26
  • Here is a similar statement: `When there is an overflow, the result is obtained by allowing any carry into the sign-bit position and ignoring any carry out of the sign-bit position, and condition code 3 is set.` from this patent page: https://patents.google.com/patent/AU2010355816A1/en – Dan Jul 28 '21 at 22:30
  • 1
    @Dan: The problem with what you said is the part about "where the overflow bit *goes into* the sign bit". You said "overflow *bit*", but Intel just talks about "overflow" as a separate boolean condition (e.g. which will go into OF), not a bit that can meaningfully be put into an integer register along with other addition results. And certainly not as the sign bit of an addition. – Peter Cordes Jul 28 '21 at 22:34
  • Thanks for clarifying. So as you mentioned this will not be counted at "widening" the way it does in multiplication. – Dan Jul 28 '21 at 23:17
1

Widening addition/multiplication means that the result of the operation is larger than at least one of the inputs.

Specifically in arm neon, widening operators differ from (en)longening instructions so that widening addition/accumulation adds 8 bit source to 16 bit destination, while long addition adds two 8 bit sources into 16 bit target. Same applies to 16-bit and 32-bit sources.

0001+0002==00000003 with widening.

Likewise 01*02==0002 with widening and 02 without.

In Intel x86 ISA multiplication is widening by design, producing 32-bit output to register pair dx:ax. Likewise division is narrowing with IA.

In contrast c/c++ semantics if int32/uint32 arithmetic is mostly modular, discarding the upper bits of the infinite precision result.

Aki Suihkonen
  • 19,144
  • 1
  • 36
  • 57
  • 1
    x86 has non-widening multiply, too: `imul reg, reg-or-mem`, or `imul r, r/m, immediate`. https://www.felixcloutier.com/x86/imul. You're not required to destroy [er]dx:[er]ax to multiply if you don't care about the high half, unless you're only using the 8086-compatible subset of x86. (Those forms were new in 386 and 186, respectively, IIRC) – Peter Cordes Jul 20 '21 at 18:33
  • Indeed and the SSE(2) instruction set added instructions to get the top bits of long multiplication. NEON has also narrowing add and return high half. – Aki Suihkonen Jul 21 '21 at 05:53
  • High-half-only non-widening multiply only exists in SSE2 for 16-bit elements, `pmulhw` / [`pmulhuw`](https://www.felixcloutier.com/x86/PMULHUW.html) as high halves for [`pmulld`](https://www.felixcloutier.com/x86/pmullw), and also SSSE3 [`pmulhrsw`](https://www.felixcloutier.com/x86/pmulhrsw) fixed-point rounding multiply that takes a 16-bit window of bits near the top of the full result. For 32-bit elements, we have 32x32 => 64-bit widening SIMD multiplies ([`pmuludq`](https://www.felixcloutier.com/x86/pmuludq)), but not pure high-half until AVX-512IFMA52 `vpmadd52huq` and `luq` integer MAC. – Peter Cordes Jul 21 '21 at 05:58