How are the carry and overflow flags used to calculate multiplication in x86

Question

How are the two flags used to correctly calculate the answer when two numbers that are multiplied overflow the register?

E.g. if al holds 0xff and is multiplied by 0x2, causing an overflow into ax, how do the flags help with this?

On x86/x64, the `MUL` instruction produces a result that is twice as wide as the source operands. — Nayuki, Dec 16 '16 at 22:37
If you use [MUL](http://www.felixcloutier.com/x86/MUL.html) or IMUL (the one-operand version), they already produce an output with twice the width of the inputs. If you multiply by two with a shift, then the bit shifted out of the top of AL is shifted into CF. But then you should have just shifted AX instead of AL. So using CF for extended precision multiply by 2 is only useful on operands that don't fit in the widest register. Or you could `shl al, 1` / `setc ah` if AH wasn't zero to start with. But `shl ax, 1` / `and ax, 0x1F` would avoid partial-register stalls. when reading AX later. — Peter Cordes, Dec 16 '16 at 22:39
The flags do help a little in determining whether you got an overflow. Nothing you couldn't check yourself, though. — Jester, Dec 16 '16 at 22:40
There's no overflow, AX has the full result 0x01FE (or 0xFFFE if you use IMUL). The CF and OF flags are set if you use MUL, and not set if you use IMUL. — Ross Ridge, Dec 16 '16 at 22:43
@RossRidge: IMUL [sets CF and OF if the upper-half is not the sign-extension of the low half](http://stackoverflow.com/questions/38253541/c-unsigned-long-long-and-imulq/38254324#38254324) (even the multi-operand versions work that way, so detecting unsigned wraparound when using 2 or 3 operand imul can't be done from just the flags) — Peter Cordes, Dec 16 '16 at 22:47
Helium: are you asking about extended-precision multiplies? Like multiplying two `int64_t` inputs in 32-bit mode, to produce an int64_t result? [As you can see (on godbolt)](https://godbolt.org/g/tIjA98), none of the resulting instructions have flag inputs. You'd only need an add-with-carry to produce the upper 64 bits of the full 128-bit result (still on a 32-bit machine). (see [this answer](http://stackoverflow.com/a/40795629/224132) for that) — Peter Cordes, Dec 16 '16 at 22:52
@PeterCordes There's no overflow when you use the byte sized, one operand, IMUL to multiply 0xFF by 0x02 (-1 * 2 = -2) and it clears CF and OF. — Ross Ridge, Dec 16 '16 at 22:55

score 3 · Answer 1 · edited May 09 '21 at 13:58

3

Multiplication on x86/x64 never overflows when using the one operand form.
This is because mul and its sibling imul produce an output twice as wide as their operands¹.
In your example, multiplying by al produces an output in ax and no overflow is generated.

The CF and OF are set when the result cannot fit in the operands size.
This can be used to perform multiplication with saturation, for example:

;Unsigned

mul ebx
sbb edx, edx      ;EDX = CF replicated along all the 32 bits
or eax, edx       ;EAX = 0ff..ffh if overflow, EAX*EBX otherwise

;Signed (perhaps not the most efficient way)

imul ebx
cmovc eax, 7fffffffh  ;Signed positive max if overflow.   (CMOV-immediate doesn't really exist, but imagine register sources)
cmovnc edx, 0         ; don't modify EAX for the non-overflow case.
sar   edx, 31         ; EDX = all-ones if overflow && negative
xor   eax, edx        ; if negative && overflow
                      ;    flip 7fffffff (INT_MIN) to 80000000 (INT_MIN)
                      ; else xor with 0 is a no-op

(Current Rust compilers also implement a.saturating_mul(b) for i32 and i64 using FLAGS from imul, but with different setup: https://rust.godbolt.org/z/ab3jMjzbv)

However to implement a multi-precision multiplication, say 64x64-bit, those flags are not needed, in fact, denoting with 2³² with k we have:

(a·k+b) × (c·k+d) = a·c·k² + (a·d+b·c)·k + b·d

where the 32-bit products produce 64-bit results that are added as below

.----.----.
|   b·d   |
'----'----'
       +
     .----.----.
     | a·d+b·c |
     '----'----'
            +
          .----.----.
          |   a·c   |
          '----'----'

 =

.----.----.----.----.
|    128-bit result |
'----'----'----'----'

¹ And this suffices to prevent overflow.

edited May 09 '21 at 13:58

Peter Cordes

328,167
45
605
847

answered Dec 17 '16 at 00:44

Margaret Bloom

41,768
5
78
124

1

Some forms of the imul instruction do not produce a product that is twice as wide as the operand(s). – Alexey Frunze Dec 17 '16 at 10:46
@AlexeyFrunze You are absolutely right! What a silly mistake, updating. Thanks for pointing that out! – Margaret Bloom Dec 17 '16 at 14:09
`7fffffffh | 80000000h` is `-1`, not INT_MIN. :/ (Also `cmov` doesn't take an immediate operand, but we can pretend that's shorthand.) Perhaps `cmp ?,?` / `sbb eax,0` to turn `7fffffffh` into `80000000h` if you can come up with something to compare that sets CF appropriately? Hmm, `cmov eax, 7f...h` / `bt edx, 31` / `sbb eax, 0` could work. – Peter Cordes May 08 '21 at 18:33
Semi-related: [C unsigned long long and imulq](https://stackoverflow.com/q/38253541) trying to do overflow-detection on 2-operand `imul` used for unsigned operands. – Peter Cordes May 08 '21 at 18:36
@PeterCordes Good points! This question is so old I should review it entirely. But I'm afraid I'm not that committed to this site :) I'll leave this tab open, so if tomorrow I've some free time I'll edit in your observations. – Margaret Bloom May 08 '21 at 20:29
@MargaretBloom: Since I had already been thinking about it, on 2nd look I realized my previous comment's idea is broken, too (need to avoid modifying a non-overflowed negative low half). But I had another idea, so I updated your answer with something that's good enough to make the point about how CF/OF are set, even if it's not an efficient saturating-imul – Peter Cordes May 08 '21 at 20:50
Saturating signed multiply seems generally hard. Rustc uses even more instructions, although with better ILP: https://rust.godbolt.org/z/ab3jMjzbv. Two `cmov`s, one to prepare LONG_MIN / LONG_MAX according to the XOR of signs of the inputs, then `cmovo` on the imul result. – Peter Cordes May 09 '21 at 12:31
@PeterCordes I like how `rustc` does it. Very straightforward and with good ILP. Thanks for finding that out. – Margaret Bloom May 09 '21 at 13:37

score 1 · Answer 2 · edited Dec 17 '16 at 12:33

The best way to answer questions like this is to read the manual.

Now I am going back to my paper copy of an 80188/186 (I don't know where my 8088/8086 programming manual is). But even back then, it is like folks are saying. The text says

If the source is a byte, then it is multiplied by register AL, and the double length result is returned in AH and AL.

And that is all fine: you can't overflow, as folks are saying, but folks don't normally write high-level language code that uses a result twice the size of the operands. It goes further to say:

If the upper half of the result is non-zero CF and OF are set; otherwise they are cleared.

So there you go: if you are doing an 8-bit operation and the result does not fit in 8 bits (0xFF * 2 = 0x1FE), then both CF and OF are set. At least, they were 20 years ago. It should be quite trivial to do an experiment to see what your current x86 does. You could have crafted this experiment before posting here, walk through every combination of operands and see what the flags do for each.

My question was more about what those flags are used for once set after the instruction and if they were used to determine the result. — some_id, Dec 17 '16 at 15:28
the flags tell you that the upper 8 bits of the result are non-zero meaning the lower 8 bits overflowed, you cannot use that 8 bit result 0xFF * 0x02 != 0xFE — old_timer, Dec 19 '16 at 14:53

How are the carry and overflow flags used to calculate multiplication in x86

2 Answers2

Linked

Related