7

I need to deal with bignum calculation (addition and subtraction, but I treat subtraction as equivalent to signed addition) on RISC-V and the situation is a bit complicated. What I gather from half an hour of internet research:

  • RISC-V operations do not provide means to check for carries or overflow
  • This decision is motivated in the fact that flags or other means of handling it add a lot of complexity to Out-of-order micro architectures.
  • Instead, they recommend doing branches afterwards
    • For unsigned addition, overflow handling can be done with a single bltu.
    • Same for signed addition if the sign of one of the operands is known
    • Otherwise, two checks need to be performed (three additional instructions)
  • People On The Internet are raging furious about this (I won't link it here)

As far as I can tell, the branches indeed cover most scenarios rather well, except for one: (signed) bignum addition. Because there, we hit the slowest check path in a hot loop.

I know only a little about ISA design, but why didn't they include an instruction that calculates (a + b) >> 32 (effectively the carry out)? A bit like how the multiplication instruction is split into mul and mulh as well. This would allow to do the desired calculation with always two instructions. More powerful micro architectures could then even detect the sequence and only do one addition.

Am I missing some tricks that would make this instruction obsolete (or be equivalent to it)? Does it have any major downsides that I oversee? I did not find a lot of good documentation on this general topic.

piegames
  • 975
  • 12
  • 31
  • FWIW, Waterman's PhD thesis doesn't even mention carry and then only discusses the difficulties of a flags register in the context of virtualization and not for out-of-order. Intel solved that out-of-order 'problem' back in 2005. In defense of RISC-V, bignum probably doesn't show up on their benchmarks. – Olsonist Feb 05 '22 at 19:55

1 Answers1

7

add / sltu gives you sum and carry-out: https://godbolt.org/z/Y7f5dzj1P shows GCC using it for unsigned math: sum=a+b / carry = sum<a. Or for __builtin_uadd_overflow

But the problem with that is lack of ILP: the sltu can't start until the add result is ready. That could be solved if you could get carry-out directly from the inputs as you propose; good point. Of course fusion of add/sltu would also solve that problem; perhaps that's what the architects had in mind.

I don't see any CPU-design challenge in creating an instruction that produced a 0 or 1 output according to the carry-out from a 2-input addition. That would be very easy; any way of building a 32 or 64-bit adder to support the add instruction could easily produce a carry-out signal from the high bit. In fact that's probably what sltu reads, since it's normal for an integer ALU to use a single binary adder-subtractor, with NOT of one input and a carry-in of 1 to implement subtraction. (Low bit is a full-adder instead of half-adder, otherwise a normal binary adder.)


The other major problem for bignum of more than 2 reg-widths is doing add with carry-in (on ISAs with a carry flag and add-with-carry instruction).

And even worse, getting carry-out from that 3-input addition. (Either part of which could wrap, so it's not possible AFAIK to combine it into one add and compare. This is a common pitfall of pure-C implementations of adc; comments on that linked answer have working C, but it doesn't compile very efficiently).

Unless there's some trick I'm not aware of, I assume that's what really has people upset with no-FLAGS designs like RISC-V and MIPS for Bignum stuff.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • Thank you. The bit about carry in is a really good point. I thought "well now I have zero/one in my register and am done" but nope. I see that other architectures have an `adc` (add with carry) instruction. Would having one help RISC-V? How would the carry check look like in that case? – piegames Feb 05 '22 at 16:36
  • @piegames: Yes, it would help significantly, but it's a 3-input instruction so it would break the entire pipeline design. (Unless there are other 3-input instructions where all 3 register operands are inputs, as well as the first being an output. Like a conditional-select perhaps, like x86 `cmov` / AArch64 `csel` / MIPS `movn`: https://godbolt.org/z/xW5dj7Pdn ) – Peter Cordes Feb 05 '22 at 16:45
  • 1
    I see. I know that RISV-V has a three-input instructions (called R4 type), but it's uncommon to the point that some embedded CPUs don't support it *at all*. But even if we had such an instruction, how would the carry check look like? Doesn't the carry in introduce a new edge case that requires an additional check? – piegames Feb 05 '22 at 16:57
  • 1
    @piegames: Oh yes, right, not sure it would help significantly without another instruction to get carry-out instead of sum from those 3 inputs. (Which you could certainly provide if you're providing an adc) – Peter Cordes Feb 05 '22 at 17:02
  • 2
    This is also really bad for cryptographers trying to do elliptic curve cryptography implementations -- branching after adding / multiplying prevents the code from being constant time and may reveal information over side-channels. – Chris Beck Jan 28 '23 at 01:20