Yes, since you have to subtract '0'
anyway, do that and then unsigned compare c <= 9
or c < 10
. See What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa? for the range-check trick.
We can do this in C and then see how it compiles, as a starting point for a compact RISC-V implementation. This C is structured like the asm in NASM Assembly convert input to integer?, in hopes that GCC or clang will use similar loop structure. If you translate it by hand, you might want this kind of loop structure, or tweak it for better software-pipelining on an in-order RISC-V, especially to hide load-use latency. This loop structure is great on a modern x86 where OoO speculative exec hides branch and load-use latency.
// C intentionally written exactly like hand-written asm
// Translate this to asm by hand, including the loop structure.
// or compile it if you want more bloated asm.
unsigned str_to_uint(const unsigned char *p) {
unsigned dig = *p - '0';
unsigned total = dig; // peel first iter, optimize away the + 0 * 10
if (total < 10) // <10 can share a constant with *10
goto loop_entry;
else // fall through to the uncommon case of no valid digits
return 0;
do {
total = total*10 + dig;
loop_entry: // branch target = loop entry point
dig = *++p - '0';
} while(dig < 10);
return total;
}
I skip the total * 10 + dig
on the first iteration using a taken branch, so we might as well have that be our entry into the loop to minimize the amount of total code.
Another option would be to peel another loop iteration fall into the top of the loop. This is what GCC and clang choose, when compiling with -O3
or -O2
. With -Os
gcc de-optimizes it to a loop with a j
at the bottom and btgu
break in the middle. (Godbolt compiler explorer). I don't know any -march=
RISC-V arch or tune options to try.
So if you want a good balance of code-size and efficiency (especially for the often-common case of 1 or 2 digit numbers), you should probably "compile" it by hand.
GCC uses (x<<3) + (x<<1)
to multiply by 10; clang uses mul
(and inside the loop does share a constant between mul
and the bltu
loop branch. Unfortunately outside the loop clang compares against 9
, like 9 < total
, so it needs both constants. (Does RISC-V have a bge
<=
compare? IDK, TODO / edit welcome on whether this is a missed optimization or not).