On Cortex-A processors (AArch64 mode), is there a rule of thumb for optimizing for speed? For example, is it always better to read from memory than to take a branch?
Consider the simplest conversion to a hexadecimal string as an example:
convert:
. . .
cmp x9, 9
b.le . + 8
add x9, x9, 0x07
add x9, x9, 0x30
strb w9, [x10, -1]!
. . .
b convert
vs
convert:
. . .
ldrb w9, [x11, x9] ; x11 - ptr to alphabet string: "0123456789ABCDEF"
strb w9, [x10, -1]!
. . .
b convert
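To make the comparison concrete, here is roughly what the two variants compute, restated in C. This is only a sketch: the function names are hypothetical, the input is assumed to be a single nibble in the range 0..15, and a compiler is free to emit something other than a branch for the first one.

```c
/* Variant 1: compare and adjust, mirroring cmp / b.le / add / add.
   Assumes v is in the range 0..15. */
static char hex_digit_branch(unsigned v)
{
    if (v > 9)
        v += 0x07;            /* skip the 7 characters between '9' and 'A' */
    return (char)(v + 0x30);  /* 0x30 is ASCII '0' */
}

/* Variant 2: table lookup, mirroring ldrb w9, [x11, x9].
   Assumes v is in the range 0..15; larger values would read out of bounds. */
static char hex_digit_table(unsigned v)
{
    static const char alphabet[] = "0123456789ABCDEF";
    return alphabet[v];
}
```

Both functions should return the same character for every input from 0 to 15; the question is only which form runs faster.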
Thanks in advance for any tips.