When two alternative solutions to a problem come out roughly the same in clock-cycles and latency, is there a better way to decide which one to use?
Example - Converting to Hexstring
An example I was looking at involves converting an integer value to hex for output. There are two basic approaches:
- use a simple lookup of the hex-digit from a string "0123456789abcdef" using ldr (roughly 3 clock-cycles); or
- compare the remainder with 10 and either add '0' or 'W', which involves a cmp and then two conditional adds (e.g. addlo and addhs), which at roughly 1 clock-cycle each is again about 3 clock-cycles.
(These figures use the rough latencies from the link in the answer to "Instruction execution latencies for A53"; there apparently isn't a good A53-specific latency reference.)
Example - Hex Convert Loop Alternatives
The following is code for a Cortex-A53 (Raspberry Pi 3B):
    hexdigits: .asciz "0123456789abcdef"
    ...
        ldr r8, hexdigitadr     /* load address for hexdigits */
    ...
    hexcvtloop:
        cmp r6, 0               /* separation of digits done? */
        beq hexcopy             /* copy tmp string to address */
        udiv r0, r6, r7         /* divide by base, quotient in r0 */
        mls r2, r0, r7, r6      /* mod (remainder) in r2 */
        mov r6, r0              /* quotient to value */
        /* alternative 1 - ASCII hexdigit lookup */
        ldrb r2, [r8, r2]       /* hexdigit lookup */
        /* alternative 2 - add to obtain ASCII hexdigit */
        cmp r2, 10              /* compare digit to 10 */
        addlo r2, r2, '0'       /* convert to digits '0'-'9' */
        addhs r2, r2, 'W'       /* convert to 'a'-'f' */
        strb r2, [r5], 1        /* store in tmp string */
        b hexcvtloop
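To check that the two alternatives really are interchangeable, here is a rough C model of the loop above. It is only a sketch: the function names (hexdigit_lookup, hexdigit_add, to_hex_reversed, check_alternatives_agree) are mine for illustration, and it assumes r7 holds 16 as the base, which the snippet above does not show.

```c
#include <assert.h>

/* Alternative 1: table lookup, modelling the ldrb from hexdigits. */
static char hexdigit_lookup(unsigned digit)
{
    static const char hexdigits[] = "0123456789abcdef";
    return hexdigits[digit & 0xF];
}

/* Alternative 2: compare with 10, then add '0' or 'W'.
 * 'W' is 87, so 10 + 'W' == 97 == 'a'. */
static char hexdigit_add(unsigned digit)
{
    digit &= 0xF;
    return (char)(digit < 10 ? digit + '0' : digit + 'W');
}

/* Models hexcvtloop with an assumed base of 16: digits are stored
 * least significant first, and a value of 0 stores no digits at all.
 * Returns the number of digits written (tmp is NUL-terminated). */
static int to_hex_reversed(unsigned value, char *tmp, char (*cvt)(unsigned))
{
    int n = 0;
    while (value != 0) {
        unsigned q = value / 16;      /* udiv */
        unsigned r = value - q * 16;  /* mls  */
        value = q;                    /* mov  */
        tmp[n++] = cvt(r);            /* strb with post-increment */
    }
    tmp[n] = '\0';
    return n;
}

/* Asserts that both alternatives map every digit 0..15 to the same character. */
static void check_alternatives_agree(void)
{
    for (unsigned d = 0; d < 16; d++)
        assert(hexdigit_lookup(d) == hexdigit_add(d));
}
```

Calling check_alternatives_agree() from a test harness confirms the outputs match, so the choice between the two really does come down to cost rather than behavior.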
I understand that the clock-cycle counts stated in the reference do not account for other factors such as interrupts, memory speed, cache misses, etc.
If my rough estimate of about 3 clock-cycles for either the hex-digit lookup with ldr or the cmp, addlo, addhs sequence for adding to the remainder is fair, is there another consideration that would decide between the two approaches, or is it basically personal preference at that point?
(I'm not overly concerned with getting a Cortex-A53 specific answer, but am more interested in whether there are other general ARM metrics I should look at next -- or if it's just "up to you" at this point.)