2

Using an Arduino, I have to write a function in Atmel AVR Assembly for my computer science class that computes the 8-bit average of two 8-bit values in assembly. I am not allowed to use any branching instructions either (but skips are fine).

This is what I have so far:

.global average
average:
  add r24, r22
  asr r24
  ret

For the part of my program where I have to compute the average of 69 and 60, it returns -64 instead of 64. Does anyone know how I would make this function work? Any help would be much appreciated.

  • 3
    The trick for averaging while avoiding integer overflow / wraparound: http://stackoverflow.com/a/3816471/224132. Which I found in under a minute by searching for `integer average without overflow`, since I knew there *was* a trick, but couldn't remember it. It probably works for signed 2's complement as well as unsigned, but I didn't check. Put `signed` into the google search terms if you want. – Peter Cordes Nov 30 '16 at 17:04
  • Note that the answer I linked only works for unsigned if you know what order they're in. The highest-voted answer doesn't need that, but takes many more operations than ADD and ROR. Anyway, this just goes to show that when looking for integer tricks, don't limit yourself to AVR asm. You'll find a LOT of stuff in C, which you can implement in AVR yourself or even feed to a compiler and see how it does it. e.g. some of these are useful: https://graphics.stanford.edu/~seander/bithacks.html – Peter Cordes Nov 30 '16 at 18:30

1 Answers1

10

The trick is to add and then rotate-with-carry to divide the 9-bit result by 2 and leave an 8-bit result in a register.

Two of the answers on the question I linked in comments use that: first, second.

An AVR implementation of that is:

    add   r24, r25       ; 9-bit result in C and r24
    ror   r24            ; rotate-through-carry, like x86's RCR instruction

This works for signed or unsigned interpretation of the bits, since all we're doing is discarding the low bit from the 9-bit full result of the addition. There's no arithmetic vs. logical shift choice, and no wraparound.

Also note that division by shifting rounds towards -infinity (not truncating towards zero like C's integer division operator). So (1 + -2) >> 1 is -1.


This is small enough that you should put it in a macro, not a function. It probably takes at least 2 instructions at most call sites, so inlining this saves code size even if you could have used the 1-word RCALL instruction instead of 2-word CALL.

Community
  • 1
  • 1
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • interesting. So in x86 we can use `RCL` to achieve the same. Unfortunately compilers don't recognize this optimization – phuclv Dec 02 '16 at 10:56
  • @LưuVĩnhPhúc: Yeah, I don't know how to express this in C other than by casting to a larger unsigned type and then using `>>`. Probably no compiler will optimize that back to RCL for types that are wider than a register. – Peter Cordes Dec 02 '16 at 11:02
  • Even RCL by 1 is more than 1 uop on Intel (3 on Skylake), so for narrower args, ADD + SHR in a 64-bit or 32-bit register is cheaper on Intel CPUs. If only one of the inputs needs an extra instruction to zero-extend, MOVZX (or MOV) / ADD / SHR should normally beat ADD+RCL. Especially since a zero-extending MOV lets you do it non-destructively. – Peter Cordes Dec 02 '16 at 11:02