6

Using an Arduino, I have to write a function in Atmel AVR Assembly for my computer science class that converts a signed 8-bit byte to a signed 16-bit integer. I am not allowed to use any branching instructions either (but skips are fine).

I know it's wrong, but this is all I have gotten so far:

.global byteToInt
byteToInt:
  sbrc r24, 7
  ldi r25, 1
  asr r25
  ret

Does anyone know how I would make this function work? Any help would be much appreciated!

  • 1
    Which format for signed numbers is the teacher implicitly assuming? Two's complement? – Margaret Bloom Nov 29 '16 at 08:32
  • 1
    @MargaretBloom: AVR's sign-flag looks like it's set based on 2's complement interpretation of the bits, so I think that goes without saying. Also the [NEG instruction](http://www.atmel.com/webdoc/avrassembler/avrassembler.wb_NEG.html) does 2's complement negation. – Peter Cordes Nov 29 '16 at 09:18
  • @PeterCordes Eh, Maybe or maybe not. It's reasonable to assume two's complement of course but an exercise could be about implementing other signed representations (exactly because of the lack of support from the ISA). I assumed two's complement too though. – Margaret Bloom Nov 29 '16 at 09:23
  • @MargaretBloom: That's fair, although it seems the ISA lacks support for 2's complement sign-extension, so it still doesn't have a one-instruction answer the way it would on x86 (MOVSX or CBW). – Peter Cordes Nov 29 '16 at 09:25

1 Answers1

3

Well obviously you need to copy the sign bit of the char to every bit in the upper half. On most architectures, the straightforward would be to copy a register and arithmetic-right-shift it by 7. But AVR only has a shift-by-1 instruction, so we can't efficiently do that.

Another trick for conditionally getting 0 or -1 into a register is subtract-with-borrow a register from itself to get 0 - C. e.g. sbc r25, r25.

Now we just need a way to set the Carry flag if the 8-bit number is negative, i.e. if it's > 127 when treated as an unsigned integer, because C is always set based on the unsigned interpretation of things. AVR has a compare-immediate instruction, CPI, but it only works for r16-r31, not for low registers. Also, it sets the C flag opposite of what we really want, so we'd have to use another instruction to invert the result. So I think we're better off doing the compare the other way against a value in a register:

; Most efficient way, I think:
sign_extend:
    ldi   r25, 127       ; can be hoisted out of loops, and any reg is fine.

    cp    r25, r24        ; C = (r24 < 0)
    sbc   r25, r25        ; r25 = (r24 < 0) ? -1 : 0
    ; result in r25:r24

Even better, if you need to do this in a loop, you can keep 127 in a different register.

With CPI, you'd do this:

; slightly worse: only works with r16-r31, and worse in loops
sign_extend:
    cpi   r24, 127        ; C = (r24 < 128U) = ((signed)r24 >= 0)
    sbc   r25, r25        ; r25 = (r24>=0) ? -1 : 0
    com   r25             ; ones-complement negation: 0 : -1

Or, to avoid a restriction on which register is used, do the compare the other way:

I've never worked with AVR, so I'm just basing this off the instruction set reference manual that google found (and my knowledge of asm for other ISAs, like x86 and ARM). According to those docs, all of those instructions are 1 word (2 bytes), with 1 cycle latency. This is better than what gcc4.5 does:


The usual way to find good instruction sequences is to ask a compiler AVR gcc4.5 -O3 on godbolt does this:

short sign_extend(signed char a) { return a; }

sign_extend:
    mov r18,r24     ;; IDK why gcc uses r18 and r19.

    clr r19
    sbrc r18,7
    com r19

    mov r25,r19
    ret

So it zeros R19, then uses SBRC to conditionally execute a logical-not (COM), depending on the sign bit (bit 7) of R18.

I'm not sure what the extra MOVs are for. I'm also not sure why it inverts the zero instead of setting all bits with no input-dependency. (e.g. ldi r19, $FF, or the SBR alias for it. If an out-of-order-execution AVR ever existed, then that would be more efficient. :P

I'm not sure what the MOV instructions are for. SBRC is non-destructive. So AFAICT, a valid implementation would be

sign_extend:
    clr   r25
    sbrc  r24,7
    ldi   r25, $FF
    ret

This is still worse than CP / SBC, because SBRC takes 2 cycles if the skip is taken.


I assume that SBC's "false dependency" on the old value of R25 isn't a thing on AVR. On out-of-order x86 CPUs, only AMD recognizes sbb eax, eax as being independent of the old value of eax, and depending only on flags. Intel CPUs just run it normally. (They do recognize instructions like xor eax,eax as independent, and it's the standard zeroing-idiom for x86.)

So on non-AMD CPUs, if the last code that wrote EAX did so with a load that missed in cache, or something else high-latency, sbb eax, eax couldn't execute even if the flags were ready (i.e. from an independent dependency chain). But on AMD CPUs, it would start a new dependency chain for EAX.

Anyway, I assume AVR is a fairly simple in-order pipelined design, so there's no way for an old register to be a performance land-mine unless the code that did (for example) a cache-miss load into it never used the result. (Even in-order pipelines don't need to wait for high-latency operations until something uses the result.)

Community
  • 1
  • 1
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • Thank you so much! I was having trouble getting a -1 into the register, I hadn't thought of doing a subtract-with-borrow after a comparison; this clears things up immensely. – stealthbomber10 Nov 29 '16 at 10:16
  • 2
    @stealthbomber10: Yeah, it's neat when tricks I know from one architecture (x86) are useful on another (this answer). x86 even has instructions to [set a register to 0 or 1 based on a flag](http://www.felixcloutier.com/x86/SETcc.html), but they only work on 8-bit partial registers, and are 0/1 not 0/-1. (Of course a DEC instruction turns 0/1 into -1/0, that's another useful trick). One of my pet peeves is that AMD didn't change SETCC to write a full 32-bit register in 64-bit mode (with the usual implicit zero-extension to 64 bits); that would have been more useful and efficient. – Peter Cordes Nov 29 '16 at 10:20
  • 1
    @stealthbomber10: Margaret posted an answer that used `lsl r24` to get the sign bit into CF, then `ror r24` to put it back. She deleted it after deciding it wasn't as good a technique as a compare, but it's a good example of the fact that there are usually multiple ways to get a flag set according to a bit. (even if it has to be a specific flag, like C). – Peter Cordes Nov 29 '16 at 10:23