0

I am trying to code a simple if-else statement in assembly however, the ret returns to the start routine instead of the intended test routine.

How to fix this? Thank you.

start:
    ldi r16, 0
    call test
    rjmp start

test:
    cpi r16, 0
    breq doFirst;

    cpi r16, 1
    breq doFirst;

    cpi r16, 2
    breq doSecond;

    jmp test;

doFirst:
    inc r16;
    ret;

doSecond:
    inc r17
    ret;

2 Answers2

1

The breq instruction does not store a return address on the stack, so ret would not return to that point in the program. You would need to use the call, icall, or rcall AVR instructions.

Actually, a better solution would be to use brne to just skip over the conditional code if the condition is false. doFirst doesn't need to be a subroutine if it's only called once.

You can try compiling some if statements with avr-gcc and look at the assembly to see how the compiler does it.

David Grayson
  • 84,103
  • 24
  • 152
  • 189
  • I actually went down the rabbit hole looking at this function (see my answer :P). `doFirst` modifies `r16`, so `inc r16` actually runs twice if the loop is entered with `r16 == 0`. (Or it would with proper call/ret). I assume this was just a silly example that wasn't supposed to do anything meaningful, but it was a fun exercise in messing around with AVR. Lack of a branch on `C|Z` flags (i.e. a lower or equal) makes it hard to reuse flags from one `cpi` sometimes. – Peter Cordes Jun 08 '18 at 07:55
0

ret is basically pop into PC.

You jumped to doFirst instead of calling it, so you should jump back with rjmp to a label inside test instead of ret.

Or if doFirst was a separate function, you made an optimized tail-call to it, so it returns to your caller. (Like how you (or a compiler) would implement int foo() { return bar(); })


You don't need to make doFirst a separate function. You can inline it once, because you don't need it to return to 2 different places in your test loop.

test:
    cpi    r16, 0
    breq   doFirst0;
    cpi    r16, 1
    breq   doFirst1;
 donefirst:

    cpi    r16, 2
    brne   noSecond   ;  conditionally SKIP the inlined doSecond body
     inc   r17        ; inlined body of doSecond
  noSecond:
    rjmp   test       ; You don't need a slow 4-byte JMP, just a short relative jump

doFirst0:
    inc   r16
doFirst1:          ; fall through instead of jumping back and finding that r16 == 1 now
    inc   r16
    rjmp  donefirst

Note: this optimization would change behaviour for r16 == 0 if we had both breq instructions jump to the same place. We'd only increment once and then run the rest of the loop, instead of incrementing, then checking and finding r16 == 1 and incrementing again.

Really we might as well do if(r16 <=0) r16 = 2;, with ldi r16, 2

If it was too big to duplicate, and you really did want a rcall / ret to a helper function, you can conditionally jump over a rcall instruction.


Using 2 CPI/BREQ pairs is a very inefficient way to test if r16 <= 1 (unsigned). AVR has a BRLO (BRanch if LOwer (unsigned)).

Since BR instructions are faster if not taken, we'll leave that code out of line, instead of inlining it (like I did for doSecond) and jumping over it with BRSH (Same or Higher).

test:
    cpi    r16, 1
    brlo   doFirst    ; 
 donefirst:           ; r16 will become 2, so we can't put this label later and skip the next check

    cpi    r16, 2
    brne   noSecond   ;  conditionally SKIP the inlined doSecond body
    inc    r17
  noSecond:           ; this label might as well be at test: directly
    rjmp   test

doFirst:         ; rare case, only runs the first 2 iters, put it out of line
    ldi   r16, 2      ; 1 or 2 inc instructions make r16 = 2
    rjmp  donefirst

This whole loop doesn't make much sense unless an interrupt handler can modify or use r16 or r17, though. If that's not the case, you'd really want to just peel the first iteration and then fall into an infinite loop that does nothing.

When we enter your loop (if your rets went where intended) with r16 <= 2, we end up with r16 = 2 and r17 incremented once on that first iteration.

test:
    cpi    r16, 2
    brhi   above2
    ldi    r16, 2          ; result of 1 or 2 inc instructions
    ; flags may differ from your version.

  infloop_r16_eq_2:        ; loop for the r16==2 case
    inc    r17
    rjmp   infloop_r16_eq_2

  above2:

  infloop:                 ; loop for the r16!=2 case
    rjmp   infloop

But if the registers are global variables that could be modified asynchronously, we can't just check r16 once and then continue forever.

I was curious what gcc would do, so I used register volatile unsigned char a asm("r16"); and "r17". Surprisingly this partially works, although it does warn that optimization may eliminate reads and/or writes to register variables. That seems to have happened for the inc r17 with AVR gcc4.6.4 -O3, but not at -O1. gcc7.3 for x86-64 actually preserves it at -O3. Have a look on the Godbolt compiler explorer, and see also How to remove "noise" from GCC/clang assembly output?.

// x86-64 and AVR both have r14 and r15, but not r16/r17
register volatile unsigned char a asm("r14");
register volatile unsigned char b asm("r15");

void test() {
    while(1) {
        if (a <= 1) 
            a++;        // or  a = 2;
        if (a == 2)
            b++;
    }
}

AVR gcc -O1 output:

test:
.L7:
        cpi r16,lo8(2)
        brsh .L2
        subi r16,lo8(-(1))   ; inc r16
.L2:
        cpi r16,lo8(2)
        brne .L7             ; back to the top
         ; else fall through
        subi r17,lo8(-(1))   ; inc r17
        rjmp .L7             ; then back to the top

This looks pretty reasonable.

lo8(2) is just 2, I don't know why gcc emits assembly like that. It's probably useful for symbol addresses or something, like lo8(test) to get the low byte of a label address.

This is sort of a loop tail duplication optimization, but one of the tails is empty. So instead of skipping over an inc r17, we just jump straight to the top of the loop.

The brne .L7 could jump to the brsh instruction, because flags are still set from cpi r16, 2. Gcc doesn't do that because we told it the registers were volatile, so it doesn't optimize away a 2nd read of the register.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847