ARM assembly nested loops

Question

I am new to ARM assembly language and I know how to make a simple for loop. However, when I try to apply the concept to nested loops, I get very confused about how to set it up. Is there a general rule for setting up for loops? And could I use that rule to make a nested loop?

As an example, if I have a simple nested loop code written in C like below, how would that look in assembly? I'd really appreciate a detailed explanation, thank you!

for(int i = 0; i < a; i++){
  for(int j = 0; j < b; j++){
    int sum = 0;
    for(int k = 0; k < c; k++){
      for(int l = 0; l < d; l++){
        int temp1 = i+k;
        int temp2 = j+k;
      }
    }
  }
}

You put a counter in a register and the upper-bound in another register. Use different regs for different variables. Compiler output (https://godbolt.org/) can give you an example (although for this trivial example of the inner most loop body, it will want to optimize everything away, or else keep everything in memory (no optimization)) — Peter Cordes, Feb 10 '21 at 06:57

score 1 · Answer 1 · edited Aug 02 '21 at 20:53

For two nested loops I usually do something like this. Keep in mind that in this code i (the outer loop counter) is R0 and j (the inner loop counter) is R1. R2 counts the iterations of both the outer and the inner loop.

            AREA        myCode, CODE, READONLY
            ENTRY
            EXPORT __main
                
__main
            
            MOV     R2, #0
            
            MOV     R0, #2      ; ** outer loop (loop 1) counter initialization ** 
LOOP1       CBZ     R0, STOP    ; if (R0 == 0) then branch somewhere else. (I'm just ending the program here)
            ; between outer and inner loop
            ; could do anything but I'm just incrementing R2
            ADD     R2, R2, #1
            
            MOV     R1, #3      ; ** inner loop (loop 2) counter initialization **
LOOP2       CBZ     R1, CNTULP1
            ; code inside of inner loop will be here
            ; could do anything but I'm just incrementing R2
            ADD     R2, R2, #1

; Note: to continue loop2, branch to CNTULP2; to continue loop1, branch to CNTULP1.
;       To break out of the loops, branch out of them, for example to STOP.
CNTULP2     ; continue loop2 (inner loop) 
            SUB     R1, R1, #1  ; decrement inner loop counter
            B       LOOP2

CNTULP1     ; continue loop1 (outer loop)
            SUB     R0, R0, #1  ; decrement outer loop counter
            B       LOOP1


STOP        B       STOP
            END
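
As a rough cross-check of the listing above, the same control flow can be written in C with goto, one label per assembly label. This is only a sketch: the function name count_work and its two parameters are hypothetical (the answer hard-codes 2 and 3), and the variables r0, r1, r2 just stand in for the registers.

```c
#include <assert.h>

/* Hypothetical helper, not part of the answer: mirrors the listing's
   control flow. r0 = outer counter, r1 = inner counter, r2 = work counter. */
unsigned count_work(unsigned outer, unsigned inner)
{
    unsigned r0, r1 = 0, r2 = 0;

    r0 = outer;                    /* MOV R0, #outer */
loop1:
    if (r0 == 0) goto stop;        /* CBZ R0, STOP */
    r2 += 1;                       /* work between outer and inner loop */
    r1 = inner;                    /* MOV R1, #inner */
loop2:
    if (r1 == 0) goto cntulp1;     /* CBZ R1, CNTULP1 */
    r2 += 1;                       /* work inside the inner loop */
    r1 -= 1;                       /* CNTULP2: SUB R1, R1, #1 */
    goto loop2;                    /* B LOOP2 */
cntulp1:
    r0 -= 1;                       /* SUB R0, R0, #1 */
    goto loop1;                    /* B LOOP1 */
stop:
    assert(r0 == 0);               /* the outer counter has run down to zero */
    return r2;
}
```

With the answer's values, count_work(2, 3) adds one per outer pass and one per inner pass, so it should return 2 + 2*3 = 8.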

To count down toward zero, it's even better to use `subs r1, r1, #1` / `bne LOOP2`, instead of wasting an instruction on `cbz` at the top. (Like a do{}while() loop in C). [Why are loops always compiled into "do...while" style (tail jump)?](https://stackoverflow.com/q/47783926)). You can use `cbz` once before the loop to skip it if you can't prove the loop count will be non-zero. — Peter Cordes, Apr 24 '21 at 05:24

old_timer · Answer 2 · 2021-02-21T03:34:36.413

This is no different from a single loop; you just nest them. It is the same as in C or other languages (use different variables/registers for each loop so they can nest without interference).

something holds i (register or memory)
something holds a (register or memory)

i = 0;
loop0:
   get i
   get a
   compare i and a
   if signed greater than or equal jump to loop0_done
   do stuff
   get i
   increment i
   jmp to loop0
loop0_done:

You could also architect it this way:

something holds i (register or memory)
something holds a (register or memory)

i = 0;
loop0:
   get i
   get a
   compare i and a
   if signed greater than or equal jump to loop0_done
   do stuff
   get i
   increment i
loop0_mid:
   get a
   compare i and a
   if signed less than jump to loop0
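
The two layouts can be compared in C using goto in place of branches. This is a sketch under assumptions: both function names are hypothetical, "do stuff" is replaced by a counter, and the second function enters through the bottom test with an initial jump (the way the assembly in this answer uses `b a_loop_mid`) instead of repeating the top test.

```c
#include <assert.h>

/* Layout 1: test at the top, unconditional jump back at the bottom. */
int iterations_top_test(int a)
{
    int i = 0, count = 0;
loop0:
    if (i >= a) goto loop0_done;   /* signed >= : leave the loop */
    count++;                       /* do stuff */
    i++;
    goto loop0;
loop0_done:
    assert(i >= a);                /* exit condition holds */
    return count;
}

/* Layout 2: jump to the test first; the only test sits at the bottom. */
int iterations_bottom_test(int a)
{
    int i = 0, count = 0;
    goto loop0_mid;
loop0:
    count++;                       /* do stuff */
    i++;
loop0_mid:
    if (i < a) goto loop0;         /* signed < : go around again */
    assert(i >= a);                /* exit condition holds */
    return count;
}
```

Both run the body the same number of times, including zero times when a is zero or negative. The second layout has one branch per iteration instead of two.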

Let us assume that a and b are passed in as r0 and r1. We want this code to follow the normal arm calling convention.

for(int i = 0; i < a; i++){
  for(int j = 0; j < b; j++){
    do stuff
  }
}

push {r4,r5,r6,r7}
;@ A loop
mov r4,#0 ;@ i
mov r5,r0 ;@ a
mov r7,r1 ;@ b
b a_loop_mid
a_loop:

  mov r6,#0 ;@ j
  b b_loop_mid
b_loop:

    do stuff

    add r6,#1
b_loop_mid:
    cmp r6,r7
    blt b_loop

  add r4,#1
a_loop_mid:
  cmp r4,r5
  blt a_loop

pop {r4,r5,r6,r7}

With the two loops looking the same just using different registers/variables:

mov r4,#0 ;@ i
mov r5,r0 ;@ a
b a_loop_mid
a_loop:

    do stuff

  add r4,#1
a_loop_mid:
  cmp r4,r5
  blt a_loop

and

  mov r6,#0 ;@ j
  b b_loop_mid
b_loop:

    do stuff

    add r6,#1
b_loop_mid:
    cmp r6,r7
    blt b_loop

and then just nest them. Eventually you are going to run out of registers, so some things will need to live on the stack; an access to, say, b might be

ldr r1,[sp,#12]

which avoids burning register r7.
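
To check that the nested layout from earlier in this answer really runs the body a*b times, here is the same structure in C with goto. It is a sketch only: the function name nested_count is hypothetical, "do stuff" is replaced by a counter, and the register names in the comments follow the assembly listing.

```c
#include <assert.h>

/* Hypothetical helper: counts how many times "do stuff" runs.
   i plays the role of r4, j of r6; a and b sit in r5 and r7. */
int nested_count(int a, int b)
{
    int i, j, count = 0;

    i = 0;                       /* mov r4,#0 */
    goto a_loop_mid;             /* b a_loop_mid */
a_loop:
    j = 0;                       /* mov r6,#0 */
    goto b_loop_mid;             /* b b_loop_mid */
b_loop:
    count++;                     /* do stuff */
    j++;                         /* add r6,#1 */
b_loop_mid:
    if (j < b) goto b_loop;      /* cmp r6,r7 ; blt b_loop */
    i++;                         /* add r4,#1 */
a_loop_mid:
    if (i < a) goto a_loop;      /* cmp r4,r5 ; blt a_loop */
    assert(i >= a);              /* outer exit condition holds */
    return count;
}
```

The inner loop sits entirely between a_loop and the outer increment, and j is reset each time the outer loop goes around.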

Likewise, for loops that count to n where the count variable is not used inside the loop, you can, depending on the instruction set, save an instruction or a few. There are multiple ARM instruction sets, but in general, instead of counting up and doing

add this,#1
cmp this,that
blt somewhere

you can save an instruction inside the loop with

subs this,#1
bne somewhere

You need to initialize this properly, so if

for(i=0;i<a;i++)

and a is 5, then i will count 0,1,2,3,4: five things. So you could instead count 5,4,3,2,1: also five things

this=a
label:
...
subs this,#1
bne label

and save that instruction in the loop.
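
The equivalence of the two counting directions can be sketched in C. Both function names are hypothetical. The zero check before the do/while is an added assumption, not part of the text above: without it, a count of zero would wrap around on the first decrement and loop roughly 2^32 times.

```c
#include <assert.h>

/* Counts up: roughly add / cmp / blt on every iteration. */
unsigned body_runs_counting_up(unsigned a)
{
    unsigned runs = 0;
    for (unsigned i = 0; i < a; i++)
        runs++;                    /* loop body */
    return runs;
}

/* Counts down: roughly subs / bne on every iteration. */
unsigned body_runs_counting_down(unsigned a)
{
    unsigned runs = 0;
    unsigned n = a;                /* this = a */

    if (n == 0)                    /* assumed guard: skip the loop entirely */
        return 0;
    do {
        runs++;                    /* loop body */
    } while (--n != 0);            /* subs this,#1 ; bne label */
    assert(n == 0);                /* the counter ran down to zero */
    return runs;
}
```

For a = 5 both functions run the body five times. The count-down form only works this directly when the body does not need the value of i.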

Some arm instruction sets have a cbz and cbnz (compare and branch if zero or if not zero)

so

sub this,#1
cbnz this,label

No real savings there. Some instruction sets (not arm) have a decrement and jump if not zero

this = a
label:

djnz this, label

saving another instruction.

If you are linking with compiled code using the typical arm calling convention then r0-r3 are volatile and most of r4 up are non-volatile (you have to preserve them). This means though that if you want to call another function inside the loop then that function can mess with r0-r3 so you cannot use them as part of the loop across the call boundary.

if r0 had a in it

mov r2,#0
a_loop:
   bl somewhere
   add r2,#1
   cmp r2,r0
   blt a_loop

This has two problems: both r2 and r0 can be changed by the somewhere function, so the compare and the loop count can be messed up.

Doing this

mov r2,r0
a_loop:
   bl somewhere
   subs r2,#1
   bne a_loop

only r2 can be messed up for this loop, but if the a value in r0 is needed after this code, it may have been destroyed. That is why, for simpler things, you may see this

push {r4,...}
mov r4,r0
...
pop {r4,...}

This saves the caller's r4, and then within this function r4 holds the a value. Or, worst case,

push {r0,...}
ldr rn,[sp,??]  ;@ read a
pop {r0,...}

where the sp offset is determined based on how much stuff is on the stack and where the r0 value landed. Or use a stack frame and reference off of that.

At the end of the day nested loops in assembly are no different than in any other language. You have to have loop variables that don't interfere with other nested loops, and the inner loop lives wholly within the outer loop.

I've tried to repair some mis-formatted code - search for `1,r0 ;@ a` and repair if you can. — halfer, Feb 20 '21 at 13:20
