This is no different from a single loop; you just nest them. It is the same as in C or other languages (use different variables/registers for each loop so they can nest without interference).
something holds i (register or memory)
something holds a (register or memory)
i = 0;
loop0:
get i
get a
compare i and a
if signed greater than or equal jump to loop0_done
do stuff
get i
increment i
jmp to loop0
loop0_done:
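Here is a minimal C sketch of the same top-tested loop, written with labels and gotos so it maps line for line onto the pseudocode. The function name `count_top_tested` and the `passes` counter standing in for "do stuff" are hypothetical. It is just the control flow of `for (i = 0; i < a; i++)` spelled out.

```c
/* Top-tested loop: test first, then body, increment, jump back.
   count_top_tested is a hypothetical name; passes stands in for "do stuff". */
static int count_top_tested(int a)
{
    int i = 0;          /* something holds i */
    int passes = 0;     /* counts how many times "do stuff" ran */
loop0:
    if (i >= a)         /* compare i and a; signed >= leaves the loop */
        goto loop0_done;
    passes++;           /* do stuff */
    i++;                /* increment i */
    goto loop0;         /* unconditional jump back to the test */
loop0_done:
    return passes;
}
```

Every iteration here costs a conditional branch at the top plus an unconditional jump at the bottom.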
You could also architect it this way:
something holds i (register or memory)
something holds a (register or memory)
i = 0;
jump to loop0_mid
loop0:
do stuff
get i
increment i
loop0_mid:
get i
get a
compare i and a
if signed less than jump to loop0
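The same kind of C sketch for the second layout, assuming the first pass jumps straight to the test at `loop0_mid` (the shape the ARM code below gets from `b a_loop_mid`). Only one conditional branch runs per iteration instead of a conditional branch plus an unconditional jump. The name `count_mid_entry` is hypothetical.

```c
/* Bottom-tested loop entered at its test. The initial goto makes it run
   zero times when i >= a, just like the top-tested version.
   count_mid_entry is a hypothetical name; passes stands in for "do stuff". */
static int count_mid_entry(int a)
{
    int i = 0;
    int passes = 0;
    goto loop0_mid;     /* enter at the test, like "b a_loop_mid" */
loop0:
    passes++;           /* do stuff */
    i++;                /* increment i */
loop0_mid:
    if (i < a)          /* compare i and a; signed < loops back */
        goto loop0;
    return passes;
}
```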
Let us assume that a and b are passed in as r0 and r1. We want this code to follow the normal ARM calling convention.
for(int i = 0; i < a; i++){
for(int j = 0; j < b; j++){
do stuff
}
}
push {r4,r5,r6,r7}
;@ A loop
mov r4,#0 ;@ i
mov r5,r0 ;@ a
mov r7,r1 ;@ b
b a_loop_mid
a_loop:
mov r6,#0 ;@ j
b b_loop_mid
b_loop:
do stuff
add r6,#1
b_loop_mid:
cmp r6,r7
blt b_loop
add r4,#1
a_loop_mid:
cmp r4,r5
blt a_loop
pop {r4,r5,r6,r7}
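A rough C rendering of the nested ARM loops above, with variables named after the registers they model, might look like this. It is a sketch of the control flow, not compiler output; `nested_mid_entry` and `body_runs` are hypothetical, and the body runs a*b times when both are positive.

```c
/* Nested mid-entry loops: r4 = i, r5 = a, r6 = j, r7 = b.
   nested_mid_entry and body_runs are hypothetical names. */
static long nested_mid_entry(int a, int b)
{
    int r4 = 0, r5 = a, r7 = b;       /* mov r4,#0 / mov r5,r0 / mov r7,r1 */
    int r6 = 0;                       /* j; reset again at a_loop */
    long body_runs = 0;
    goto a_loop_mid;                  /* b a_loop_mid */
a_loop:
    r6 = 0;                           /* mov r6,#0: restart j for each i */
    goto b_loop_mid;                  /* b b_loop_mid */
b_loop:
    body_runs++;                      /* do stuff */
    r6 += 1;                          /* add r6,#1 */
b_loop_mid:
    if (r6 < r7) goto b_loop;         /* cmp r6,r7 / blt b_loop */
    r4 += 1;                          /* add r4,#1 */
a_loop_mid:
    if (r4 < r5) goto a_loop;         /* cmp r4,r5 / blt a_loop */
    return body_runs;
}
```

The inner loop sits entirely between `a_loop` and the outer increment, and it touches only r6 and r7, so it cannot disturb the outer loop's r4 and r5.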
The two loops look the same, just using different registers/variables:
mov r4,#0 ;@ i
mov r5,r0 ;@ a
b a_loop_mid
a_loop:
do stuff
add r4,#1
a_loop_mid:
cmp r4,r5
blt a_loop
and
mov r6,#0 ;@ j
b b_loop_mid
b_loop:
do stuff
add r6,#1
b_loop_mid:
cmp r6,r7
blt b_loop
And then just nest them. Eventually you are going to run out of registers, so some values will need to live on the stack; an access to, say, b might then be
ldr r1,[sp,#12]
rather than burning register r7 on it.
Likewise, for loops that count to n where the count variable is not used inside the loop, you can, depending on the instruction set, perhaps save an instruction or a few. There are multiple ARM instruction sets, so instead of counting up and doing a
add this,#1
cmp this,that
blt somewhere
you can save an instruction inside the loop with
subs this,#1
bne somewhere
You need to initialize this properly, so if
for(i=0;i<a;i++)
and a is 5, then i will count 0,1,2,3,4: five things. So you could instead count 5,4,3,2,1: five things.
this=a
label:
...
subs this,#1
bne label
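and save that instruction in the loop. This assumes a is at least 1: with the test at the bottom the body always runs once, so a zero count needs a check before the loop is entered.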
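A C sketch comparing the two styles, assuming an unsigned count and the hypothetical function names `count_up` and `count_down`. The `--n != 0` test plays the role of `subs this,#1` / `bne label`: the decrement itself produces the value that is tested, so no separate compare against a is needed.

```c
/* Counting up: needs the increment, a compare against a, and a branch. */
static unsigned count_up(unsigned a)
{
    unsigned runs = 0;
    for (unsigned i = 0; i < a; i++)   /* add / cmp / blt */
        runs++;                        /* loop body */
    return runs;
}

/* Counting down to zero: decrement and branch only. */
static unsigned count_down(unsigned a)
{
    unsigned runs = 0;
    unsigned n = a;                    /* this = a */
    if (n == 0)                        /* guard: a bottom-tested loop always */
        return runs;                   /* runs once, and n would wrap here   */
    do {
        runs++;                        /* loop body */
    } while (--n != 0);                /* subs this,#1 / bne label */
    return runs;
}
```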
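Some ARM instruction sets have cbz and cbnz (compare and branch if zero or if not zero), so
sub this,#1
cbnz this,label
No real savings there. Also, in Thumb, cbz/cbnz can only branch forward, so this backward loop branch cannot be encoded there (AArch64's cbnz has no such restriction). Some instruction sets (not ARM) have a decrement-and-jump-if-not-zero instruction:
this = a
label:
...
djnz this, label
which saves another instruction.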
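For comparison, a C model of a decrement-and-jump-if-not-zero instruction with an 8-bit counter. The Z80's `djnz` (which decrements B and jumps if the result is not zero) is used here only as an example of such an instruction; `djnz_loop` is a hypothetical name. It also shows the classic pitfall: the decrement happens before the test, so a starting count of 0 wraps around and the body runs 256 times.

```c
#include <stdint.h>

/* Models djnz with an 8-bit counter b (like the Z80's B register).
   djnz_loop is a hypothetical name; body_runs stands in for the loop body. */
static unsigned djnz_loop(uint8_t b)
{
    unsigned body_runs = 0;
label:
    body_runs++;              /* loop body */
    if (--b != 0)             /* djnz label: decrement, jump if not zero */
        goto label;
    return body_runs;         /* b == 0 on entry wraps to 255: 256 runs */
}
```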
If you are linking with compiled code that uses the typical ARM calling convention, then r0-r3 are volatile and most of r4 and up are non-volatile (you have to preserve them). This also means that if you want to call another function inside the loop, that function can change r0-r3, so you cannot use them to hold loop state across the call boundary.
If r0 had a in it:
mov r2,#0
a_loop:
bl somewhere
add r2,#1
cmp r2,r0
blt a_loop
This has two problems: both r2 and r0 can be changed by the somewhere function, so the counter and the limit used by the compare can both be corrupted.
Doing this instead:
mov r2,r0
a_loop:
bl somewhere
subs r2,#1
bne a_loop
means only r2 matters for this loop, but the call can still clobber r2, and if the a value in r0 is needed after this code, it may have been destroyed. That is why, for simpler things, you may see this:
push {r4,...}
mov r4,r0
...
pop {r4,...}
That saves the caller's r4 and then, within this function, uses r4 to hold the a value. Or, worst case:
push {r0,...}
ldr rn,[sp,??] ;@ read a
pop {r0,...}
where the sp offset is determined by how much has been pushed on the stack and where the r0 value landed. Or use a stack frame and reference off of that.
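The `push {r4,...}` / `mov r4,r0` pattern can be modeled with a toy register file in C. This is an illustration under stated assumptions, not real hardware: `r[]`, `stack[]`, `push`, `pop`, `somewhere` and `call_a_times` are all hypothetical, and `somewhere()` plays a callee that obeys the convention by trashing only r0-r3.

```c
#include <stdint.h>

/* Toy model: r[] stands for the core registers, stack[] for memory below sp. */
static uint32_t r[16];
static uint32_t stack[16];
static int sp = 16;                    /* full-descending, like ARM's sp */
static int calls;

static void push(uint32_t v) { stack[--sp] = v; }
static uint32_t pop(void)    { return stack[sp++]; }

/* Hypothetical callee: may overwrite r0-r3, must leave r4 and up alone. */
static void somewhere(void)
{
    calls++;
    r[0] = r[1] = r[2] = r[3] = 0xDEADBEEFu;
}

/* Calls somewhere() a times, with a arriving in r0. */
static void call_a_times(void)
{
    push(r[4]);                        /* push {r4}: save the caller's r4 */
    r[4] = r[0];                       /* mov r4,r0: count survives calls */
    if (r[4] != 0) {                   /* guard: a == 0 means no calls */
        do {
            somewhere();               /* bl somewhere */
        } while (--r[4] != 0);         /* subs r4,#1 / bne a_loop */
    }
    r[4] = pop();                      /* pop {r4}: restore the caller's r4 */
}
```

After `call_a_times()` returns, r4 has its old value back, while r0 (the original a) has been overwritten by the callee, which is why a value needed after the loop has to be kept in a callee-saved register or on the stack.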
At the end of the day, nested loops are no different than in any other language. You have to have loop variables that don't interfere with the other nested loops, and the inner loop lives wholly within the outer loop.