EDIT: added nops after branches
So think about how you would do this in C...At a low level.
unsigned int *src,*dst;
unsigned int len;
unsigned int temp;
...
//assume *src, and *dst and len are filled in by this point
top:
temp=*src;
*dst=temp;
src++;
dst++;
len--;
if(len) goto top;
you are mixing too many things, focus on one plan. First off you said you had a source and destination address in two registers, why is the stack involved? you are not copying or using the stack, you are using the two addresses.
it is correct to multiply by 4 to get the number of bytes, but if you copy one word at a time you dont need to count bytes, just words. This is assuming the source and destination addresses are aligned and or you dont have to be aligned. (if unaligned then do everything a byte at a time).
so what does this look like in assembly, you can convert to mips, this is pseudocode:
rs is the source register $a0, rd is the destination register $a1 and rx is the length register $a2, rt the temp register. Now if you want to load a word from memory use the load word (lw) instruction, if you want to load a byte do an lb (load byte).
top:
branch_if_equal rx,0,done
nop
load_word rt,(rs)
store_word rt,(rd)
add rs,rs,4
add rd,rd,4
subtract rx,rx,1
branch top
nop
done:
Now if you copy bytes at a time instead of words then
shift_left rx,2
top:
branch_if_equal rx,0,done
nop
load_byte rt,(rs)
store_byte rt,(rd)
add rs,rs,1
add rd,rd,1
subtract rx,rx,1
branch top
nop
done: