-2

I am trying to optimize the following block of mips code and am struggling to find ways to reduce overall instructions or other ways of optimization. Thanks!

   .data
      Str: .asciiz "ABC"
      Strempty: .asciiz "   "
    
    .text
    .globl main
    main:
        jal strcpy              #Jump to the strcpy subroutine
        li $v0,4                #Code for printing strings
        la $a0,Strempty         #syscall to print the content of the copied string
        syscall
        li $v0,10               #Terminates the program
        syscall
    
    strcpy:
          la $a0,Str            #Get address of string y
          la $a1, Strempty      #Get address of string x
          addi $sp,$sp,-4       #Adjust stack pointer to make room for doubleword
          sw $s5,0($sp)         #Push $s5 -  i.e push contents of $s5 to the top of the stack
          add $s5,$zero,$zero   #i=0
    
    L1:
            add $t0,$s5,$a0     #$t0 = addr of y[i]
            lbu $t1,0($t0)      #t1 = y[i]
            add $t2,$s5,$a1     #$t2 = addr of x[i]
            sb $t1,0($t2)       #x[i] = y[i]
            beq $t1,$zero,L2    #Branches out when the null character is reached
            addi $s5,$s5,1      #i=i+1
            j L1                #Next iteration for L1
    L2:
            lw $s5,0($sp)       #restore $s5
            addi $sp,$sp,4      #pop the doubleword from the stack pointer
            jr $ra              #return to main
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • Have you looked at glibc's implementation? Similar to [Why does glibc's strlen need to be so complicated to run quickly?](https://stackoverflow.com/q/57650895) if the src and dst are aligned the same relative to each other, you can go a word at a time with some clever bithacks. (MIPS doesn't have hand-written asm for MIPS strcpy, only a few functions like memcpy and strcmp: https://code.woboq.org/userspace/glibc/sysdeps/mips/. It uses the "portable" C for other functions, compiled by GCC.) – Peter Cordes Feb 17 '22 at 22:02

1 Answers1

1
  1. Don't use $s5 — use $t4 or $v0 instead, these are free to use there without the save/restore.  $s5 is a call preserved register, whereas the $ts and $vs are call clobbered (so go ahead and clobber one).  In fact, you should use registers $a0 and $a1 as the parameter registers.

  2. Don't use the stack at all — this is a leaf function and doesn't require any stack space given there are registers immediately available to use.

  3. Use pointers instead of indexing — try to do a pointer version in C first so you can follow it, but the gist is that indexing requires repeated addition, and using pointers instead will reduce that a bit, from 3 additions in the loop to 2 of them.  The pointer version will not have need of counter/index variable i.  Modify $a0 and $a1 in place; and eliminate i.

  4. Don't load your parameter values inside the function — let them come as passed by main.

  5. As @Peter says, broadly speaking if we have an unconditional backward branch at the end of the loop and a conditional forward branch somewhere inside the loop that terminates it, we can use the conditional branch to accomplish the looping, and thus eliminate the backward branch.  This will become obvious after getting rid of the index increment as the sequence will be:

    beq $t1, $0, L2
    j L1
L2:

        And that sequence can be inverted: change the beq into bne and the branch target label from L2 to L1, then remove the j L1.  That conditional branch will both perform the exit test and branching backwards to continue the loop.

Erik Eidt
  • 23,049
  • 2
  • 29
  • 53
  • 1
    Also, rotate the loop so a `bnez` can be at the bottom, avoiding a `j`. [Why are loops always compiled into "do...while" style (tail jump)?](https://stackoverflow.com/q/47783926) Actually since we want to store the terminating zero byte, this doesn't require rotating the loop, just checking it after the store. With no unrolling or software-pipelining, it's pretty simple. – Peter Cordes Feb 17 '22 at 22:10