Bubble Sort in MASM x86 Not Sorting After Few Interations

Question

So, I am trying to implement BubbleSort, using this code as template:

    int n = arr.length;  
    int temp = 0;  
     for(int i=0; i < n; i++){  
             for(int j=1; j < (n-i); j++){  
                      if(arr[j-1] > arr[j]){  
                             //swap elements  
                             temp = arr[j-1];  
                             arr[j-1] = arr[j];  
                             arr[j] = temp;

However, my assembly code only sorts the first 1 - 2 times and the produces an erroneous result. I tried running the debugger, stepping through multiple times, but my naive eyes could not spot any mistakes in the translation.

.data
arr DWORD 3,2,1,4,5
temp DWORD 0
arr_j DWORD 0
; Bubble Sort
.code
main proc
mov esi, OFFSET arr
mov eax, 0 ; for outer loop
mov ebx, 1 ; for inner loop

OuterLoop:

     InnerLoop:

        ; swap elements
        ; referencing j in array
        call MULTIPLY
        add edx, esi ; edx = esi + 4*ebx that is *arr[j]
        mov edi, [edx]
        mov [arr_j], edi ; store arr[j]
        sub edx, 4
        mov edi, [edx] ; store arr[j - 1]

        cmp edi, [arr_j] ; if(arr[j-1] > arr[j]) -> swap elements
        jle FAIL_SWAP

        ; swap elements here
        mov [temp], edi
        mov edi, [arr_j]
        mov ebp, [temp]
        mov [edx], edi ; arr[j - 1] < arr[j]
        add edx, 4
        mov [edx], ebp

        FAIL_SWAP:

     inc ebx
     mov ecx, LENGTHOF arr
     sub ecx, eax
     cmp ebx, ecx
     jl InnerLoop

inc eax
cmp eax, LENGTHOF arr
jl OuterLoop     


invoke ExitProcess,0
main ENDP

MULTIPLY PROC ; multiply 4 with ebx, store at edx
    push esi

    xor esi, esi
    mov esi, 1

    mov edx, ebx

    LOOPER:
    add edx, ebx

    inc esi
    cmp esi, 4
    jl LOOPER

    pop esi
    ret
MULTIPLY ENDP
END main

Any help is really appreciated. Thanks.

don't use a memory location for `temp`, that's just crazy inefficient and makes it harder to see what's going on. Just load two registers, then store them back to opposite locations. Look at *optimized* compiler output for a C version of your code. (Like `gcc -O1`). Here's a clean and fairly efficient MASM bubble sort: [Bubble sort in x86 (masm32), the sort I wrote doesn't work](https://stackoverflow.com/q/17802947), and another one: [Assembly bubble sort swap](https://stackoverflow.com/q/11497966). (Also see this nasty [code-golf version](https://codegolf.stackexchange.com/a/149038)). — Peter Cordes, Jul 09 '18 at 03:14
@PeterCordes Thanks for the heads up. I used the temp trying to `translate` the pseudocode to ASM. Any ideas where the problem lies in my code? I am really trying to figure out what my error is :(. Thanks — Jishan, Jul 09 '18 at 03:16
And BTW, what the heck are you doing writing a function to multiply by 4? `lea edx, [esi + ebx*4]` can replace the call + add. Or at least use a shift to multiply if you insist on doing it separately. — Peter Cordes, Jul 09 '18 at 03:17
@PeterCordes Damn! Did not know that. I am following IRVINE's ASM book and I am at the chapter Conditional Processing and it has nothing on multiplications and shifting as they will be introduced later. Self-learning sucks ! — Jishan, Jul 09 '18 at 03:20
While `lea edx, [esi + ebx*4]` works for `arr[j]`, how do I reach `arr[j-1]`? I believe `lea edx, [esi + ebx*4 - 1]` is invalid? — Jishan, Jul 09 '18 at 03:22
No, IDK what's wrong with your code. I haven't yet got past the ridiculous overcomplications to see what it's actually doing. Exactly *what* erroneous result do you get? Which branch is taken when it should be not-taken, when you're single-stepping? "doesn't work" isn't a [mcve]. IDK if the bug is obvious or not; I'm not interested enough in wading through the overcomplicated code to find it. (Ah, just saw your last comment; that explains it I guess, if you haven't got to the part about addressing modes. This is why I suggested looking at compiler output as a way to learn.) — Peter Cordes, Jul 09 '18 at 03:22
Given the pointer to one element in EDX, the previous one is `[edx-4]`. Or avoid LEA entirely and use `mov edi, [esi + ebx*4 ]` / `cmp edi, [esi + ebx*4 - 4]`, or load into `edx` and compare regs (because you're no longer using `edx` for `&arr[j]`). Or just always increment pointers, like in the implementations I linked to. Get used to thinking in terms of loops like `int *p = arr; do { p++ /* add esi,4*/ } while ( p < endp ); )` — Peter Cordes, Jul 09 '18 at 03:24
@PeterCordes Thanks, I get this now. I guess I am simply missing too many constructs. Probably I will finish other chapters and they forage into implementing algorithms. — Jishan, Jul 09 '18 at 03:26
Yup, I'd recommend that. If you already know C and know what bit shifts are, but not exactly how they're done in asm, then skim ahead and / or spend some time looking at compiler output. See [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116). (BTW, I updated my last comment to add a pointer-increment loop example). — Peter Cordes, Jul 09 '18 at 03:31
I *highly* recommend watching Matt Godbolt's CppCon2017 talk [“What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”](https://youtu.be/bSkpMdDe4g4). This isn't some random youtube video; it's an *expert* giving a talk for an audience that includes asm novices (but not programming-in-general novices) at a C++ conference. As an expert myself, I highly recommend it. — Peter Cordes, Jul 09 '18 at 03:32
@PeterCordes Great resource! Thanks! I will watch it today itself! — Jishan, Jul 09 '18 at 14:49

score 2 · Accepted Answer · answered Jul 22 '18 at 21:39

int n = arr.length;  
  int temp = 0;  
    for(int i=0; i < n; i++){  
      for(int j=1; j < (n-i); j++){
        ...

This template code already has an error. The outer loop does 1 iteration too many!
Consider a 2-element array where n is 2. The complete bubblesort will consist of a single comparison, yet the outer loop will do 2 iterations (i=0 and i=1). Clearly wrong.

mov eax, 0 ; for outer loop
mov ebx, 1 ; for inner loop
OuterLoop:
    InnerLoop:

The 1st time that your InnerLoop runs all is well, but the 2nd time it starts with BX=5 because that's the value BX got at the end of the loop. You need to reset the BX register to 1 every time the InnerLoop starts running.

mov eax, 0 ; for outer loop
OuterLoop:
    mov ebx, 1 ; for inner loop
    InnerLoop:

A simpler solution uses EAX as a decrementing counter:

mov eax, LENGTHOF arr
dec eax
OuterLoop:
    mov ebx, 1 ; for inner loop
    InnerLoop:
        ...
        inc ebx
        cmp ebx, eax
        jbe InnerLoop
    dec eax
    jnz OuterLoop

These will be the values for AX and BX for a 5 element array:

AX   BX
4    1 to 4
3    1 to 3
2    1 to 2
1    1 to 1

The rest of the code although correct, is overcomplicated and inefficient. See the comments by Peter Cordes.

The extra outer-loop iteration in C isn't actually *wrong*, because the inner loop will run zero times in that case. But if translated into `do{}while()` loops in asm that assume they run at least one iter, then it's wrong! And writing it so it can't be translated to efficient asm is a problem. — Peter Cordes, Jul 22 '18 at 22:25

Bubble Sort in MASM x86 Not Sorting After Few Interations

1 Answers1