0

The problem is to swap every two variables in the array. Conditionals are not allowed.

Input: 1,2,3,4,5,6,7,8
Output: 2,1,4,3,6,5,8,7

Here's my code:

.data

arr BYTE 1,2,3,4,5,6,7,8
counter DWORD 0

.code
main proc

    xor eax, eax
    mov eax, LENGTHOF arr ; calculate length of array
    mov counter, eax

    xor eax, eax
    xor esi, esi 

    mov esi, OFFSET arr ; now esi contains address of arr

    xor ecx, ecx
    ; set ECX counter for Loop according to ECX
    mov ecx, counter
    mov counter, 0
    L1:
        xor eax, eax
        xor ebx, ebx

        mov eax, [esi]
        inc esi
        mov ebx, [esi]
        dec esi
        mov [esi], ebx
        inc esi
        mov [esi], eax

        inc esi
        dec ecx
    loop    L1

    invoke ExitProcess,0

Now instead of the expected output, I got some gibberish so I debugged the first swap and found something strange:

First run asm

Assembly Array Capture

What should have essentially been 2,2,3,4,5,6,7,8, seems like has shifted a place instead!

Why is this behaviour? How to fix it?

Jishan
  • 1,654
  • 4
  • 28
  • 62
  • 1
    Use `[esi + 1]` instead of inc/dec; it'll be easier to debug and more efficient. Also you don't need to xor-zero your regs; `mov eax,[esi]` already writes the full register with no dependency on the old value. – Peter Cordes Jun 23 '18 at 03:37
  • @PeterCordes would do from now on. Thanks for the heads up. – Jishan Jun 23 '18 at 03:39
  • 1
    You have an array of bytes but you're doing dword loads/stores. Use `movzx eax, byte [esi]` / ... / `mov [esi+1], al`. And then `add esi,2` to move to the next pair. – Peter Cordes Jun 23 '18 at 03:40
  • 1
    Or even more efficient: `rol word [esi], 8` / `add esi, 2` to swap two adjacent bytes in memory, when you don't have a use for having them in registers afterwards. The CPU only has to do one word load, one ALU rotate, and one word store for the rotate instruction, so you can run at one iteration per clock instead of one per 2 clocks. (If you stop using [the slow `loop` instruction](https://stackoverflow.com/questions/35742570/why-is-the-loop-instruction-slow-couldnt-intel-have-implemented-it-efficiently) which bottlenecks you at 1 per ~6 clocks on Intel CPUs.) – Peter Cordes Jun 23 '18 at 03:42
  • @PeterCordes Would you explain `You have an array of bytes but you're doing dword loads/stores` a bit more? The resource I am following hasn't introduced `movzx` till now. – Jishan Jun 23 '18 at 03:45
  • 1
    You're loading / storing 4 bytes at a time, so you overlap multiple array elements. This is why you're shifting. – Peter Cordes Jun 23 '18 at 03:46
  • @PeterCordes I googled! Thanks a ton! Would you kindly write than as an aswer for me to mark? – Jishan Jun 23 '18 at 03:48
  • You're not the first person to have this problem on SO. It's almost a duplicate of [accessing array's element in assembly language (windows)](https://stackoverflow.com/q/2864011), but the answer there has flaws. Looking for other dup targets, but it's pretty hard to search for questions where that was the problem. (And by extension, it will unfortunately be very hard for anyone having the same problem as you to ever find this question without already knowing what bug they have.) – Peter Cordes Jun 23 '18 at 04:00
  • 1
    @PeterCordes, for anyone else's reference, I will post my solution as comment: `setting array as DWORD, I changed the loop to: L1: mov eax, [esi] mov ebx, [esi+4] mov [esi], ebx mov [esi+4], eax add esi, TYPE arr add esi, TYPE arr dec ecx` – Jishan Jun 23 '18 at 04:04
  • 1
    So you changed your array to `arr DWORD` instead of BYTE? For the increment, you should use `add esi, 2* TYPE arr`. (Also, you can `cmp esi, OFFSET arr + length` / `jb` instead of using a separate counter. (`+ length` is pseudo-code for some way of marking the end of the array. You could put your own `arr_end` label there so you have an end-pointer to compare against.) – Peter Cordes Jun 23 '18 at 04:08
  • @PeterCordes Great insight, thanks again! I did not use `cmp`, `/` because the book doesn't want me to for this exercise. – Jishan Jun 23 '18 at 11:13
  • 1
    The `/` is how I'm indicating a newline in a comment to separate the cmp from [the `jcc`](http://felixcloutier.com/x86/Jcc.html). I just meant a normal `do{p+=2; } while(p – Peter Cordes Jun 23 '18 at 12:59
  • damn I feel naive. thanks again! – Jishan Jun 23 '18 at 13:08
  • 1
    BTW, if your CPU has AVX2 you can swap all 8 dword elements in 1 instruction. [`vpshufd ymm0, [arr], _MM_SHUFFLE(2,3,0,1)`](http://felixcloutier.com/x86/PSHUFD.html) to load + shuffle, `vmovdqu [arr], ymm0` to store it back. The imm8 shuffle control is `10'11'00'01B` in binary. Or with SSE2 (baseilne for x86-64), two XMM vectors would do it. – Peter Cordes Jun 23 '18 at 18:06

0 Answers0