How to swap 2 bytes in x86 assembly correctly?

Question

I have a buffer that contains: 'bac\n' and I am trying to swap the letters 'b' and 'a'

I checked the debugger and it printed 4 bytes at the address ebp pointed to: 'bac\n'

ebp - the address of the buffer
eax - the offset (currently 0), so that ebp + eax points to the 'b' in the buffer
ebx - contains 'b'
edi - contains 'a'

The problem is when I run the instruction that is supposed to overwrite the 'b' in the buffer with 'a':

mov [ebp + eax], edi

... then when I print the buffer and it now contains: 'ac\n'. Where did the 'b' go? If I run the next instruction which is supposed to overwrite the 'a' in the buffer with a 'b', completing the swap:

mov  [ebp + eax + 1], ebx

... then the buffer now contains: 'abac' instead of 'abc\n'

Can anyone explain what is happening here?

You need to move only one byte, and not the whole register (4 bytes). — Alex F, Oct 03 '13 at 06:21

phuclv · Answer 1 · 2018-08-26T04:24:19.977

2

I think you've copied the whole 32-bit register before, in the part that you don't show

; copy 4 bytes from ebp + eax to ebp + eax + 3
mov ebx, [ebp + eax]     ; ebx = 'bac\n'
mov edi, [ebp + eax + 1] ; edi = 'ac\n<garbage>', which is ebp + eax + 1 to ebp + eax + 4

Hence after moving the characters back to memory

mov [ebp + eax], edi     ; the string now becomes 'ac\n<garbage>'
mov [ebp + eax + 1], ebx ;                        'abac\n' (5 bytes)

which is what you see. You must copy only one byte, not one double word. But DI doesn't have a corresponding byte register name, thus you should rearrange you register usage to a low byte register such as CL/BL, for example

mov  bl, [ebp + eax]
xchg [ebp + eax + 1], bl ; simple, but not efficient
mov  [ebp + eax], bl

In case you have no free register left you'll have to use bitwise manipulation

If you use x86_64 then the low part of DI can be assessed as DIL, but it'll be one byte longer

edited Aug 26 '18 at 04:24

answered Oct 03 '13 at 07:32

phuclv

37,963
15
156
475

Don't recommend xchg with memory unless you put a warning comment that it's *many* times slower than separate load+store, because of the implied `lock` prefix. See [this answer](https://stackoverflow.com/a/47021804/224132) for some exchange-with-memory possibilities with/without a scratch register. (Also, you're mixing up DI with SI, I think) – Peter Cordes Aug 26 '18 at 04:22
@PeterCordes I intended to put that but forgot. Thanks about SI – phuclv Aug 26 '18 at 04:25
1

If you want simple, use `rol word [mem], 8`. Also, the OP describes already having both values in registers. Simply change your register allocation so you have both in regs with the low byte accessible. I added an answer. – Peter Cordes Aug 26 '18 at 05:01

score 1 · Answer 2 · answered Oct 03 '13 at 10:06

OP's key problem is that EDI can only be moved to memory as a 32 bit value. He needs a register that can be moved as an 8 bit value, as Luu suggested; this would be AH, AL, BH, ... . AL is easiest to use and it is a "part" of EAX.

Following Luu's advice would be easy if OP would revise his code so that EDI contained the offset, and EAX contained the character. Then the instruction needed to store a byte is

mov byte ptr [ebp+edi],al

The "byte ptr" tells the assembler you are expecting to move 8 rather than 32 bits; it is technically unnecessary because the use of AL clearly indicates that only 8 bits are to be used but it is helpful to the reader.

Peter Cordes · Answer 3 · 2018-08-26T05:06:40.617

You can swap two bytes in memory with a 16-bit rotate-by8:

rol  word ptr [ebp], 8     ;  byte [ebp] becomes byte [ebp+1], and vice verse

But if you already have the bytes in registers (e.g. because you loaded them so you could compare them), then it might be better to store them from registers.

Since you need to store only the low byte of the register, you need to use byte stores of al or bl not dword stores of edi! Change your register allocation so you have your bytes in one of AL, BL, CL, or DL, not in the low byte of EDI. Only x86-64 makes the low byte of EDI accessible (as DIL). Use EDI as your index. (Then name EDI comes from Destination Index). Or instead of base+index, use pointer increments so EDI is pointing at the current byte or pair of bytes you might want to swap.

Thus:

     movzx    eax, byte ptr [edi + ebp]      ; load the 1st byte
     movzx    edx, byte ptr [edi + ebp + 1]  ; load the 2nd byte
     cmp      al, dl
     jae   noswap

     mov      [edi+ebp],   dl        ; opposite of how you loaded them
     mov      [edi+ebp+1], al
noswap:
     inc      edi

     ... loop logic

movzx avoids a false dependency on the old value of EAX on CPUs that don't rename AL separately from the whole EAX. If you'd done mov al, [edi + ebp], some CPUs would have the old value of EAX as another input dependency for that instruction.

Note that if you're actually implementing Bubble Sort (eww, yuck), you only need to do one load per iteration. You always have one of the two values to compare in a register already. You can set up for the first iteration with a load outside the loop.

If you were doing 16-bit loads and then comparing only the low byte (e.g. as part of a sort), you're perfectly set up to swap the low 2 bytes of ebx and store that back:

    movzx   eax,  word ptr [edi+ebp]
    cmp     ah, al       ; compare the low 2 bytes of EAX with each other
    jae    noswap

    rol     ax, 8        ; swap AL with AH.  This is more efficient than xchg al,ah or two MOV stores.
    mov     [edi+ebp], ax
noswap:

This is good on its own, but the 16-bit store which partially overlaps with the next 16-bit load is kinda bad. (store forwarding stall). Loading just the new byte into AH (keeping the old byte in AL) isn't great either; that will stall for a cycle to merge when reading AH on modern Intel CPUs, and create a dependency chain.

And why did I use [edi+ebp] instead of [ebp+edi]? It saves a byte to make EBP the index register, because [EBP + EDI*1] with no displacement isn't encodeable. NASM and YASM don't swap for you, because base=EBP implies the SS segment. But assuming a flat memory model, it doesn't matter, so we can manually make that optimization.

A BubbleSort of 8-bit elements, in 19 bytes of 16-bit x86 machine code: https://codegolf.stackexchange.com/questions/77836/sort-an-integer-list/149038#149038, and a JumpDown sort of 32-bit elements.

Assembly - bubble sort for sorting string (sorting the chars in a string)

Assembly bubble sort swap

Changing between 8-bit vs. 32-bit elements is just a matter of changing register names and changing the increment from 1 to 4, once you understand how x86 partial registers work.

How to swap 2 bytes in x86 assembly correctly?

3 Answers3

Related