P.S. The x86 assembly reprex below was assembled with NASM 2.15.05 on x86_64 Linux and inspected with GDB.

section .data
        Snippet db "KANG"
section .text
        global _start
_start:
        nop
; =============================================
        mov ebx,Snippet    ; EBX = address of Snippet (0x804a000)
        add <byte|word|dword> [ebx],32    ; placeholder: one size specifier per build
; =============================================
        nop

"KANG" pointed to by Snippet is in memory from 0x804a000 to 0x804a003:

(gdb) print (char) *0x804a000
$1 = 75 'K'
(gdb) print (char) *0x804a001
$2 = 65 'A'
(gdb) print (char) *0x804a002
$3 = 78 'N'
(gdb) print (char) *0x804a003
$4 = 71 'G'
(gdb)

Decoding the ASCII codes, I assume "KANG" is stored as the bit pattern 01001011 01000001 01001110 01000111 at addresses 0x804a000 through 0x804a003.

When I leave out the size specifier in the ADD instruction, NASM refuses to assemble it, as expected:

ss.asm:9: error: operation size not specified

The add <specifier> [ebx],32 instruction is supposed to convert the data at [ebx] from uppercase to lowercase.

My confusion stems from the observation that irrespective of the size specifier used in the instruction, the result is always:

(gdb) print (char) *0x804a000
$1 = 107 'k'
(gdb) print (char) *0x804a001
$2 = 65 'A'
(gdb) print (char) *0x804a002
$3 = 78 'N'
(gdb) print (char) *0x804a003
$4 = 71 'G'
(gdb)

As I understand it, the BYTE specifier tells NASM that we are modifying a single byte at the memory address in EBX, whereas the WORD and DWORD specifiers tell it that we are modifying a word (2 bytes) and a double word (4 bytes), respectively.

I was expecting BYTE to produce the result above, but I was expecting the following operations (and results) for the other two size specifiers:

  • WORD: I was expecting the WORD specifier to manipulate 16 side by side bits (the 'K' and 'A' characters) starting at 0x804a000 with the operation 0100101101000001B + 0000000000100000B == 0100101101100001B resulting in "KaNG"

  • DWORD: similarly I was expecting the DWORD specifier to manipulate 32 side by side bits (the 'K','A','N', and 'G' characters) starting at 0x804a000 with the operation 01001011010000010100111001000111B + 00000000000000000000000000100000B == 01001011010000010100111001100111B resulting in "KANg"

Where is my understanding of the operation going wrong?

puwlah
  • That's because the upper 3 bytes of 32 are zero, and adding zero has no effect (if you don't have a carry from the low byte). – Jester Jun 21 '21 at 15:12
  • Interestingly, there's an *exact* duplicate of this, using the same `add byte [ebx], 32` vs. `add dword [ebx], 32` example in a case that doesn't carry-out. [Why do we need to disambiguate when adding an immediate value to a value at a memory address](https://stackoverflow.com/q/47445362) I guess it's from a book or tutorial? – Peter Cordes Jun 21 '21 at 15:32
  • While the question is answering something similar, it is not answering my question. I have made my question more specific. – puwlah Jun 21 '21 at 15:56
  • Oh, I see you also have an endianness confusion. `00000000000000000000000000100000B` is non-zero only in the lowest-address (first in printing order) byte of `dword [ebx]` because x86 is little-endian. If you want to modify the last byte, you want `add dword [ebx], 32 << 24` or `add byte [ebx+3], 32` – Peter Cordes Jun 21 '21 at 15:58
  • BTW, you can use AND to unconditionally upcase, or XOR to flip case, like `and dword [ebx], ~0x20202020` to clear the lower-case bit in all 4 characters. – Peter Cordes Jun 21 '21 at 16:02
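
The little-endian point made in the comments can be checked without an assembler. The following is only a rough C model, not the original NASM code: `add_le` and `and_le32` are hypothetical helper names introduced here, and they imitate `add <size> [mem], imm` and `and dword [mem], mask` by building the operand explicitly from bytes, lowest address first, so the result does not depend on the host's own byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models `add <size> [mem], imm` on a little-endian
   machine such as x86. width is 1 (byte), 2 (word) or 4 (dword). */
void add_le(unsigned char *mem, int width, uint32_t imm)
{
    assert(width == 1 || width == 2 || width == 4);

    /* Load: the byte at the LOWEST address is the LEAST significant one. */
    uint32_t value = 0;
    for (int i = 0; i < width; i++)
        value |= (uint32_t)mem[i] << (8 * i);

    /* The addition itself; any carry moves toward higher addresses. */
    value += imm;

    /* Store the low `width` bytes back, again lowest address first. */
    for (int i = 0; i < width; i++)
        mem[i] = (unsigned char)(value >> (8 * i));
}

/* Hypothetical helper: models `and dword [mem], mask`. */
void and_le32(unsigned char *mem, uint32_t mask)
{
    /* Byte i of the mask applies to the byte at address mem + i. */
    for (int i = 0; i < 4; i++)
        mem[i] &= (unsigned char)(mask >> (8 * i));
}
```

Under this model, starting from the bytes 'K','A','N','G', the calls `add_le(m, 1, 32)`, `add_le(m, 2, 32)` and `add_le(m, 4, 32)` all leave "kANG", because the only non-zero byte of the immediate lines up with the byte at the lowest address and 0x4B + 0x20 does not carry. `add_le(m, 4, 32u << 24)` gives "KANg", and `and_le32(m, ~0x20202020u)` applied to "kang" gives "KANG".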
