I am trying to change the value of a variable in x86_64 assembly.

Here is my approach:

section .data
    text db "Hello, World!",10
   
 
section .text
    global _start
 
_start:
    mov rax, 1
    mov rdi, 1
    mov rsi, text
    mov rdx, 14
    syscall

    mov rax , "He"


    mov  [text], rax
    syscall
   
    

    mov rax, 1
    mov rdi, 1
    mov rsi, text
    mov rdx, 14
    syscall

    mov rax, 60
    mov rdi, 0
    syscall

But that outputs

Hello, World!
Heorld!

I have also tried `mov word [text], "He"`, but that doesn't work either.

liveno
    Try `strace ./a.out` to see what the arguments of the 2nd `syscall` are. After `mov rax, 'He'`, `rax=0x0000000000006548`; no kernel function with that number is implemented. – vitsoft Oct 14 '22 at 15:46

2 Answers

4

Setting aside the stray syscall after the memory modification, the output is explained as follows. Initially, the bytes at address text are:

 0  1  2  3  4  5   6  7  8  9  a  b  c  d

48 65 6c 6c 6f 2c  20 57 6f 72 6c 64 21 0a
 H  e  l  l  o  , ' ' W  o  r  l  d  ! \n

After

mov rax , "He"

rax contains 0x6548 in its two low bytes, and the other six bytes are zero. Because x86_64 is little-endian and the store writes all 8 bytes of rax, after

mov  [text], rax

the memory is:

 0  1  2  3  4  5  6  7  8  9  a  b  c  d

48 65 00 00 00 00 00 00 6f 72 6c 64 21 0a
 H  e \0 \0 \0 \0 \0 \0  o  r  l  d  ! \n

The zero bytes are still written to stdout; the terminal just doesn't display them, which is why you see Heorld!.

dimich
    `mov [text], ax` would only overwrite 2 bytes. (This is 64-bit code, so `default rel` should be used somewhere, although that's separate from the question being asked.) – Peter Cordes Oct 16 '22 at 03:24
2

What you call a variable is a label, which is basically the address of the value in memory. To change the value, you use brackets [] to dereference that address, and you can then change the bytes one by one. For example, let's define a one-byte variable:

v: db 0x00

To change the value you can do

mov byte[v], 0x02

As you can see, we specified the operand size with byte, because neither the memory operand nor the immediate implies a size.

If we had the following variable:

abc: dw 0x0000

the label abc is only the address of the first byte of the data, but the data itself is a word (2 bytes). That is why, to change the variable's value, we need to do:

mov word[abc], 0xDEAD

which would be equivalent to

mov byte[abc], 0xAD
mov byte[abc+1], 0xDE

Note that the least significant byte of the 2-byte value is stored at the lower memory address; this is called little-endian order.
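
To check that equivalence yourself, here is a minimal sketch of a complete program. It assumes Linux x86-64 and NASM; the labels a and b are made up for this example and are not part of the question's code. It does one word store into a, two byte stores into b, and reports through the exit status whether the two words ended up identical.

```nasm
; Sketch: one word store vs. two byte stores (little-endian).
; Assumes Linux x86-64 + NASM; labels a and b are hypothetical.
default rel

section .data
    a: dw 0x0000
    b: dw 0x0000

section .text
    global _start

_start:
    mov word [a], 0xDEAD        ; one 2-byte store
    mov byte [b], 0xAD          ; low byte goes to the lower address
    mov byte [b+1], 0xDE        ; high byte goes to the next address

    movzx eax, word [a]         ; read both words back
    movzx edx, word [b]
    xor edi, edi                ; exit status 0 if they match
    cmp eax, edx
    je .exit
    mov edi, 1                  ; exit status 1 otherwise
.exit:
    mov eax, 60                 ; sys_exit
    syscall
```

Assuming the file is saved as check.asm (a hypothetical name), it can be built with `nasm -f elf64 check.asm -o check.o && ld check.o -o check`. Running `./check; echo $?` should then print 0.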

A string is essentially a sequence of bytes stored next to each other in order (byte order only matters for multi-byte values, so little-endian doesn't apply to the individual characters). To change a string one byte at a time, you can do:

text: db "Hello World", 0

mov byte [text], 'A' ; Aello World
mov byte [text+1], 'B' ; ABllo World
mov byte [text+2], 'C' ; ABClo World
; and so on

Finally, let's take a look at your code:

text db "Hello, World!",10
   
mov rax , "He"
mov [text], rax
syscall

This is not valid (as pointed out by @vitsoft) because you put "He" into rax right before executing syscall, which uses rax to select the system call. In addition, mov [text], rax stores all 8 bytes of rax, not just the 2 bytes of "He".

As a matter of fact, this line of code

mov word [text], "He"

is perfectly valid. I don't know why you couldn't get that to work. "He" resolves to 0x6548, and you do a normal mov with word operand size. As I mentioned before, because of the little-endian order for words, 0x48 ('H') is placed in the first byte of text, which is already "H", and 0x65 ('e') is placed in the second byte of text, which is already "e". The string therefore doesn't change, which may be why it looked like the store didn't work.
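
As an illustration, here is a sketch of the question's program with the 8-byte store replaced by a 2-byte store. It assumes Linux x86-64 and NASM. The helper print_text, the constant text_len, and the replacement characters "Je" are introduced only for this example; "Je" is used instead of "He" so that the change is visible in the output.

```nasm
; Sketch: the question's program with a 2-byte store.
; Assumes Linux x86-64 + NASM; print_text, text_len and "Je" are
; hypothetical names/values chosen for this example.
default rel

section .data
    text db "Hello, World!", 10
    text_len equ $ - text       ; 14, computed by the assembler

section .text
    global _start

print_text:                     ; write(1, text, text_len)
    mov eax, 1                  ; sys_write
    mov edi, 1                  ; fd 1 = stdout
    lea rsi, [text]
    mov edx, text_len
    syscall
    ret

_start:
    call print_text             ; Hello, World!
    mov word [text], "Je"       ; stores 0x4A, 0x65 into text[0], text[1]
    call print_text             ; Jello, World!

    mov eax, 60                 ; sys_exit
    xor edi, edi                ; status 0
    syscall
```

Only two bytes of text are overwritten, so the rest of the string stays intact, and rax is reloaded with a valid system call number before every syscall.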

Edit:

Let's say you don't know the length of a string that you want to copy to another string or location. In that case, loop over the string and copy it byte by byte until you reach the terminating 0. Here is sample code that you would need to adapt:

start:
    xor ecx, ecx ; initialize some variable to keep count

.loop:
    mov al, byte [other + rcx] ; get the nth character of other.
    cmp al, 0x00 ; if we reached the end of the string
    je endLoop ; end the function
    mov [text + rcx], al ; write the nth character of other to nth position of text
    inc ecx ; increase counter
    jmp .loop ; loop

endLoop:
    ret

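section .data ; the destination must be writable; a store into the read-only .text section segfaults
text: db "Hello World", 0
other: db "ABC", 0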
  • This makes a lot of sense, thank you. I am writing a toy compiler that generates x86_64 assembly, and this really helped me. – liveno Oct 14 '22 at 15:49
  • @liveno happy to help, I am sure more people will make additions as an answer and also fix any mistakes I made – Özgür Güzeldereli Oct 14 '22 at 15:52
  • @liveno added an edit – Özgür Güzeldereli Oct 14 '22 at 16:15
  • Güzeldereli, I know this might sound like a stupid question, but is there a way I could change the value at the label without doing [abc + 1]? I need to do this for values that are known only at run time, and I don't know their length. – liveno Oct 14 '22 at 16:26
  • @liveno added an edit that you might use to accomplish that, note that this code isn't concise or optimized. You can write your code based on this if you wish and adapt to how you want to use it. – Özgür Güzeldereli Oct 14 '22 at 16:50
  • @liveno: Normally you'd just `mov edx, other` or `lea rdx, [rel other]` *ahead* of a loop, and `inc rdx` inside the loop to advance the pointer to the next byte. You'd use an `[rdx]` addressing mode so the assembler doesn't need to use an extra byte of machine code for an address-size prefix to override to 32-bit addressing. If the code in this answer works (in a [Linux non-PIE executable](https://stackoverflow.com/q/43367427)), you could also use `[other + rcx]`, using the label address as a 32-bit sign-extended absolute address in the machine code. – Peter Cordes Oct 16 '22 at 03:29
  • @PeterCordes is this better? – Özgür Güzeldereli Oct 16 '22 at 08:41
  • Yeah, now it's a *much* simpler loop, easy to see that it's like `strcpy` except not copying the terminating `0` byte. Normally [you'd rotate the loop](https://stackoverflow.com/questions/47783926/why-are-loops-always-do-while) so a `test al,al` / `jnz .loop` could be at the bottom (e.g. peeling the load from the first iteration, or entering with a `jmp` to a load+test at the bottom), but this is the simplistic way that's maybe clearer for beginners. You'd also normally load bytes with `movzx eax, byte [other + rcx]` to avoid false dependencies, but for correctness there's no problem. – Peter Cordes Oct 16 '22 at 09:55
  • And remember, the question is using 64-bit code; you should, too, using `[symbol + rcx]` in addressing modes. (You don't need to support strings longer than 4GiB, so `ecx` everywhere else is fine.) – Peter Cordes Oct 16 '22 at 09:56
  • @PeterCordes also fixed `rcx` – Özgür Güzeldereli Oct 16 '22 at 10:09
  • `xor ecx,ecx` is the most efficient way to zero RCX. ([What is the best way to set a register to zero in x86 assembly: xor, mov or and?](https://stackoverflow.com/q/33666617)). Sometimes beginner code just uses the 64-bit register name everywhere, without thinking about which uses don't need 64-bit operand size, but this was a question where that was actually the problem for the store of RAX instead of AX. So probably a good place to show beginners which register width to use. 32-bit operand-size and 64-bit address size are the defaults in machine code, so prefer that. – Peter Cordes Oct 16 '22 at 10:14
  • @PeterCordes I tried to run the loop and it causes segmentation fault, any idea why? – liveno Oct 20 '22 at 12:28
  • @liveno: Are you talking about the code block at the bottom of this question? The destination, `text: db ...` is in the read-only `.text` section, since there's no `section .data` directive before it. A store to a read-only page will segfault. – Peter Cordes Oct 20 '22 at 13:03
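
The suggestions in the comments above (pointer increments, a loop with the test at the bottom, a zero-extending load, and data in a writable section) can be sketched as one complete program. This is only an illustration, assuming Linux x86-64 and NASM; text_len and the trailing newline in text are additions for this example. It also assumes that text is at least as long as other, since there is no bounds check.

```nasm
; Sketch: copy other over the start of text (without the terminating 0),
; then print text. Assumes Linux x86-64 + NASM; text_len is hypothetical.
default rel

section .data                   ; writable, so stores to text do not fault
    text:  db "Hello World", 10
    text_len equ $ - text
    other: db "ABC", 0

section .text
    global _start

_start:
    lea rsi, [other]            ; source pointer
    lea rdi, [text]             ; destination pointer
    jmp .test                   ; enter at the load + test at the bottom

.copy:
    mov [rdi], al               ; store the byte loaded in .test
    inc rsi
    inc rdi
.test:
    movzx eax, byte [rsi]       ; zero-extending load of the next byte
    test al, al
    jnz .copy                   ; keep going until the 0 terminator

    mov eax, 1                  ; sys_write
    mov edi, 1                  ; fd 1 = stdout
    lea rsi, [text]
    mov edx, text_len
    syscall                     ; prints "ABClo World"

    mov eax, 60                 ; sys_exit
    xor edi, edi                ; status 0
    syscall
```

Because the pointers are loaded with RIP-relative lea and the loop only uses [rsi] and [rdi], this sketch does not depend on the executable being non-PIE.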