The code below defines string1 and string2 along with their lengths, len1 and len2. It is x86_64 assembly for the GNU Assembler, and it passes parameters to Linux x86_64 system calls. When I mov len1, %rdx, the register oddly ends up holding a nonsense value (8390045993705406470) at run time. However, when I mov len1, %rdi, it works fine. The first mov sets the count parameter for sys_write; the second sets the exit code for sys_exit.

The code: (foo.s)

.section .data
    string1: .string "test\n"
    len1: .long .-string1

    string2: .string "another\n"
    len2: .long .-string2
.section .text
    .globl _start

_start:
# Linux syscall references
# http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/

    # write syscall
    mov $1, %rax # sys_write
    mov $1, %rdi # unsigned int fd: stdout
    lea string1, %rsi # const char *buf
    mov len1, %rdx # size_t count: length of string1
    syscall

    # exit syscall
    mov $60, %rax # sys_exit
    mov len1, %rdi # int error_code
    syscall

Assemble and link with:

as foo.s -o foo.o
ld foo.o -o foo

Executing:

strace ./foo
execve("./foo", ["./foo"], 0x7ffde801edb0 /* 70 vars */) = 0
write(1, "test\n\0\6\0\0\0another\n\0\t\0\0\0\0\0\0\0\0\0\0\0\0"..., 8390045993705406470) = -1 EFAULT (Bad address)
exit(6)                                 = ?
+++ exited with 6 +++

If I remove string2 and len2, it works.

The complete program is meant to create the file /tmp/foo.txt and write some text to it. In sum: write a message to stdout, open the file, write something to it, close it, and exit the process.

Kernel version:

uname -srmo
Linux 4.15.0-20-generic x86_64 GNU/Linux

Objdump output:

$ objdump -d foo

foo:     file format elf64-x86-64


Disassembly of section .text:

00000000004000b0 <_start>:
  4000b0:   48 c7 c0 01 00 00 00    mov    $0x1,%rax
  4000b7:   48 c7 c7 01 00 00 00    mov    $0x1,%rdi
  4000be:   48 8d 34 25 e1 00 60    lea    0x6000e1,%rsi
  4000c5:   00 
  4000c6:   48 8b 14 25 e7 00 60    mov    0x6000e7,%rdx
  4000cd:   00 
  4000ce:   0f 05                   syscall 
  4000d0:   48 c7 c0 3c 00 00 00    mov    $0x3c,%rax
  4000d7:   48 8b 3c 25 e7 00 60    mov    0x6000e7,%rdi
  4000de:   00 
  4000df:   0f 05                   syscall 

Any thoughts on how to overcome this?

Lourenco
  • `.long` isn't doing what you think it does. See the [gas manual](https://sourceware.org/binutils/docs/as/Long.html). You want `.quad`. Even better, don't store it in memory at all. You can use `.equ` instead. PS: had you examined `8390045993705406470` in hex (`746F6E6100000006`), you would have realized the top 32 bits were garbage. In fact, that garbage is the beginning of your `another` string. – Jester May 02 '18 at 18:35
  • @Jester, you are completely right! Thank you. How would the same statement look using `.equ`? – Lourenco May 02 '18 at 18:49
  • `.equ len1, . - string1` and then `mov $len1, %rdx` – Jester May 02 '18 at 19:16
  • @Jester Hooray! It works! (I was using `mov len1, ...` instead of `mov $len1, ...`, as you just pointed out.) – Lourenco May 02 '18 at 19:24
  • `mov len1` was correct as long as you're putting the length in static data with `.long`. What would have worked is `mov len1, %edx` to do a zero-extending 32-bit load (or better, `mov len1(%rip), %edx`). But yes, it's better to use the length as an assemble-time constant, with the value instead of the address hard-coded into the `mov` instruction. `mov $len1, %edx` is the most efficient way. See [The advantages of using 32bit registers/instructions in x86-64](https://stackoverflow.com/q/38303333). – Peter Cordes May 03 '18 at 12:37
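
Putting the two suggestions from the comments together, a minimal sketch might look like the following. The file name foo_fixed.s is hypothetical, and the switch from `.string` to `.ascii` (so that the lengths do not count a trailing NUL) is an assumption made for illustration rather than something the thread asked for. The first write passes the length as an assemble-time constant defined with `.equ`; the second keeps the `.long` in memory but loads only the 4 bytes that were actually stored.

```asm
# foo_fixed.s (hypothetical name): two ways to pass a correct length to write
.section .data
string1: .ascii "test\n"            # .ascii emits no trailing NUL, unlike .string
.equ len1, . - string1              # assemble-time constant, occupies no memory

string2: .ascii "another\n"
len2: .long . - string2             # 32-bit value stored in memory

.section .text
.globl _start

_start:
    # Option 1: length as an immediate (note the $ prefix)
    mov $1, %eax                    # sys_write; a 32-bit write zero-extends into %rax
    mov $1, %edi                    # fd 1: stdout
    lea string1(%rip), %rsi         # const char *buf
    mov $len1, %edx                 # size_t count: the value itself, not a load
    syscall

    # Option 2: keep .long, but load exactly 4 bytes
    mov $1, %eax                    # %rax was overwritten by the syscall return value
    mov $1, %edi
    lea string2(%rip), %rsi
    mov len2(%rip), %edx            # 32-bit load, zero-extended into %rdx
    syscall

    mov $60, %eax                   # sys_exit
    xor %edi, %edi                  # exit status 0
    syscall
```

This should assemble and link with the same `as` and `ld` commands as in the question. The original `mov len1, %rdx` is an 8-byte load from a 4-byte object, so the upper half of `%rdx` picks up whatever follows `len1` in `.data`, which here is the start of `string2`. That also explains why removing string2 and len2 appeared to fix it: the bytes after `len1` were then presumably zero.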