
I am trying to learn nasm. I want to make a program that prints "Hello, world." n times (in this case 10). I am trying to save the loop register value in a constant so that it is not changed when the body of the loop is executed. When I try to do this I receive a segmentation fault error. I am not sure why this is happening.

My code:

SECTION .DATA
    print_str:          db 'Hello, world.', 10
    print_str_len:      equ $-print_str

    limit:              equ 10
    step:               dw 1

SECTION .TEXT
    GLOBAL _start 

_start:
    mov eax, 4              ; 'write' system call = 4
    mov ebx, 1              ; file descriptor 1 = STDOUT
    mov ecx, print_str      ; string to write
    mov edx, print_str_len  ; length of string to write
    int 80h                 ; call the kernel

    mov eax, [step]         ; moves the step value to eax
    inc eax                 ; Increment
    mov [step], eax         ; moves the eax value to step
    cmp eax, limit          ; Compare sil to the limit
    jle _start              ; Loop while less or equal

exit:
    mov eax, 1              ; 'exit' system call
    mov ebx, 0              ; exit with error code 0
    int 80h                 ; call the kernel

The result:

Hello, world.
Segmentation fault (core dumped)

The cmd:

nasm -f elf64 file.asm -o file.o
ld file.o -o file
./file
Peter Cordes
  • Which instruction is crashing? – melpomene Jan 10 '19 at 18:05
  • You define step as a word (16-bits) but you move 32-bits of data to it with `mov [step], eax`. As well the section names `.DATA` and `.TEXT` should be lower case, not upper case – Michael Petch Jan 10 '19 at 18:10
  • I can't say with complete confidence but if I change `mov [step], eax` (comment it out or something) the code will run without crashing. It just iterates through the loop once. – HoldenDinerman Jan 10 '19 at 18:10
  • Can you move 16-bits instead of a 32-bit mov? – HoldenDinerman Jan 10 '19 at 18:11
  • You should use `dd` not `dw`. – Jester Jan 10 '19 at 18:11
  • I changed the step definition to dd but received the same result. – HoldenDinerman Jan 10 '19 at 18:12
  • Did you fix the section names? – melpomene Jan 10 '19 at 18:13
  • What is incorrect about them? – HoldenDinerman Jan 10 '19 at 18:14
  • See my comment above. It mentions the names need to be lower case. `.text` and `.data` are special, but `.DATA` and `.TEXT` are not. – Michael Petch Jan 10 '19 at 18:14
  • Also see [What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?](https://stackoverflow.com/q/46087730/547981) for later reference. – Jester Jan 10 '19 at 18:15
  • Oh, I see. That fixed the problem. I have one last (slightly unrelated) question: what does the ", 10" after the string variable do in NASM? I saw it in an example, so I am not sure what its purpose is. Thank you for your time and help. – HoldenDinerman Jan 10 '19 at 18:17
  • 10 is a line feed character to go to the next line. In Unix/Linux a line feed is the same as `\n` in _C_ – Michael Petch Jan 10 '19 at 18:18
  • Ok. Everything works now. Thanks again. – HoldenDinerman Jan 10 '19 at 18:20
  • In x86 assembly you can also do `inc dword [step]` and `cmp dword [step],limit` (limit in the second case is just a numeric immediate constant, and `cmp r/m32, imm32` exists). But you must specify the operand size then: the assembler can't tell from `inc [step]` whether you want to increment one, two, or four (or eight in 64-bit mode) bytes, so such a line is ambiguous. – Ped7g Jan 10 '19 at 18:29

1 Answer


section .DATA is the direct cause of the crash. Lower-case section .data is special, and linked as a read-write (private) mapping of the executable. Section names are case-sensitive.

Upper-case .DATA is not special for nasm or the linker, and it ends up as part of the text segment mapped read+exec without write permission.

Upper-case .TEXT is also weird: by default objdump -drwC -Mintel only disassembles the .text section (to avoid disassembling data as if it were code), so it shows empty output for your executable.

On newer systems, the default for a section name NASM doesn't recognize doesn't include exec permission, so code in .TEXT will segfault. This is the same problem described in "Assembly section .code and .text behave differently".


After starting the program under GDB (gdb ./foo, starti), I looked at the process's memory map from another shell.

$ cat /proc/11343/maps
00400000-00401000 r-xp 00000000 00:31 110651257                          /tmp/foo
7ffff7ffa000-7ffff7ffd000 r--p 00000000 00:00 0                          [vvar]
7ffff7ffd000-7ffff7fff000 r-xp 00000000 00:00 0                          [vdso]
7ffffffde000-7ffffffff000 rwxp 00000000 00:00 0                          [stack]

As you can see, other than the special VDSO mappings and the stack, there's only the one file-backed mapping, and it has read+exec permission only.

Single-stepping inside GDB, the mov eax,DWORD PTR ds:0x400086 load succeeds, but the mov DWORD PTR ds:0x400086,eax store faults. (See the bottom of the x86 tag wiki for GDB asm tips.)

From readelf -a foo, we can see the ELF program headers that tell the OS's program loader how to map it into memory:

$ readelf -a foo   # broken version
  ...
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000000bf 0x00000000000000bf  R      0x200000

 Section to Segment mapping:
  Segment Sections...
   00     .DATA .TEXT 

Notice how both .DATA and .TEXT are in the same segment. This is what you'd want for section .rodata (a standard section name where you should put read-only constant data like your string), but it won't work for mutable global variables.

After fixing your asm to use section .data and .text, readelf shows us:

$ readelf -a foo    # fixed version
  ...
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000000e7 0x00000000000000e7  R E    0x200000
  LOAD           0x00000000000000e8 0x00000000006000e8 0x00000000006000e8
                 0x0000000000000010 0x0000000000000010  RW     0x200000

 Section to Segment mapping:
  Segment Sections...
   00     .text 
   01     .data 

Notice how segment 00 is R + E without W, and the .text section is in there. Segment 01 is RW (read + write) without exec, and the .data section is there.

The LOAD tag means they're mapped into the process's virtual address space. Some sections (like debug info) aren't, and are just metadata for other tools. But NASM flags unknown section names as progbits, i.e. loaded, which is why it was able to link and have the load not segfault.


After fixing it to use section .data, your program runs without segfaulting.

The loop runs for one iteration, because the 2 bytes following step: dw 1 are not zero. After the dword load, RAX = 0x2c0001 on my system. (cmp between 0x002c0002 and 0xa makes the LE condition false because it's not less or equal.)

dw means "data word" or "define word". Use dd for a data dword.
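
As a minimal sketch of the two fixes so far (lower-case section names and a dword-sized counter), the question's program could look like the following. It uses the `inc dword [step]` / `cmp dword [step], limit` form suggested in the comments. Moving the string into `.rodata` is my own choice here, not something the question did, and the code still relies on the 32-bit `int 80h` ABI, so it assumes a kernel with 32-bit emulation enabled and a non-PIE static link where all addresses fit in 32 bits.

```nasm
; Sketch only: the question's program with lower-case sections and a dword counter.
; Assumes: non-PIE static executable, kernel with 32-bit int 0x80 support.

section .data                       ; lower-case name: mapped read+write
    step:           dd 1            ; dword, so a 32-bit load/store stays inside the variable

section .rodata                     ; read-only constants (assumption: moved here from .data)
    print_str:      db 'Hello, world.', 10
    print_str_len:  equ $-print_str
    limit:          equ 10

section .text                       ; lower-case name: mapped read+exec
    global _start

_start:
    mov eax, 4                      ; 'write' system call in the 32-bit ABI
    mov ebx, 1                      ; file descriptor 1 = stdout
    mov ecx, print_str              ; pointer to the string
    mov edx, print_str_len          ; number of bytes to write
    int 80h

    inc dword [step]                ; operand size must be given explicitly
    cmp dword [step], limit         ; cmp r/m32, imm32
    jle _start                      ; loop while step <= limit

    mov eax, 1                      ; 'exit' system call in the 32-bit ABI
    xor ebx, ebx                    ; exit status 0
    int 80h
```

Built with the same `nasm -f elf64` and `ld` commands as in the question, this should print the line once for each value of step from 1 through 10.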


BTW, there's no need to keep your loop counter in memory. You're not using RDI, RSI, RBP, or R8..R15 for anything so you could just keep it in a register. Like mov edi, limit before the loop, and dec edi / jnz at the bottom.
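
A rough sketch of that register-counter idea, keeping the question's `int 80h` calls for now. The `.loop` local label is a name I picked for illustration, and the string lives in `.rodata` as suggested earlier. It assumes `int 80h` leaves EDI untouched, which is how the 32-bit ABI behaves for registers that are not arguments or the return value.

```nasm
; Sketch only: count down in EDI instead of keeping a counter in memory.
; Assumes: non-PIE static executable, kernel with 32-bit int 0x80 support.

section .rodata
    print_str:      db 'Hello, world.', 10
    print_str_len:  equ $-print_str
    limit:          equ 10

section .text
    global _start

_start:
    mov edi, limit                  ; loop counter, not used by the write call below
.loop:                              ; hypothetical label name
    mov eax, 4                      ; 'write' (eax holds the return value afterwards, so reload it)
    mov ebx, 1                      ; stdout
    mov ecx, print_str
    mov edx, print_str_len
    int 80h

    dec edi                         ; sets ZF when the counter reaches zero
    jnz .loop                       ; repeat until limit iterations are done

    mov eax, 1                      ; 'exit'
    xor ebx, ebx                    ; status 0
    int 80h
```

With no writable data left, the program needs no `.data` section at all.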

But actually you should use the 64-bit syscall ABI if you want to build 64-bit code, not the 32-bit int 0x80 ABI; see "What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?". Or build 32-bit executables if you're following a guide or tutorial written for that.

Anyway, in that case you'd be able to use ebx as your loop counter, because the 64-bit syscall ABI uses different registers for args.
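
For illustration, here is one way the whole program might look with the native x86-64 `syscall` ABI and EBX as the counter. This is a sketch under stated assumptions rather than the only reasonable layout: the `.loop` label and the use of `.rodata` and `default rel` are my choices. The call numbers (1 for write, 60 for exit) and the argument registers (RDI, RSI, RDX) are those of the Linux x86-64 ABI; `syscall` itself overwrites RAX, RCX, and R11, which is why the counter is kept in EBX.

```nasm
; Sketch only: 64-bit Linux syscall ABI, loop counter in EBX.
default rel                         ; RIP-relative addressing for memory operands

section .rodata
    print_str:      db 'Hello, world.', 10
    print_str_len:  equ $-print_str
    limit:          equ 10

section .text
    global _start

_start:
    mov ebx, limit                  ; EBX is not an argument register in this ABI
.loop:                              ; hypothetical label name
    mov eax, 1                      ; 'write' = 1 on x86-64
    mov edi, 1                      ; arg 1: file descriptor (stdout)
    lea rsi, [print_str]            ; arg 2: pointer to the string
    mov edx, print_str_len          ; arg 3: length in bytes
    syscall                         ; clobbers rax (return value), rcx, r11

    dec ebx
    jnz .loop                       ; loop until the counter reaches zero

    mov eax, 60                     ; 'exit' = 60 on x86-64
    xor edi, edi                    ; status 0
    syscall
```

The build commands from the question (`nasm -f elf64 file.asm -o file.o`, then `ld file.o -o file`) should work unchanged for this version.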

Peter Cordes