
I am building a compiler for a mini programming language. I happened to stumble on a head-scratching bug.

; & nasm -f elf64 debug.asm && gcc -m64 -no-pie -o debug debug.o && ./debug
    bits 64
    global main
    extern printf

    section .text
main:
    push rbp
    xor ebx, ebx
    add byte[block + ebx], 10 ; [block + ebx] = 10
    
    ; [block + ebx + 1] += [block + ebx] * 7
    mov rax, [block + ebx]
    imul rax, 7
    add byte[block + ebx + 1], al
    
    ; [block + ebx] = 17930??? why?!?
    
    mov rdi, fmt
    mov rsi, [block + ebx]
    call printf
    
    add ebx, 1
    mov rdi, fmt
    mov rsi, [block + ebx]
    call printf
    
    pop rbp
    mov rax, 60
    xor rdi, rdi
    syscall

    section .data
block times 30000 db 0 ; array of bytes
fmt: db "%lld", 10, 0

When I run the program above, the value at [block + ebx] abruptly changes from 10 to 17930 after the addition. I don't know why this happens. I suspect it's an integer overflow. Any ideas? How can I fix it? Thanks in advance.

Ray Siplao
  • 17930 = `(70 << 8) + 10`, which is exactly what you'd expect from adding AL to the 2nd byte of `block`, then reading 8 bytes into a register. IDK why you're doing qword loads but byte stores... It's totally up to you to use instructions that read the number of bytes you want to read, and zero-extend or sign-extend them into a wider register if that's what you want. – Peter Cordes Aug 19 '20 at 23:26
  • Also, prefer 64-bit addressing modes, like `[block + rbx]`. And if you're using printf, `call exit` instead of making a raw exit system call. That avoids problems when output is redirected to a file so stdout is fully buffered. – Peter Cordes Aug 19 '20 at 23:33
  • @PeterCordes what's with left shift by 8? – Ray Siplao Aug 19 '20 at 23:36
  • A byte is 8 bits wide and x86 is little-endian, so storing to the 2nd byte (with `[block + ebx + 1]` with EBX=0) is equivalent to left-shifting. Single-step your code with a debugger and examine bytes of memory separately, vs. as dwords or qwords. – Peter Cordes Aug 19 '20 at 23:56
  • I guess the short answer to "why does the code do this" is "because that's what the instructions you wrote are designed to do". If you could explain what you *want* the code to do, then someone could help you figure out why it doesn't do that, and what code you should have generated instead. – Nate Eldredge Aug 20 '20 at 01:38
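
Putting the suggestions from the comments together, a corrected version might look like the sketch below. It is not the original poster's code. It assumes the intent was for every cell of `block` to be an independent byte, so each load uses `movzx` to read exactly one byte and zero-extend it. It also switches to 64-bit addressing (`[block + rbx]`), saves `rbx` (which also aligns the stack for the calls), zeroes `eax` before each variadic `printf` call, and calls `exit` instead of making the raw system call. For reference, the original printed 17930 because the qword load read the bytes `0A 46 00 00 ...`, which is 0x460A when interpreted as a little-endian integer.

```nasm
; Build as in the question:
;   nasm -f elf64 fixed.asm && gcc -m64 -no-pie -o fixed fixed.o && ./fixed
    bits 64
    global main
    extern printf
    extern exit

    section .text
main:
    push rbx                        ; rbx is call-preserved; the push also aligns rsp to 16
    xor ebx, ebx                    ; rbx = 0, index into block
    add byte [block + rbx], 10      ; block[0] = 10

    ; block[1] += block[0] * 7, touching one byte at a time
    movzx eax, byte [block + rbx]   ; load exactly one byte, zero-extended: eax = 10
    imul eax, eax, 7                ; eax = 70
    add byte [block + rbx + 1], al  ; block[1] = 70

    mov edi, fmt                    ; absolute address is fine in a non-PIE executable
    movzx esi, byte [block + rbx]   ; only block[0]; upper bits of rsi are zeroed
    xor eax, eax                    ; variadic call: al = 0 vector-register arguments
    call printf                     ; prints 10

    add ebx, 1
    mov edi, fmt
    movzx esi, byte [block + rbx]   ; only block[1]
    xor eax, eax
    call printf                     ; prints 70

    xor edi, edi                    ; exit status 0
    call exit                       ; flushes stdio buffers, unlike the raw exit syscall

    section .data
block: times 30000 db 0             ; array of bytes
fmt:   db "%lld", 10, 0
```

Built with the same command as in the question, this should print 10 and then 70. If the cells are meant to hold signed bytes, `movsx` would be the sign-extending counterpart of `movzx`.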
