
I wrote an x86 (IA-32) assembly program that is supposed to read a string from standard input, but I cannot understand why it results in a SEGFAULT.

I assembled this program with the GNU assembler using the following flags:

$ gcc (flags used) (file_name)

Below is the code of the program:

.text

.globl _start

MAX_CHAR=30

_start:

    ## Start message ##
    movl $4, %eax
    movl $1, %ebx
    movl $msg, %ecx
    movl $len, %edx
    int $0x80


    ## READ ##
    movl $3, %eax       #sys_read (number 3)
    movl $0, %ebx       #stdin (number 0)
    movl %esp, %ecx     #starting point
    movl $MAX_CHAR, %edx    #max input
    int $0x80       #call


    ## Need the cycle to count input length ##  
    movl $1, %ecx       #counter
end_input:
    xor %ebx, %ebx
    mov (%esp), %ebx
    add $1, %esp        #get next char to compare 
    add $1, %ecx        #counter+=1
    cmp $0xa, %ebx      #compare with "\n" 
    jne end_input       #if not, continue 


    ## WRITE ##
    sub %ecx, %esp      #start from the first input char
    movl $4, %eax       #sys_write (number 4)
    movl $1, %ebx       #stdout (number 1)
    movl %ecx, %edx     #start pointer
    movl %esp, %ecx     #length
    int $0x80       #call
     

    ## EXIT ##
    movl $1, %eax
    int $0x80   

.data

msg: .ascii "Insert an input:\n"
len =.-msg

What is causing the SEGFAULT?

Any help would be welcomed.

greybrunix
  • Please note for future that questions about assembly language should always be tagged [tag:assembly] as well as with the architecture you are coding for (here [tag:x86]). The [tag:gnu-assembler] is for things specific to the GNU assembler, which can target many different architectures. – Nate Eldredge Dec 04 '22 at 19:01
  • Also please be specific about the problem you are facing. Just saying the code "is wrong" is very uninformative. How did you run it? With what input? What did it do? What were you expecting it to do instead? – Nate Eldredge Dec 04 '22 at 19:02
  • Sorry, it's x86 Linux computer 5.15.0-52-generic #58-Ubuntu SMP Thu Oct 13 08:03:55 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux – student_canada Dec 04 '22 at 19:05
  • `movl %esp, %ecx`: This will overwrite the current contents of the stack, which is bad; the stack may not even be large enough for your input data, which would be even worse. You need to decrement the stack pointer to make room for your buffer. Something like `subl $MAX_CHAR, %esp`, except that the stack should stay aligned to 4 bytes at minimum, so `subl $32, %esp`. Then add 32 back after you are done with the buffer. – Nate Eldredge Dec 04 '22 at 19:05
  • `add $1, %esp`: No, don't use the stack pointer as your index pointer. Leave `%esp` alone, and use some other register to step through the buffer. – Nate Eldredge Dec 04 '22 at 19:06
  • `mov (%esp), %ebx` loads a 32-bit word (4 bytes) whereas you really only want to compare one byte. So use an 8-bit register, e.g. `mov (%reg), %bl` and then `cmp $0xa, %bl`. Or just combine them together and skip the register load altogether: `cmpb $0xa, (%reg)`. – Nate Eldredge Dec 04 '22 at 19:09
  • Sorry, I was a little confused about future changes, could you modify and mention them? I think it gets easier, as I said I'm learning... - @NateEldredge – student_canada Dec 04 '22 at 19:11
  • Segfault on which instruction? How did you assemble + link this? If you built it into a 64-bit executable with `gcc -static -nostdlib foo.s` (without `-m32`), `mov (%esp), %ebx` will segfault because truncating RSP to 32-bit doesn't produce a valid pointer. – Peter Cordes Dec 04 '22 at 19:20

1 Answer


Bugs that I see:

  • Stack management. You can't assume anything about the data already on the stack on program entry, nor how much space is available. And you mustn't write below the current address in %esp; for instance, signal handlers can overwrite it unexpectedly at any time. So you need to subtract from %esp to allocate space for your buffer, then add back when done.

  • Moreover, %esp should remain aligned to 4 bytes at all times. This is not strictly an architectural requirement, but breaking this rule will cause inefficient execution and a lot of confusion. Thus, to create space for a 30-byte buffer, round up and subtract 32 from %esp.

    When you want to call functions written in C, there are additional alignment requirements, see gcc x86-32 stack alignment and calling printf.

  • For both of the above reasons, don't use %esp as a pointer variable in your loop: leave it alone and choose some other register.

  • Operand size. x86-32 instructions can generally operate on 8, 16, or 32 bits. The l suffix and/or the use of a 32-bit register (eax, ebx, and so on) signals a 32-bit instruction. So mov (%esp), %ebx loads 4 bytes from memory, and cmp $0xa, %ebx compares them to the 32-bit value 0x0000000a. Thus the comparison will be wrong unless the next three bytes in memory just happen to all be zero. To get 8-bit operation, use 8-bit registers (al, bl, ah, bh, etc.), but be aware that they overlap the corresponding 16-bit and 32-bit registers, so don't try to use %ebx and %bl for different things at the same time. Try movb (%reg), %bl (where, as mentioned above, %reg shouldn't be %esp but rather whatever register you use instead) and cmpb $0xa, %bl. (The b suffix is optional, as the size is inferred from the 8-bit bl register, but since you're using suffixes in most of the rest of your code, you might as well be consistent.)

  • You are writing 32-bit code here, so be sure to build your program in 32-bit mode. For instance, if using gcc, you need the -m32 flag. In the long run, you might prefer to learn 64-bit x86 assembly instead; 32-bit x86 code is pretty much obsolete.

  • Actually, counting the length of the input by searching for newline (0xa) isn't really appropriate in the first place. If the input doesn't contain a newline at all, which is possible if the line was more than 30 bytes long, then your loop will run off the end of the buffer. To find out how many characters were read, you should instead use the return value from read, which is left in %eax after the read system call returns. (If it is zero, end-of-file was reached; if it's negative, there was an error.)

    Moreover, if you're reading from the terminal in its default mode, you will normally just get at most one line at a time anyway, so if there is a \n it would correspond with the end of the input returned by read. (But this doesn't apply if standard input is redirected from a file.)
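
Putting the points above together, one possible corrected version might look like the sketch below. It is an illustration rather than the only way to fix the program. It assumes a 32-bit build along the lines of `gcc -m32 -static -nostdlib echo.s` (the file name echo.s is hypothetical). The BUF_SIZE constant and the labels scan, write and done are names introduced here for illustration, as is the choice of %esi and %edi as scratch registers. The newline scan is kept only to show the byte-sized compare; it is bounded by the count returned from read, so it cannot run off the end of the data.

```asm
# Sketch only. Assumed build command: gcc -m32 -static -nostdlib echo.s
.text
.globl _start

MAX_CHAR = 30
BUF_SIZE = 32                   # MAX_CHAR rounded up to a multiple of 4 (name is illustrative)

_start:
    ## Start message ##
    movl $4, %eax               # sys_write
    movl $1, %ebx               # stdout
    movl $msg, %ecx             # pointer to the message
    movl $len, %edx             # message length
    int $0x80

    ## READ ##
    subl $BUF_SIZE, %esp        # reserve the buffer; %esp stays 4-byte aligned
    movl $3, %eax               # sys_read
    movl $0, %ebx               # stdin
    movl %esp, %ecx             # buffer = start of the reserved space
    movl $MAX_CHAR, %edx        # read at most MAX_CHAR bytes
    int $0x80                   # %eax = bytes read, 0 at EOF, negative on error

    testl %eax, %eax
    jle done                    # nothing to echo on EOF or error

    ## Look for the first newline within the bytes actually read ##
    movl %esp, %esi             # %esi walks the buffer; %esp is left alone
    leal (%esp,%eax), %edi      # %edi = one past the last byte read
scan:
    movb (%esi), %bl            # load a single byte
    incl %esi                   # advance to the next byte
    cmpb $0xa, %bl              # is it '\n'?
    je write                    # yes: %esi now points just past the newline
    cmpl %edi, %esi
    jb scan                     # keep going while %esi < %edi

    ## WRITE ##
write:
    movl %esi, %edx
    subl %esp, %edx             # length = end pointer - buffer start
    movl $4, %eax               # sys_write
    movl $1, %ebx               # stdout
    movl %esp, %ecx             # buffer start
    int $0x80

    ## EXIT ##
done:
    addl $BUF_SIZE, %esp        # release the buffer
    movl $1, %eax               # sys_exit
    xorl %ebx, %ebx             # exit status 0
    int $0x80

.data

msg: .ascii "Insert an input:\n"
len = . - msg
```

If only an echo of everything that was read is wanted, the scan loop can be dropped entirely and %eax copied straight into %edx before the write.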

Nate Eldredge
  • In practice in `_start`, you know that `argv[]` and `envp[]` will be above the initial ESP, so in any normal system that alone is enough for 30 bytes of stack space with only a few env vars. In practice the pointed-to strings for args and environment are also in stack space above that, so for toy hacks it's basically fine to overwrite that space. As long as you understand what you're doing and that this would not be ok in a function that needs to return. So I agree that using ESP this way is probably a mistake for a beginner, especially without comments, but it's not strictly broken. – Peter Cordes Dec 04 '22 at 20:13
  • It is a bug to increment ESP past the characters and then subtract again, though. In practice without having installed any signal handlers, nothing will asynchronously clobber that space below ESP. But the ABI doesn't guarantee it. And a debugger might, if you ran `print foo()` so it invented a call to a `foo` symbol in your process, using its current ESP. (Normally debuggers are non-intrusive, but evaluating function calls as part of expressions triggers running of code in the target process with its stack.) – Peter Cordes Dec 04 '22 at 20:16