0

I am new to Assembly.
And I have this code

section .data                   ; we define (global) initialized variables in .data section
    an: dd 0                    ; an is a local variable of size double-word, we use it to count the string characters

section .text                   ; we write code in .text section
    global do_Str               ; 'global' directive causes the function do_Str(...) to appear in global scope

section .text                   ; we write code in .text section
    global do_Str               ; 'global' directive causes the function do_Str(...) to appear in global scope

do_Str:                         ; do_Str function definition - functions are defined as labels
    push ebp                    ; save Base Pointer (bp) original value
    mov ebp, esp                ; use Base Pointer to access stack contents (do_Str(...) activation frame)
    pushad                      ; push all signficant registers onto stack (backup registers values)
    mov ecx, dword [ebp+8]      ; get function argument on stack
                                ; now ecx register points to the input string
    yourCode:                   ; use label to build a loop for treating the input string characters
        cmp byte [ecx], ' '
        JE  updateAndCount
        inc ecx                 ; increment ecx value; now ecx points to the next character of the string
        cmp byte [ecx], 0       ; check if the next character (character = byte) is zero (i.e. null string termination)
        jnz yourCode            ; if not, keep looping until meet null termination character

    updateAndCount:
        mov byte [ecx], '_'
        inc dword[an]
        ret

    popad                       ; restore all previously used registers
    mov eax,[an]                ; return an (returned values are in eax)
    mov esp, ebp                ; free function activation frame
    pop ebp                     ; restore Base Pointer previous value (to returnt to the activation frame of main(...))
    ret                         ; returns from do_Str(...) function

but when running it(I have c code calling it), I get this error:

Segmentation fault (core dumped)

I know it has something to do with the return from updateAndCount, but I'm not sure how to fix it.

zx485
  • 28,498
  • 28
  • 50
  • 59
Maya Saias
  • 37
  • 9
  • 1
    `je` isn't `call`, it doesn't push a return address for `ret` to pop back into EIP. Also, `jnz yourCode` will fall through into that mov/inc/ret block making `popad`/.../ret unreachable. Use a debugger to single-step through your code. You *could* do like [Return from jump to main](//stackoverflow.com/q/56081079) to make a conditional call, or you could use a `jmp` at the bottom of `updateAndCount` back to a label inside your loop. I think there are some other Q&As on SO about mixing up `je` or other JCC with `call`, but just remember that `ret` isn't magic, it's how we wrtie `pop eip` – Peter Cordes Mar 16 '20 at 20:15
  • The snippet starting with the `updateAndCount` label mustn't end with `ret`, it should jump (or fall through) to where you want your loop to continue. That is not the only control flow error you did however. After the `jnz yourCode` you probably want an unconditional jump to the function epilogue (which starts with `popad`). Or move the epilogue around to follow directly behind the `jnz yourCode` and then let it fall through if the `jnz` does not jump. – ecm Mar 16 '20 at 20:18
  • Moreover you did not initialise the `an` variable in the function so it will act like a static variable in C, that is if your function is called repeatedly `an` will keep incrementing and not be reset when your function is called subsequently. This may be intentional but you did not specify which behaviour is intended. Besides, `yourCode` is an exceptionally undescriptive label. I'd recommend replacing it by `.loop` -- the leading dot makes it a [local label for NASM](https://www.nasm.us/xdoc/2.14.02/html/nasmdoc3.html#section-3.9). The name is to describe the intention of that jump target. – ecm Mar 16 '20 at 20:26
  • I am not sure I understand(this is my first assembly project for school and most of it was given for us). if I do not use ret in updateAndCount then it doesnt go back to yourCode label and just exit the program. how can I make it for back and keep looping then? – Maya Saias Mar 16 '20 at 20:27
  • Use single step in your debugger. When you single step, verify ever expectation that you have: observe which of the following changed: memory, registers. Some of these you're expecting changes and others expecting they remain the same -- verify both after each single step. For example, observe the stack by checking if the stack pointer changed, and if stack was allocated, verify what value was pushed on. If you're expecting a return address pushed on, then validate that it happens at instruction of your call site -- if it doesn't happen as you expect, then adjust! – Erik Eidt Mar 16 '20 at 20:40
  • Even if I know what happens on the stack, I do not understand how to return where I stopped in the yourCode label. right now with Ret it exit after entering updateAndCount label and I do not know how to make it keep looping – Maya Saias Mar 16 '20 at 20:44

1 Answers1

1

Expanding from my comments with an example.

The snippet starting with the updateAndCount label mustn't end with ret, it should jump (or fall through) to where you want your loop to continue. That is not the only control flow error you did however. After the jnz yourCode you probably want an unconditional jump to the function epilogue (which starts with popad). Or move the epilogue around to follow directly behind the jnz yourCode and then let it fall through if the jnz does not jump.

Moreover you did not initialise the an variable in the function so it will act like a static variable in C, that is if your function is called repeatedly an will keep incrementing and not be reset when your function is called subsequently. This may be intentional but you did not specify which behaviour is intended. Besides, yourCode is an exceptionally undescriptive label. I'd recommend replacing it by .loop -- the leading dot makes it a local label for NASM. The name is to describe the intention of that jump target.

Here's an example correcting the control flow, initialising the an variable, and changing the labels to descriptive and local labels. I will not optimise the program beyond that, nor make it threadsafe (which it isn't due to the global variable).

section .data                   ; we define (global) initialized variables in .data section
    an: dd 0                    ; an is a local variable of size double-word, we use it to count the string characters

section .text                   ; we write code in .text section
    global do_Str               ; 'global' directive causes the function do_Str(...) to appear in global scope

do_Str:                         ; do_Str function definition - functions are defined as labels
    push ebp                    ; save Base Pointer (bp) original value
    mov ebp, esp                ; use Base Pointer to access stack contents (do_Str(...) activation frame)
    pushad                      ; push all signficant registers onto stack (backup registers values)
    mov dword [an], 0

;;; continues in the next code block
;;; all the code blocks in this answer combine to one function with one loop

This zero-initialises the variable to zero. (I'd use and with zero personally, which is slightly shorter than mov with a zero immediate number. But for clarity we'll use mov.)

    mov ecx, dword [ebp+8]      ; get function argument on stack
                                ; now ecx register points to the input string
    jmp .first

This unconditional jump fixes another logic error. If the very first byte in the string is a NUL byte we should most likely honour it, not continue to process whatever follows after it.

.loop:                          ; use label to build a loop for treating the input string characters
    cmp byte [ecx], ' '
    jne .do_not_updateAndCount

I inverted the condition code of the jump. Now it jumps if we don't want to run the .updateAndCount part.

.updateAndCount:
    mov byte [ecx], '_'
    inc dword [an]

I kept your label (now local with the leading dot) even though it is not referenced anywhere. It can serve as a comment for what this block is supposed to do, however.

Crucially, the control flow falls through after the end of the .updateAndCount block here. Equivalently, you could add an unconditional jmp .next at the end to jump (back) into the loop. If you moved around the .updateAndCount block (eg to behind the epilogue's ret) then this jmp .next would be required.

.do_not_updateAndCount:

.next:
    inc ecx                     ; increment ecx value; now ecx points to the next character of the string
.first:
    cmp byte [ecx], 0           ; check if the next character (character = byte) is zero (i.e. null string termination)
    jnz .loop                   ; if not, keep looping until meet null termination character

    popad                       ; restore all previously used registers
    mov eax, [an]               ; return an (returned values are in eax)
    mov esp, ebp                ; free function activation frame
    pop ebp                     ; restore Base Pointer previous value (to returnt to the activation frame of main(...))
    ret                         ; returns from do_Str(...) function
ecm
  • 2,583
  • 4
  • 21
  • 29
  • Thank you so much for the effort! But there is something I still do not understand, In you code after jumping to .do_not_updateAndCount how does it go back to .loop? beacsue I tried something similar and it does not go back – Maya Saias Mar 16 '20 at 20:53
  • `.do_not_updateAndCount` falls through the `.next` label, then does `inc ecx`, then falls through `.first`, then does the `cmp` and `jnz .loop` there. The control flow is the same regardless of whether `.updateAndCount` was executed or not. Your `cmp` and `jnz` instructions were already correct, you just needed the `jmp .next` (or fallthrough as in my example) to continue going to the `cmp` and, if not yet at the string's end, then back to `.loop` again. If you show us your attempt we may be able to help in more specific ways. You could add that to your question as a second example. – ecm Mar 16 '20 at 21:00
  • 1
    @MayaSaias: Note the `jnz .loop`. This is a normal `do{}while()` loop structure with the loop conditional branch at the bottom. [Why are loops always compiled into "do...while" style (tail jump)?](https://stackoverflow.com/q/47783926) The loop body is broken up across a couple code blocks with text in between in this answer. – Peter Cordes Mar 16 '20 at 21:02
  • So the aim of this code is to count how many spaces there are in a string and update the spaces to '_' – Maya Saias Mar 16 '20 at 21:03
  • @ecm: I'd suggest keeping it simple and doing `mov` to init static `an`. If you wanted to *optimize* this code, you wouldn't use static storage at all, and not `pushad`, so you could much more easily just keep a counter in EAX without having to save/restore it across `popad`. The function only needs 2 registers, so even if you want to not clobber ECX for some reason it's easier to just save/restore that. More importantly, IMO `and` with zero is not a good thing to teach beginners; usually performance matters more than code-size on modern x86. – Peter Cordes Mar 16 '20 at 21:08
  • @Peter Cordes: Fair, edited it to use `mov` instead. But don't let me come upon you using `xor` zeroing in examples for beginners =P (Okay to be fair that is also better for performance.) – ecm Mar 16 '20 at 21:16