
Hello Everyone!


I'm a newbie at NASM and just started out recently. I currently have a program that reserves an array and is supposed to copy a string from the command-line arguments into that array and then display it.

Now, I'm not sure whether I'm copying the string correctly, because every time I try to display it I get a segmentation fault! This is my code for copying into the array:


example:

%include "asm_io.inc"

section .bss
X: resb 50 ;;This is our array

~some code~

mov eax, dword [ebp+12]   ; eax holds address of 1st arg
add eax, 4                ; eax holds address of 2nd arg
mov ebx, dword [eax]      ; ebx holds 2nd arg, which is pointer to string
mov ecx, dword 0
;Where our 2nd argument is a string eg "abcdefg" i.e ebx = "abcdefg"
copyarray:
      mov al, [ebx]         ;; get a character
      inc ebx
      mov [X + ecx], al
      inc ecx
      cmp al, 0
      jz done
      jmp copyarray
done:

My question is whether this is the correct way to copy the string into the array, and how I can display the contents of the array afterwards. Thank you!

Michael Petch
EDIT: Saw your SEGFAULT line and figured it would be Linux-based. Can you show us your surrounding code? I assume you're doing a syscall for sys_write? – Simon Whitehead Dec 01 '15 at 23:00

1 Answer

The loop looks OK, but clunky. If your program is crashing, use a debugger. See the x86 tag wiki for links and a quick intro to gdb for asm.

I think you're loading argv[1] correctly. (Note that this is the first command-line arg, though; argv[0] is the command name.) https://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames says ebp+12 is the usual spot for the 2nd arg to 32-bit functions that bother to set up stack frames.
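For readers more at home in C, this is roughly what [ebp+12] (argv) and [eax + 4] (argv[1]) correspond to. It is a minimal sketch with a hypothetical helper name, first_cmdline_arg. It also makes one thing explicit: when the program is run with no arguments, argv[1] is the NULL pointer that terminates argv, so asm that loads it and then reads through it without checking argc will crash. That is one possible (not confirmed) source of a segfault here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return argv[1], the first command-line argument
   (argv[0] is the command name), or NULL when no argument was given.
   In the asm, argv is the dword at [ebp+12] and argv[1] is at [argv + 4]. */
const char *first_cmdline_arg(int argc, char **argv)
{
    assert(argv != NULL);
    if (argc < 2)
        return NULL;   /* argv[1] would be the NULL terminator of argv */
    return argv[1];
}
```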


Michael Petch commented on Simon's deleted answer that the asm_io library has print_int, print_string, print_char, and print_nl routines, among a few others. So presumably you can pass a pointer to your buffer to print_string and call it a day. Or you could call sys_write(2) directly with an int 0x80 instruction, since you don't need to do any string formatting and you already have the length.
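A sketch of the sys_write route, written in C rather than asm. On 32-bit Linux, int 0x80 with eax = 4 (sys_write), ebx = file descriptor, ecx = buffer address and edx = length makes the same write(2) system call that the C library's write() wraps. The helper name write_all is hypothetical; it just retries if the kernel reports a short write.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes of buf to fd, looping on short
   writes. Returns 0 on success, -1 if write() fails (errno is set). */
int write_all(int fd, const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;              /* skip the bytes already written */
        len -= (size_t)n;
    }
    return 0;
}
```

For example, write_all(1, X, len) prints the buffer to stdout. If the count comes from a copy loop that also copies the terminating 0 byte, it is the string length plus one, so subtract one to avoid writing the 0.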


Instead of incrementing separate pointers for the two arrays, you could use the same index for both, with an indexed addressing mode for the load.

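;; main (int argc ([esp+4]), char **argv ([esp+8]))
;; ... some code you didn't show that I guess pushes some more stuff on the stack
mov eax, dword [ebp+12]   ; load argv
    ;; eax + 4 is &argv[1], the address of the 1st cmdline arg (not counting the command name)
mov esi, dword [eax + 4]  ; esi = argv[1], a pointer to the 1st cmdline arg string
xor ecx, ecx              ; ecx = 0: one index shared by the source and X

copystring:
    mov   al, [esi + ecx] ; load a byte of the cmdline arg
    mov   [X + ecx], al   ; store it at the same offset in X (no bounds check: X is 50 bytes)
    inc   ecx

    test  al, al          ; was that the 0 terminator?
    jnz copystring        ; no: copy the next byte
    ;; here ecx = string length + 1, because the terminator was copied too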
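The same single-index loop written in C, as a model of what the asm does. copy_string is a hypothetical name; this is a sketch, not code from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the 0-terminated string src into dst with one shared index,
   mirroring the asm loop: load src[i], store dst[i], i++, and stop after
   the 0 byte has been copied. Returns the number of bytes copied
   including the terminator (the value left in ecx when the asm loop
   exits). dst must have room for all of them. */
size_t copy_string(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;
    char c;
    do {
        c = src[i];       /* mov al, [esi + ecx] */
        dst[i] = c;       /* mov [X + ecx], al   */
        i++;              /* inc ecx             */
    } while (c != 0);     /* test al, al / jnz   */
    return i;
}
```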

I changed the comments to say "cmdline arg", to distinguish between those and "function arguments".

When it doesn't cost any extra instructions, use esi for source pointers, edi for dest pointers, for readability.

Check the ABI for which registers you can use without saving/restoring them: eax, ecx, and edx. In the 32-bit System V ABI those three are the only call-clobbered integer registers. Other registers (including the esi used above) have to be saved/restored if you want to use them, at least if you're making functions that follow the usual ABI. In asm you can do what you like, as long as you don't tell a C compiler to call non-standard functions.

Also note the improvement at the end of the loop. A single jnz back to the top is more efficient than jz to break out plus an unconditional jmp.

This should run at one cycle per byte on Intel: test/jnz macro-fuse into one uop, the load is one uop, the store micro-fuses into one uop, and inc is also one uop. That makes 4 fused-domain uops per iteration, and Intel CPUs since Core2 are 4-wide: they issue 4 uops per clock.

Your original loop runs at half that speed. Since it's 6 uops, it takes 2 clock cycles to issue an iteration.
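The arithmetic behind those two numbers, as a back-of-envelope front-end model. It assumes issue width is the only bottleneck and that each iteration starts a new issue group; that is a simplification that matches the figures above, not a full model of any real CPU. issue_cycles is a hypothetical helper.

```c
#include <assert.h>

/* Cycles to issue one loop iteration of `uops` fused-domain uops on a
   front end that issues `width` uops per clock, rounding up because
   (in this simplified model) an iteration never shares a cycle with
   the next one. */
unsigned issue_cycles(unsigned uops, unsigned width)
{
    assert(width > 0);
    return (uops + width - 1) / width;
}
```

issue_cycles(4, 4) is 1 for the single-index loop; issue_cycles(6, 4) is 2 for the original loop (load, inc, store, inc, fused cmp/jz, jmp).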


Another hacky way to do this would be to get the offset between X and the source buffer into another register, so one of the effective addresses could use a one-register addressing mode, even if the dest wasn't a static array: load with mov al, [ebx] and store with mov [ebx + ecx], al, incrementing only ebx (where ecx = X - start_of_src_buf). But ideally you'd make the store the one that uses the one-register addressing mode, unless the load was a memory operand to an ALU instruction that could micro-fuse it. Where the dest is a static buffer, this address-difference hack isn't useful at all.
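The address-difference idea, modeled in C. C only defines the difference of two pointers into the same array, so this sketch (hypothetical name copy_with_offset) requires the source and destination to live in one array; in asm the offset between any two buffers can simply be computed. The two regions must not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a 0-terminated string by walking one pointer p over the source
   and storing each byte at p + diff, where diff = dst - src.
   Assumes dst and src point into the same array and do not overlap.
   Returns the number of bytes copied, including the terminator. */
size_t copy_with_offset(char *dst, char *src)
{
    assert(dst != NULL && src != NULL);
    ptrdiff_t diff = dst - src;   /* like ecx = X - start_of_src_buf */
    char *p = src;                /* like ebx walking the source     */
    char c;
    do {
        c = *p;                   /* mov al, [ebx]       */
        p[diff] = c;              /* mov [ebx + ecx], al */
        p++;                      /* inc ebx             */
    } while (c != 0);
    return (size_t)(p - src);
}
```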


You can't use rep string instructions (like rep movsb) to implement strcpy for implicit-length strings (C null-terminated strings, rather than ones with a separately stored length). Well, you could, but only by scanning the source twice: once to find the length, and again to memcpy.
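A sketch of that two-pass approach in C, with strlen for the scan and memcpy for the copy. In asm the second pass is where rep movsb fits, with ecx set to the length plus one so the terminator is copied too. The name strcpy_two_pass is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* strcpy as two passes over the source: first find the length, then
   copy a known number of bytes. dst must have room for len + 1 bytes. */
char *strcpy_two_pass(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);     /* pass 1: scan for the 0 byte        */
    memcpy(dst, src, len + 1);    /* pass 2: copy, including the 0 byte */
    return dst;
}
```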

To go faster than one byte per clock, you'd have to use vector instructions to test for the null byte at any of 16 positions in parallel, probably using pcmpeqb against a vector register of all zeros. Look up an optimized strcpy implementation for an example.
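A minimal SSE2 sketch of the vector null-byte test, using the intrinsics for pcmpeqb (_mm_cmpeq_epi8) and pmovmskb (_mm_movemask_epi8). It only finds the length; a copy would follow. To stay short it assumes the string sits in a buffer padded so that whole 16-byte loads past the terminator stay in bounds, and it uses the GCC/Clang builtin __builtin_ctz. Production implementations use aligned loads instead, so they never touch an unmapped page. strlen_sse2_padded is a hypothetical name, and the code is x86-only.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>

/* Length of a 0-terminated string, checking 16 bytes per step.
   ASSUMPTION: the buffer stays readable in whole 16-byte chunks up to
   and including the chunk that holds the terminator (e.g. zero padding). */
size_t strlen_sse2_padded(const char *s)
{
    assert(s != NULL);
    const __m128i zero = _mm_setzero_si128();
    size_t i = 0;
    for (;;) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(s + i));
        /* pcmpeqb: 0xFF in each byte lane that equals 0 */
        __m128i eq = _mm_cmpeq_epi8(chunk, zero);
        /* pmovmskb: one bit per lane; the lowest set bit is the first 0 */
        int mask = _mm_movemask_epi8(eq);
        if (mask != 0)
            return i + (size_t)__builtin_ctz((unsigned)mask);
        i += 16;
    }
}
```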

Peter Cordes
Great answer Peter. Thanks for pointing out the problem with my answer - silly of me not to reference some documents before posting. – Simon Whitehead Dec 03 '15 at 04:14
@SimonWhitehead: Not everyone has the x86 insn ref manual PDF open at all times for a quick alt-tab. :D When I read your answer, my first thought was "hmm, I didn't *think* that worked, but maybe it does" :P – Peter Cordes Dec 03 '15 at 04:19