assembly language help finding argv[1][0]

Question

I'm trying to get the first character of the string stored in argv[1] in x86 assembly language. I popped the stack twice into eax because I want argc, so that I can count the arguments, and then popped argv into ebx. I'm thinking of loading [ebx] into bl, but from here I am lost. I have little or no experience in assembly; I am just looking to understand it.

main:
;
    mov ecx, 0 ;count output characters
    pop eax ;reject this 32 bits
    pop eax ;get argc
    ;
    pop ebx ; get argv
    ;mov bl, [ebx]
    ;
    add al, 30H ;convert integer to ascii
    mov edi, cline ;put char in output buffer
    mov byte [edi],al
    ;inc edi
    ;mov [edi], bl
    inc ecx ;increment output char count
    inc edi ;increment pointer to o/p buffer
    mov al, 0aH ;LF to end line
    mov byte[edi],al ;put it at end of output line
    inc ecx ;increment output char count
    push ecx ;save char count on stack
    mov edx,len ;length of string to write
    mov ecx,msg ;addr of string
    mov ebx,1 ;file descriptor 1 = stdout
    mov eax,4 ;"write" system call
    int 0x80 ;call the kernel
;
    pop edx ;restore char count into edx for system call
    mov ecx,cline ;address of string
    mov ebx,1 ;file descriptor 1 = stdout

Don't forget to use code blocks please. I fixed it for you. — ChiefTwoPencils, Nov 02 '13 at 22:58

score 7 · Answer 1 · edited Aug 20 '23 at 15:22

Take a look here: NASM - Linux Getting command line parameters

Here is how it works:

argc = [esp]
argv = [esp+4 + 4 * ARG_NUMBER]

Where ARG_NUMBER is the index into argv. This is also equivalent to [esp + 4*one_based_index], but that's harder to think about when you also have to remember that Unix argv[0] is the program name.

./test hello there
[esp] = 3
[esp+4 + 4 * 0] = ./test (program path and name)
[esp+4 + 4 * 1] = hello
[esp+4 + 4 * 2] = there
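
The offset arithmetic can be checked with a small C model that treats the stack as a flat array of 32-bit words, with argc in the first slot and the argv entries after it. This is only a sketch: `arg_offset` and `arg_slot` are hypothetical helper names, and the array is a stand-in for the real memory at esp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset from esp of argv[index], assuming 4-byte stack slots
   with argc stored at [esp] (hypothetical helper for illustration). */
size_t arg_offset(size_t index)
{
    return 4 + 4 * index;
}

/* Read the 32-bit slot at a byte offset from the simulated stack pointer.
   `stack` stands in for the memory at esp; offsets must be 4-byte aligned. */
uint32_t arg_slot(const uint32_t *stack, size_t byte_offset)
{
    assert(byte_offset % 4 == 0);
    return stack[byte_offset / 4];
}
```

With this model, `arg_offset(0)` is 4 (the program name slot) and `arg_offset(1)` is 8, which matches C, where `argv[1]` is the first argument after the program name.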

I will use printf from the C Library to make it clearer:

extern printf, exit

section .data
fmtint  db  "%d", 10, 0
fmtstr  db  "%s", 10, 0

section .text
global main
main:
    
    push    dword[esp]
    push    fmtint      
    call    printf                      ; print argc
    add     esp, 4 * 2

    mov     ebx, 1  
PrintArgV:
    push    dword [esp + 4 * ebx]
    push    fmtstr
    call    printf                      ; print each param in argv
    add     esp, 4 * 2
    
    inc     ebx
    cmp     ebx, dword [esp]
    jng     PrintArgV
    
    call    exit

To keep it simple, there is no error checking here. You could, for example, check whether the number of arguments exceeds what you expect.
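
As a rough illustration of such a check, here is a minimal C sketch. The helper name `argc_in_range` and its bounds parameters are assumptions made for this example and are not part of the code above.

```c
#include <assert.h>
#include <stdbool.h>

/* True when the argument count lies in [min_args, max_args].
   argc counts the program name too, so "exactly one user argument"
   means argc == 2. Hypothetical helper for illustration. */
bool argc_in_range(int argc, int min_args, int max_args)
{
    assert(min_args <= max_args);
    return argc >= min_args && argc <= max_args;
}
```

In assembly the same idea is a `cmp` against the expected count followed by a conditional jump to an error path before any argv slot is read.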


@Ed Cashin, if the OP is learning INTEL syntax, why confuse them with AT&T?

I'd suggest writing it as `[esp + 4 + 4 * ARG_INDEX]`, so you don't have to mess around with 1-based numbering. Then everything matches up with C, where `argv[1]` is the first arg after the program name. — Peter Cordes, May 30 '17 at 08:19

score 0 · Answer 2 · answered Nov 03 '13 at 01:57

I have three recommendations:

1. Check out http://www.nasm.us/doc/nasmdoc9.html if you haven't already.
2. Minimize the code that attempts to solve your immediate problem.
3. Examine the assembly generated by a C compiler, if possible, when you're stuck.

For getting argv, I can simply return the ASCII code for the first character in argv[1] from my program to avoid the system call. The system call is a different problem than getting argv, so avoiding it focuses attention on the problem at hand.

Then I can compile a minimal C program and examine the generated assembly. Reading AT&T syntax assembly isn't that bad if you remember that when you're going to AT&T, which is in New Jersey, the destination is on the right side of the U.S. ;)

tmp$ cat main.c
int main(int argc, char *argv[])
{
        if (argc > 1)
                return argv[1][0];
        return 0;
}
tmp$ gcc -Wall -save-temps main.c

The program just returns the ASCII code for the first character in argv[1] as its exit status. For the argument "test", that character is 't', which is ASCII 116.

tmp$ ./a.out test
tmp$ echo $?
116
tmp$

Examining the generated assembly, I see that it doesn't use pops but rather just loads registers based on the position of stack parameters relative to the base pointer, ebp. I find I like this style of using mov with the base pointer.

I don't use pop the way you're trying to do, so someone else might comment on how to do it using pop.

I've annotated the assembly a bit with comments about what I think is going on. Corrections are welcome.

tmp$ cat main.s
        .file   "main.c"
        .text
.globl main
        .type   main,@function
main:
        pushl   %ebp         # push the caller's base pointer onto the stack
        movl    %esp, %ebp   # save esp into the base pointer
        subl    $8, %esp     # make some room on the stack for main ...
        andl    $-16, %esp   # ... and round esp down to a 16-byte boundary
        movl    $0, %eax
        subl    %eax, %esp   # dunno why gcc subtracts 0 from stack pointer
        cmpl    $1, 8(%ebp)  # compare 1 and argc, which is 8 past the base pointer
        jle     .L2          # jump to .L2 if argc <= 1
        movl    12(%ebp), %eax   # fetch argv (12 past the base pointer) into eax
        addl    $4, %eax         # skip argv[0]: advance 4 bytes to the argv[1] slot
        movl    (%eax), %eax     # load the pointer argv[1] from that slot
        movsbl  (%eax),%eax      # load the byte argv[1][0], sign-extended into eax
        movl    %eax, -4(%ebp)   # put that byte onto the stack (see note 1.)
        jmp     .L1
.L2:
        movl    $0, -4(%ebp)
.L1:
        movl    -4(%ebp), %eax   # load return value from stack (see note 1.)
        leave
        ret
.Lfe1:
        .size   main,.Lfe1-main
        .ident  "GCC: (GNU) 3.2.2"
tmp$
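
To connect the listing back to C, here is a sketch that performs the same loads one step at a time, with the matching instruction noted beside each line. The function name `first_char_of_arg1` is made up for this example, and the "4 bytes" in the comments assumes 32-bit pointers, as in the listing.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the compiler output step by step; returns 0 when argv[1] is absent.
   Hypothetical helper name, for illustration only. */
int first_char_of_arg1(int argc, char **argv)
{
    assert(argv != NULL);
    if (argc <= 1)               /* cmpl $1, 8(%ebp) ; jle .L2 */
        return 0;
    char **slot = argv + 1;      /* addl $4, %eax: one pointer (4 bytes on 32-bit) past argv[0] */
    char *str = *slot;           /* movl (%eax), %eax: load the pointer argv[1] */
    return (signed char)*str;    /* movsbl (%eax), %eax: load one byte, sign-extended */
}
```

The two dereferences are the key point: argv is a pointer to an array of pointers, so reaching argv[1][0] takes one load for the string pointer and a second load for the byte.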

I don't have nasm handy on a 32-bit machine, and x86_64 calling conventions are not the same as the ones you're dealing with, so I didn't translate this assembly into nasm syntax.

The compiler will do some stuff that makes you scratch your head and wonder, "Is that smart or dumb?" In this case, I think probably I would just use eax itself instead of the stack for holding the return value, but sometimes googling is educational. I learned why gcc sometimes likes to zero a register with "xor reg, reg", for example, by googling.

@Gunner asks why I used AT&T syntax when the OP is clearly using Intel syntax. The reason is that I believe that achieving bilingual proficiency is less difficult than trying to avoid AT&T syntax altogether. The ability to check the output left by "-save-temps" makes up for any temporary confusion. To ease the discomfort further I provided a mnemonic. Thanks for asking. — Ed Cashin, Nov 05 '13 at 13:54
`gcc -Og -m32 -masm=intel -S` makes Intel syntax assembly so if you're trying to learn both syntaxes, you can see the same code both ways. See [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116). If you want to compare against NASM, GAS Intel syntax is much closer, although it's MASM-like so NASM `mov reg, symbol` is GAS `mov reg, OFFSET symbol`. — Peter Cordes, Aug 20 '23 at 15:21
