
I wanted to write a program in assembly (Linux) that accepts a filename from the command line. Retrieving the values from the stack with successive pop instructions worked when I built with the "ld" command, but not when I built with "gcc". I need to use gcc because I will be calling various C standard library functions in this program.

The file was actually created, but it always got an "Invalid encoding" label and appeared as <? G ? in the directory. I wanted to know:

  1. Do we follow a different procedure when building with the gcc tools?
  2. What was the likely reason for a file with an invalid encoding being created (out of curiosity)?

Here is sample code that works with ld but not with gcc:

section .data
    filename: db 'testing',0
section .text
    ;extern printf    ;to be uncommented when using gcc
    ;extern scanf     ; likewise, uncomment when using gcc
    global _start   ; replace with main when using gcc

_start:     ; replace with main:
    pop ebx     ; argc (argument count)
    pop ebx     ; argv[0] (argument 0, the program name)
    pop ebx     ; The first real arg, a filename

    mov eax,8       ; sys_creat
    ; issue: with gcc, ebx does not hold the filename popped from the command line
    ;mov     ebx,filename   ; a constant filename works with gcc, but the command-line one does not
    mov ecx,00644Q  ; permissions in octal (rw-r--r--)
    int 80h     ; Call the kernel
                ; Now we have a file descriptor in eax

    test    eax,eax     ; Let's make sure the file descriptor is valid
    js  terminate   ; a negative result (sign flag set) means an error
    call    fileWrite

terminate:
    mov ebx,eax     ; on error eax holds -errno; pass it to exit in ebx
    mov eax,1       ; Put the exit syscall number in eax
    int 80h     ; control over to kernel

fileWrite:  ; simply closing the file for the time being
    mov ebx,eax     ; move the file descriptor into ebx
    mov eax,6       ; sys_close (ebx already contains file descriptor)
    int 80h
    jmp terminate   ; terminate never returns, so a jmp is enough

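Solution and caveat: the stack looks different when the program is entered through libc than in bare-bones assembly.

  1. When using libc, the first pop returns the return address, followed by argc and then argv. Here argv is a single pointer to the array of argument pointers, not the filename itself, so after the third pop ebx points at that array. sys_creat then treats the raw bytes of those pointers as the file name, which is why a file with a garbled, invalid-encoding name appeared.

  2. In bare-bones assembly (entry at _start), the first pop returns argc, and each subsequent pop returns the next argv[] pointer, rather than the single argv pointer that libc passes to main.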
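In C terms, the libc case needs one extra level of indirection to reach the filename. A minimal sketch; the helper name `filename_from_argv` is hypothetical and not part of the original code:

```c
#include <stddef.h>

/* Hypothetical helper: with libc, main() receives argv as ONE pointer to
 * the array of argument pointers, so the filename is one load away.
 * On i386, if ebx holds argv, argv[1] corresponds to "mov ebx, [ebx+4]". */
const char *filename_from_argv(int argc, char **argv)
{
    if (argc < 2)       /* no filename was given on the command line */
        return NULL;
    return argv[1];     /* the pointer stored in the array's second slot */
}
```

In the assembly version, that load has to come after the third pop, when ebx holds argv. After only the second pop ebx holds argc, a small integer, so dereferencing `[ebx+4]` at that point would most likely segfault.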

Source: Reading filename from argv via x86 assembly

touchStone
  • The `gcc` command, when linking, is just a frontend for `ld`: it still calls `ld` to do the actual linking, passing along the flags needed for the standard library. If you want to use the standard C library, just add the flag to link with it: `-lc`. – Some programmer dude Mar 11 '15 at 10:28
  • Just a point, your last comment says `; sys_close (ebx already contains file descriptor)`. Does it? – Weather Vane Mar 11 '15 at 10:34
  • @WeatherVane you are right, I missed the mov ebx,eax line at the beginning of fileWrite. But ebx does hold the filename when I use ld without any stdlib support. – touchStone Mar 11 '15 at 10:38
  • @JoachimPileborg adding the -lc flag worked, but I wonder why using ld indirectly through gcc doesn't work. Any idea? – touchStone Mar 11 '15 at 11:04
  • @JoachimPileborg and touchStone: Linking using `gcc foo.o` actually uses `ld crt.o foo.o -lc`. You can use `gcc -nostartfiles` to get libc but not the CRT startup code which defines `_start`. See http://stackoverflow.com/questions/36861903/assembling-32-bit-binaries-on-a-64-bit-system-gnu-toolchain/36901649#36901649 – Peter Cordes Jul 02 '16 at 00:13
  • Possible duplicate of [Reading filename from argv via x86 assembly](http://stackoverflow.com/questions/7854706/reading-filename-from-argv-via-x86-assembly) – Peter Cordes Jul 02 '16 at 00:15

1 Answer


This blog post explains what the stack looks like when a program's entry point is called: http://eli.thegreenplace.net/2012/08/13/how-statically-linked-programs-run-on-linux

In a nutshell, these elements are on the stack, starting at the stack pointer:

  1. argc
  2. argv[0] - program/executable name
  3. argv[1] ... argv[argc-1] - Program arguments
  4. argv[argc] - Always NULL
  5. envp[0] ... envp[N] - The current environment
  6. NULL to terminate the envp array
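To make that layout concrete, here is a sketch, written in C for readability, of how a custom _start could interpret it. `sp` stands for the stack pointer at entry, and `struct entry_args` / `parse_entry_stack` are hypothetical names. Real entry code still has to be written in assembly, because portable C cannot read the initial stack pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: the three things a _start routine needs. */
struct entry_args {
    long    argc;
    char  **argv;
    char  **envp;
};

/* sp points at the lowest word of the initial stack (what the first
 * "pop" would return). Every slot is one machine word wide. */
struct entry_args parse_entry_stack(char **sp)
{
    struct entry_args a;
    a.argc = (long)(intptr_t)sp[0];  /* slot 0 holds argc, an integer */
    a.argv = sp + 1;                 /* argv[0] starts in slot 1 */
    a.envp = a.argv + a.argc + 1;    /* skip argv[argc], the NULL terminator */
    return a;
}
```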

Each of these slots is 32 or 64 bits wide, depending on whether the program is a 32-bit (x86) or a 64-bit (x86-64) executable, so make sure you fetch the correct sizes from the stack. On x86-64, even argc takes 8 bytes.
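As a worked example of the offsets, a sketch with hypothetical helper names that computes where argv[i] and envp[j] sit relative to the stack pointer at _start, for a word size of 4 (i386) or 8 (x86-64):

```c
#include <stddef.h>

/* Slot 0 holds argc, so argv[i] sits in slot i + 1. */
size_t argv_offset(size_t i, size_t word)
{
    return (i + 1) * word;
}

/* envp[j] follows argc, the argc argv pointers and argv's NULL terminator. */
size_t envp_offset(size_t argc, size_t j, size_t word)
{
    return (1 + argc + 1 + j) * word;
}
```

So on i386 the filename (argv[1]) is at `[esp+8]` at _start, while on x86-64 it is at `[rsp+16]`.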

If you want to avoid this hassle, link against libc and provide a main entry point instead of _start. libc's startup code supplies _start, which locates these arrays and then calls main with three arguments (on 32-bit x86 they are passed on the stack, above the return address; on x86-64 they arrive in registers):

  1. int argc
  2. char** argv
  3. char** envp
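On 32-bit x86 those three arguments sit just above the return address, which is why the question's third pop returned argv rather than the filename. A sketch of the offsets at main's first instruction, assuming the standard cdecl convention used on i386 Linux; the enum and function names are hypothetical:

```c
#include <stddef.h>

enum main_param { PARAM_ARGC = 0, PARAM_ARGV = 1, PARAM_ENVP = 2 };

/* Byte offset from esp at main's first instruction, before any push. */
size_t main_param_offset_i386(enum main_param p)
{
    return 4                 /* [esp] holds the return address into libc */
         + 4 * (size_t)p;    /* each parameter occupies one 4-byte slot */
}
```

After a conventional `push ebp; mov ebp, esp` prologue each offset grows by 4, which matches the `[ebp+12]` load for argv that GCC generated in the comments further down.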

libc's startup code also initializes the stdio machinery; if you bypass it, calls such as printf() are not guaranteed to work (for example, stdout may not be set up).
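A related pitfall when mixing libc with the question's raw `int 80h` exit: the exit system call bypasses libc's exit handling, so output still sitting in a stdio buffer is never written. A minimal sketch, with a hypothetical helper name, that flushes explicitly before the program would leave that way:

```c
#include <stdio.h>

/* Hypothetical helper: print a line and flush it right away, so nothing
 * is lost if the program then ends with the raw exit system call.
 * Returns the number of characters written, or -1 on error. */
int report_filename(FILE *out, const char *name)
{
    int n = fprintf(out, "file: %s\n", name);
    if (n < 0 || fflush(out) != 0)
        return -1;
    return n;
}
```

Ending through `exit()` (or returning from main) instead of the raw system call flushes stdio automatically.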

Aaron Digulla
  • Will "mov ebx, dword [ebx+4]" after the second pop, discarding the third, work with gcc? I got a segmentation fault when I tried it. Can you help me with that? – touchStone Mar 11 '15 at 11:36
  • It should :-/ Again, I did this last time in 1996, so my knowledge is a bit rusty :-) Can you try to run the code in a debugger? – Aaron Digulla Mar 11 '15 at 12:32
  • I have found a blog post which shows the stack layout. Turns out that you were right, the kernel actually pushes the `argv` array onto the stack element by element. See my edits. – Aaron Digulla Mar 11 '15 at 12:38
  • Ironically, with libc (via gcc) the third pop didn't fetch the filename and resulted in a file with an "invalid encoding" label, hence the question. It seems I am unable to apply the theory in practice. But using ld alone (without C library support) or ld with -lc (for C library support) works fine, as @Joachim pointed out. We are still missing something here... – touchStone Mar 11 '15 at 13:37
  • Write a very simple C program which examines the command line arguments. Then convert that to assembler with `gcc -S -o out.asm ...` to see what code GCC creates to process `argv[]` – Aaron Digulla Mar 11 '15 at 13:45
  • GCC generates: mov eax, DWORD PTR [ebp+12]; mov eax, DWORD PTR [eax]; mov DWORD PTR [esp], eax; call printf – touchStone Mar 11 '15 at 14:43
  • I have tried to use the above code in my assembly but with no success yet, and I am still trying. It appears that they move the values onto the stack at esp and then call the stdlib function. – touchStone Mar 11 '15 at 14:45
  • Sure; libc uses OS independent calls like `printf()`. You're using kernel calls via `int 80h`. Just concentrate on the way the code accesses `argv`. – Aaron Digulla Mar 11 '15 at 15:23
  • Finally got both of my answers here: http://stackoverflow.com/questions/7854706/reading-filename-from-argv-via-x86-assembly ... and I have to appreciate your help with this. – touchStone Mar 11 '15 at 15:33