
This is probably a bad idea, but I want to practice my assembly alongside inline assembly. After I figured out how to read command line arguments and create files using them here, I transitioned the code to inline assembly in C++. It seems to all have transferred fine (no compilation warnings or segfaults), but the program does absolutely nothing. Code and objdump below. Any idea on why it's not executing the statements?

Edit: The program should create a file using the filename given in argv[1].

Edit 2: Intel(R) Core(TM) i7-4710HQ 64bit CPU @ 2.50GHz

Compilation is done with:

g++ -o args args.cpp -g -nostartfiles

The code:

extern "C" void _start();

void _start(){

    asm ( "pop %rcx;" /* Contains argc */
          "cmp $2, %rcx;" /* If argc = 2 (argv[0] & argv[1] exist) */
          "jne exit;" /* If it's not 2, exit */
          "add $8, %rsp;" /* Move stack pointer to argv[1] */
          "pop %rsi;" /* Pop off stack */
          "mov %rsi, %rdi;" /* Move argv[1] to rdi */

          "mov $85, %rax;" /* #define __NR_creat 85 */
          "mov $0x2E8, %rsi;" /* move 744 to rsi */
          "syscall;"
          "jmp exit;"
      );

    asm(  "exit:\n\t"
          "mov $60, %rax;"
          "mov $1, %rdi;"
          "syscall"
      );

}

Objdump:

0000000000400292 <_start>:
  400292:       55                      push   %rbp
  400293:       48 89 e5                mov    %rsp,%rbp
  400296:       59                      pop    %rcx
  400297:       48 83 f9 02             cmp    $0x2,%rcx
  40029b:       75 1a                   jne    4002b7 <exit>
  40029d:       48 83 c4 08             add    $0x8,%rsp
  4002a1:       5e                      pop    %rsi
  4002a2:       48 89 f7                mov    %rsi,%rdi
  4002a5:       48 c7 c0 55 00 00 00    mov    $0x55,%rax
  4002ac:       48 c7 c6 e8 02 00 00    mov    $0x2e8,%rsi
  4002b3:       0f 05                   syscall
  4002b5:       eb 00                   jmp    4002b7 <exit>
Michael Petch
Chirality
  • I highly recommend writing the code in C++ and telling the compiler to print the assembly language for the code. This will give you a baseline to compare to. – Thomas Matthews Apr 01 '16 at 20:23
  • BTW, assembly language is processor dependent. The assembly language for an ARM processor is different than assembly language for an Intel processor. So when you talk about assembly language, please state the processor. The corollary is that assembly language code will only work on the targeted processor and is not portable. – Thomas Matthews Apr 01 '16 at 20:25
  • Please add a description of the purpose or functionality of the assembly language to your question. I'm having a difficult time understanding why a register is compared to 2, when there are no parameters passed to the function. – Thomas Matthews Apr 01 '16 at 20:28
  • @ThomasMatthews Updated with comments and processor at the top. When the program starts, command line arguments are pushed to the stack. See figure 3.9 - http://www.x86-64.org/documentation/abi.pdf – Chirality Apr 01 '16 at 21:06
  • If you want to write whole functions in ASM, write them in a `.S` file. You can still run `gcc` on it to assemble and link. GNU C inline asm is not designed for writing whole functions in asm. (Although on ARM, `__attribute__((naked))` exists to stop the compiler emitting any function prologue or `ret` instruction.) Also see [the x86 tag wiki](http://stackoverflow.com/tags/x86/info), and also [this collection of links on how to use GNU inline asm correctly to avoid forcing the compiler to emit instructions it doesn't need](http://stackoverflow.com/a/34522750/224132). – Peter Cordes Apr 02 '16 at 21:14

2 Answers


This is bad. You might have noticed that the compiler emits the function prologue (push %rbp; mov %rsp,%rbp) for the function _start:

400292:       55                      push   %rbp
400293:       48 89 e5                mov    %rsp,%rbp

If you are going to do this, then consider at least compiling with -fomit-frame-pointer. Because the function prologue pushes RBP, when you pop RCX you aren't placing the number of command line arguments into RCX; you are putting the saved value of RBP (which is now at the top of the stack) into RCX. This cascades, so your other stack operations also work on the wrong values.

Rather than omitting the frame pointer as in my first suggestion, you could have coded the _start function directly in a top-level asm statement like this:

asm ( ".global _start;" /* Make start symbol globally visible */
      "_start:;"
      "pop %rcx;" /* Contains argc */
      "cmp $2, %rcx;" /* If argc = 2 (argv[0] & argv[1] exist) */
      "jne exit;" /* If it's not 2, exit */
      "add $8, %rsp;" /* Move stack pointer to argv[1] */
      "pop %rdi;" /* Pop off stack */

      "mov $85, %rax;" /* #define __NR_creat 85 */
      "mov $0x2E8, %rsi;" /* move 744 to rsi */
      "syscall;"

      "exit:;"
      "mov $60, %rax;" /* sys_exit */
      "mov $2, %rdi;"
      "syscall"
  );

Since the normal process of declaring a C++ function has been bypassed, we don't need to worry about the compiler adding prologue and epilogue code.


The file mode bits you use for sys_creat are incorrect. You have:

"mov $0x2E8, %rsi;" /* move 744 to rsi */

0x2E8 = 744 decimal (1350 octal). I believe your intention was to put 744 octal into %RSI; 744 octal is 0x1e4. To make it more readable, you can write octal values in GAS by prefixing the value with a 0. This would have been what you were looking for:

"mov $0744, %rsi;" /* File mode octal 744 (rwxr--r--) */

Rather than:

  "pop %rsi;" /* Pop off stack */
  "mov %rsi, %rdi;" /* Move argv[1] to rdi */

You could have popped directly into %rdi:

  "pop %rdi;" /* Pop off stack */

You could have also kept the parameters on the stack in place and directly accessed them this way:

asm ( ".global _start;" /* Make start symbol globally visible */
      "_start:;"
      "cmpq $2, (%rsp);" /* If argc = 2 (argv[0] & argv[1] exist); q suffix sizes the memory operand */
      "jne exit;" /* If it's not 2, exit */

      "mov 16(%rsp), %rdi;" /* Get pointer to argv[1] */
      "mov $85, %eax;" /* #define __NR_creat 85 */
      "mov $0744, %esi;" /* File mode octal 744 (rwxr--r--) */
      "syscall;"

      "exit:;"
      "mov $60, %eax;" /* sys_exit */
      "mov $1, %edi;"
      "syscall"
  );

In this last code snippet I've also changed to using 32-bit registers in some instances. You can take advantage of the fact that in x86-64 code, writing a value to a 32-bit register automatically zero-extends it into the upper 32 bits of the corresponding 64-bit register. This can save a couple of bytes in the instruction encoding.


Accessing Command Line Parameters via main w/64-bit Code

If you compile using the C/C++ runtime, the runtime supplies a _start label that performs program startup and repackages the command line parameters passed by the OS so that main receives them as the 64-bit System V ABI requires. Parameter passing is discussed in section 3.2.3 of the ABI document. In particular, the first two parameters to main in 64-bit code are passed via RDI and RSI: RDI contains the value argc and RSI contains a pointer to an array of char * pointers. Since these parameters are not passed via the stack, we don't need to concern ourselves with the function prologue and epilogue code.

int main(int argc, char *argv[])
{
    asm ( "cmp $2, %rdi;"     /* If argc = 2 (argv[0] & argv[1] exist) */
          "jne exit;"         /* If it's not 2, exit */
                              /* _RSI_ (second arg to main) is a pointer
                                 to an array of character pointers */
          "mov 8(%rsi), %rdi;"/* Get pointer to second char * pointer in argv[] */
          "mov $85, %eax;"    /* #define __NR_creat 85 */
          "mov $0744, %esi;"  /* File mode octal 744 (rwxr--r--) */
          "syscall;"

          "exit:;"
          "mov $60, %eax;" /* sys_exit */
          "mov $1, %edi;"
          "syscall"
    );
}

You should be able to compile this with:

 g++ -o testargs testargs.c -g

Special note: If you intend to eventually use inline assembly along with C/C++ code, you are going to have to learn about GCC extended assembler templates, constraints, clobbers, etc. That is beyond the scope of this question. Learning assembler is much more difficult if you use inline assembly than if you create separate assembly code objects and call them from C/C++. It is very easy to use GCC's extended inline assembly improperly. Code may seem to work at first, but subtle bugs can creep in as the program gets more complex.

Michael Petch
  • Thanks for the explanation. 0x2e8 works just fine. 85 in hex is 0x55 as seen in the objdump. Is there any way to grab command line args if you do declare a main? – Chirality Apr 02 '16 at 19:14
  • @user10984587 Although 0x2e8 may seem to work, do an `ls -l filename` on it. I doubt the flags you were hoping to use are set. Assuming you declare `main` and remove `-nostartfiles`, the _C/C++_ runtime will supply a `_start` label; it will parse the command arguments and then call `main` using the [64-bit SYS V Calling Convention](http://www.x86-64.org/documentation/abi.pdf). With 64-bit code the first parameter to a function is in _RDI_ (argc) and the second is in _RSI_ (a pointer to an array of pointers to characters). This is different from how `_start` gets the arguments – Michael Petch Apr 02 '16 at 19:27
  • @user10984587 Parameter passing is covered in 3.2.3 of the document linked to in my last comment. – Michael Petch Apr 02 '16 at 19:29
  • Remember that in C/C++ that `main` with command line parameters is defined as `int main(int argc, char *argv[])` or `int main(int argc, char **argv)` (both do the same thing) – Michael Petch Apr 02 '16 at 19:34
  • @user10984587 : I've added something to the bottom of my answer to address your question about using `main` – Michael Petch Apr 02 '16 at 20:07
  • Wow. I had it all except I was using 4(%rsi), (%rdi) and 12(%rsi), (%rdi) and wondering why it wasn't working, but I've got it now. Thanks for the explanation! – Chirality Apr 02 '16 at 20:13
  • @user10984587 : Ah yes, have to remember as well that pointers are 64-bit (8 bytes). – Michael Petch Apr 02 '16 at 20:18

Yes, it is a bad idea to use GCC inline assembly to learn assembly.

Except as specified by constraints, the values of registers at the start of inline assembly are undefined. The first statement in a function is not an exception. For documentation of inline assembly, see the GCC manual.

In this particular case the compiler has added the function prologue:

0000000000400292 <_start>:
  400292:       55                      push   %rbp
  400293:       48 89 e5                mov    %rsp,%rbp

So now the top of the stack is no longer argc, but the value of RBP on program start.

Timothy Baldwin