I'm working on an Assembly GAS/AT&T x86_64 assignment, which requires us to get some command line arguments and do some operations with them.

I've figured out their location in the stack, but I can't figure out how to compare the contents of an argument with another string in order to detect whether the user passed a specific argument. Here is a minimal example of what I'm trying to do; execution never reaches the `he` subroutine.

.text

output: .asciz "%s"

arg: .ascii "-i"

.global main

main:

movq 8(%rsi), %rsi

movq arg, %rdi

cmpq %rsi, %rdi
je he

movq    $0, %rdi            
call    exit                

he:

movq $output, %rdi

movq $0, %rax
call printf

movq    $0, %rdi            
call    exit                

What am I doing wrong? Thank you in advance for the help!

chiken
  • Use the `strcmp` function to compare strings. Or write a loop that compares them character-by-character. Just comparing the pointers will only tell you if both point to the same string (not two different strings with the same contents). – fuz Oct 22 '22 at 19:13
  • Use a debugger to look at register contents, and notice that the 8 bytes that get loaded by `mov 8(%rsi), %rsi` are a pointer, `argv[1]`. Getting some ASCII bytes would take another dereference. You're basically doing `memcmp(&argv[1], "-i", 8)`. Oh also, your `"-i"` string is followed directly by the machine code for `main`, since you didn't put it at the end of a different section like `.rodata`. Perhaps you want `cmpw $('-'<<8) | 'i', (%rsi)` to compare 2 bytes (not including a terminating 0). Unfortunately GAS sucks for using multi-character literals as numeric literals, unlike NASM. – Peter Cordes Oct 22 '22 at 19:15
  • You could just compile a C program that does `memcmp(argv[1], "-i", 2)` and see how the compiler does it with optimization enabled. – Peter Cordes Oct 22 '22 at 19:23
  • (Or of course look at how it inlines `strcmp` if you *do* want to check for a complete string, instead of just starting with those 2 bytes.) https://godbolt.org/ is useful for looking at GCC asm output. Use `-O3` or at least `-O2`. – Peter Cordes Oct 22 '22 at 19:51
  • @PeterCordes thank you for the response... however, I still can't seem to make it work... I added another line `movq (%rsi), %rsi` to do another dereference, and then I declared 5 more bytes `.byte 0x00` right after the `arg: .ascii "-i"` declaration, but the execution still fails to reach the `he` subroutine... – chiken Oct 22 '22 at 20:41
  • Did you check the contents of `%rsi` with a debugger? What do you think is in the 5 bytes of memory following the `"-i"` in `argv[1][0..1]`? It's not all going to be zero. Also note that you used `.ascii`, not `.asciz`, so 5 bytes of padding is still one short of a qword. Again, a debugger should have shown you that. Use one, for example GDB. See the bottom of https://stackoverflow.com/tags/x86/info for GDB asm tips. – Peter Cordes Oct 22 '22 at 20:44

1 Answer

You were comparing a pointer (`argv[1]`) against 8 bytes of data loaded from `arg`, not the contents of two strings.

To compare strings, since you are using the C runtime, you can do it as you would in C: with `strcmp`.

.global main
.text

 strIArg: .asciz "-i"
 strHello: .asciz "Hello.\n"

main:
 sub $8, %rsp
 
 #At least two args?
 cmp $2, %edi
 jb 1f

 #2nd arg is equal to strIArg?
 mov 8(%rsi), %rdi           #argv[1] in RDI
 lea strIArg(%rip), %rsi     #"-i" in RSI (I'm making a PIE, hence the RIP-relative addressing)
 call strcmp
 
 test %eax, %eax            #strcmp returns 0 if the two strings are equal
 jnz 1f
 
 #OK, arg found
 
 lea strHello(%rip), %rdi
 call printf

1:
 xor %edi, %edi
 call exit
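
For reference, here is a rough C equivalent of the check the listing above performs. It is only a sketch: the helper name `has_i_flag` is made up for illustration and does not appear in the assembly.

```c
#include <string.h>   /* strcmp */

/* Hypothetical helper: returns nonzero if argv[1] exists and is exactly "-i". */
int has_i_flag(int argc, char **argv)
{
    if (argc < 2)                        /* cmp $2, %edi ; jb 1f */
        return 0;

    /* strcmp compares the characters the two pointers refer to and
     * returns 0 when the strings are equal. Comparing the pointers
     * themselves would only tell you whether they are the same address. */
    return strcmp(argv[1], "-i") == 0;   /* call strcmp ; test %eax, %eax */
}
```

Calling it as `has_i_flag(argc, argv)` from `main` gives the same decision the `jb`/`jnz` pair makes in the assembly.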

Alternatively, you can shave off the call to strcmp to improve performance if your argument is short enough and your program is very simple.

.global main

.text

strHello: .asciz "Hello.\n"

main:
 sub $8, %rsp
 
 #At least two args?
 cmp $2, %edi
 jb 1f

 #2nd arg is exactly "-i"?
 mov 8(%rsi), %rdi
 
 cmpb $0, (%rdi)
 je 1f          #Empty string?
 
 cmpw $0x692d, (%rdi)   #Starts with -i ?
 jne 1f
 
 cmpb $0, 2(%rdi)   #And then it ends?
 jne 1f
 
 #OK, arg found
 
 lea strHello(%rip), %rdi
 xor %eax, %eax       #AL = 0: printf is variadic and no vector registers carry arguments
 call printf

1:
 xor %edi, %edi
 call exit

However, I would not recommend this except in the very simplest cases: GAS doesn't support string literals as immediates, so you have to convert the string to a number yourself (minding x86's little-endian byte order), which hurts the readability of the code.
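
To see where the constant `0x692d` comes from, the three inline checks can be mirrored in C. This is a sketch that assumes a little-endian machine such as x86-64; the helper names `load_word` and `is_dash_i` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load two bytes as one 16-bit word, as cmpw does. */
uint16_t load_word(const char *s)
{
    uint16_t w;
    memcpy(&w, s, sizeof w);   /* byte-wise copy avoids alignment problems */
    return w;
}

/* Hypothetical helper mirroring the three checks of the inline version.
 * Assumes little-endian byte order: the first character ends up in the
 * low byte, so "-i" reads as '-' | ('i' << 8) = 0x2d | 0x6900 = 0x692d. */
int is_dash_i(const char *arg)
{
    if (arg[0] == '\0')                                  /* empty string? */
        return 0;
    if (load_word(arg) != (uint16_t)('-' | ('i' << 8)))  /* starts with -i ? */
        return 0;
    return arg[2] == '\0';                               /* and then it ends? */
}
```

The early empty-string test matters: it guarantees that the two-byte load stays within the string and its terminator.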

Finally, for more complex programs running on a POSIX system you may want to consider getopt_long and variants.
Below is an example of a program that greets the names passed on the command line and that takes two optional arguments to modify its behavior.
Note how getopt_long takes care of reordering the arguments, handling corner cases (e.g. when the user passes -un X as shorthand for -u -n X) and handling -- for us.

.global main

.data

 #Name to use for the greetings
 name: .quad defaultName
 
 #Greeting string to use
 greetings: .quad strHello

 #The long options accepted
 
 nameOpt: 
    .quad nameOptName   #name
    .quad 1         #has arg
    .quad 0         #ptr to flag to update with val (0 to make getopt_long return val instead)
    .quad 'n'       #val
 uppercaseOpt: 
    .quad uppercaseOptName
    .quad 0
    .quad 0
    .quad 'u'
 nullOpt:
    .quad 0 
    .quad 0
    .quad 0
    .quad 0     #Last option must be null

.text

 #Greetings strings
 strHello: .asciz "Hello %s from %s!\n"
 strHelloUpper: .asciz "HELLO %s FROM %s!\n"
 
 #Default name
 defaultName: .asciz "Margaret"


 
 nameOptName: .asciz "name"
 uppercaseOptName: .asciz "upper"
 
 #The short options accepted. Note that 'n' and 'u' are used both as the short option letters and as the
 #values returned for the long options, so one code path handles both; getopt_long can still tell the two cases apart if needed
 
 shortOpts: .asciz "n:u"

main:
 sub $8, %rsp
 
 #If you return from main, push r12 and r13 (and then pop them)
 
 #Move the args to non-volatile registers
 mov %edi, %r12d    #R12 = argc (an int, so copy 32 bits; this zero-extends into R12)
 mov %rsi, %r13     #R13 = argv
 
parseArgs:
 mov %r12, %rdi
 mov %r13, %rsi
 lea shortOpts(%rip), %rdx
 lea nameOpt(%rip), %rcx
 xor %r8, %r8
 call getopt_long
 
 #Found --name/-n?
 cmp $'n', %al
 je foundName
 
 
 #Found --upper/-u?
 cmp $'u', %al
 je foundUpper
 
 #Everything else is an error or end of option args (-1)
 test %eax, %eax
 jns parseError

 #Args are parsed; optind is the index of the first non-option arg
 
 lea (%r13, %r12, 8), %r12  #R12 = one past the last argument       
 mov optind(%rip), %ecx     #RCX = current index
 lea (%r13, %rcx, 8), %r13  #R13 = pointer to pointer to current argument
 
 #Print the greetings
doGreetings:
 #Stop?
 cmp %r12, %r13
 jae end
 
 #Print the current greetings
 mov greetings(%rip), %rdi
 mov (%r13), %rsi
 mov name(%rip), %rdx
 xor %eax, %eax       #AL = 0: printf is variadic and no vector registers carry arguments
 call printf
 
 #Next arg
 add $8, %r13
jmp doGreetings
 
end:
 xor %edi, %edi
 call exit

foundName:
 #Here optarg is a pointer to the argument value
 #Copy the pointer to name
 
 mov optarg(%rip), %rdx
 mov %rdx, name(%rip)
jmp parseArgs

foundUpper:
 #This option has no argument, we just set the greetings to strHelloUpper
 
 lea strHelloUpper(%rip), %rcx
 mov %rcx, greetings(%rip)
jmp parseArgs

parseError:
 #Just return 1
 
 mov $1, %edi
 call exit

You can compile this program with GCC and then run it as:

./greet Alice Bob Eve
./greet Alice --name Bob
./greet Alice --upper
./greet --name Eve --upper Alice
./greet -u Alice
./greet Alice -un Bob
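
For comparison, the option-parsing loop of the program above could be written in C roughly as follows. This is a sketch only: `struct greet_opts` and `parse_greet_options` are hypothetical names, and it uses the `struct option` initializers from `<getopt.h>` instead of the hand-laid-out `.quad` tables.

```c
#include <getopt.h>   /* getopt_long, struct option, optarg, optind */
#include <stddef.h>   /* NULL */

/* Hypothetical container for the parsed settings. */
struct greet_opts {
    const char *name;   /* value of --name / -n, "Margaret" by default */
    int upper;          /* nonzero if --upper / -u was given */
};

/* Hypothetical helper: parses the options and returns the index of the
 * first non-option argument, or -1 on a parse error. */
int parse_greet_options(int argc, char **argv, struct greet_opts *out)
{
    static const struct option long_opts[] = {
        { "name",  required_argument, NULL, 'n' },
        { "upper", no_argument,       NULL, 'u' },
        { NULL,    0,                 NULL, 0   }   /* last entry must be null */
    };
    int c;

    out->name = "Margaret";
    out->upper = 0;
    optind = 1;   /* restart scanning so the helper can be called again */

    while ((c = getopt_long(argc, argv, "n:u", long_opts, NULL)) != -1) {
        switch (c) {
        case 'n': out->name = optarg; break;   /* optarg points at the value */
        case 'u': out->upper = 1;     break;
        default:  return -1;                   /* unknown option or missing value */
        }
    }
    return optind;   /* argv[optind] .. argv[argc - 1] are the names to greet */
}
```

A caller would then loop from the returned index up to `argc` and print one greeting per remaining argument, as the `doGreetings` loop does.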
Margaret Bloom
  • Your 2nd version, inlining the strcmp, actually inlined it as `memcmp(argv[1], "-i", 2)`, not checking for a *terminator*, just checking that the first arg *started* with `-i`. Like the code in the question seemed to be attempting, since it used `.ascii` not `.asciz`, but that may have been an accident; clearly we shouldn't infer much from the details. They weren't comparing two pointers, they were comparing `argv[1]` against [8 bytes of ASCII + machine code](https://stackoverflow.com/questions/74166426/how-to-compare-command-line-arguments-x86-64/74168077#comment130945654_74166426). – Peter Cordes Oct 23 '22 at 10:13
  • One way to avoid manually converting to ASCII is to use GAS expressions. Like `cmpw $'-' | ('i' << 8), (%rdi)`. You still have to get the endianness correct, and it's still much less readable than NASM `cmp word [rdi], "-i"` so it doesn't really change your point that you don't want to do much of this by hand in GAS. – Peter Cordes Oct 23 '22 at 10:15
  • @PeterCordes Good point, I carelessly forgot the terminator! Let me fix it. The code also assumed the arg was at least two bytes long. Fixing the opening sentence too; it clearly compares pointer vs data. Thank you. – Margaret Bloom Oct 23 '22 at 10:41