2

I'm a CS student learning IA-32 assembly. For a project, we've been given the executable file for a program. We can use objdump and other tools to inspect the binary, but are not allowed to see the original source code. The program takes in an input string and compares it against another mystery string. If the two strings are not the same, the program sets off an alarm, and I flunk the assignment. It would be a fun assignment... if the TA would bother to answer my questions... Grr...

So if you don't mind giving me some pointers, I'd like to ask the forum if I'm on the right track. When I run objdump -d CODE on the CODE executable, I can drill down and see this in the main() function:

08048a44 <main>:
...
 8048af6:   e8 d0 08 00 00          call   80493cb <get_string>
 8048afb:   89 04 24                mov    %eax,(%esp)
 8048afe:   e8 ad 00 00 00          call   8048bb0 <test_string>

I'm reasonably certain that get_string() gets a string from the user - it's probably a wrapper function for fscanf() or something - and then the pointer to that string is saved into register %eax. The next line moves the pointer to %esp, then calls test_string(). Here's that code:

08048bb0 <test_string>:
 8048bb0:   83 ec 1c                sub    $0x1c,%esp
 8048bb3:   c7 44 24 04 6c a4 04    movl   $0x804a46c,0x4(%esp)
 8048bba:   08 
 8048bbb:   8b 44 24 20             mov    0x20(%esp),%eax
 8048bbf:   89 04 24                mov    %eax,(%esp)
 8048bc2:   e8 bd 04 00 00          call   8049084 <cmp_strings>
 8048bc7:   85 c0                   test   %eax,%eax
 8048bc9:   74 05                   je     8048bd0 <test_string+0x20>
 8048bcb:   e8 bc 07 00 00          call   804938c <alarm>
 8048bd0:   83 c4 1c                add    $0x1c,%esp
 8048bd3:   c3                      ret    

Here's what I think is happening...

08048bb0 <test_string>:
 8048bb0:   sub    $0x1c,%esp            // Adjusts %esp for new function
 8048bb3:   movl   $0x804a46c,0x4(%esp)  // test_string is stored at $0x804a46c; move that pointer into %esp
 8048bba:                                // ???
 8048bbb:   mov    0x20(%esp),%eax       // Moves test_string ptr to %eax
 8048bbf:   mov    %eax,(%esp)           // Moves test_string ptr to %esp - not sure why...?
 8048bc2:   call   8049084 <cmp_strings> // Calls cmp_strings(), probably with %eax and %esp as argument strings
 8048bc7:   test   %eax,%eax             // %eax is the returned value
 8048bc9:   je     8048bd0 <test_string+0x20>  // Should we jump to alarm()?
 8048bcb:   call   804938c <alarm>       // If we reach here, I flunk
 8048bd0:   add    $0x1c,%esp            // restores %esp to original value
 8048bd3:   ret                          // exits

So... if I'm right, Line #2 is the important one here. I suspect the mystery string is stored in memory address $0x804a46c. But I'm not certain. I also note that when I use the strings tool, I see this:

[linux]$ strings -t x CODE | grep 46c
   246c My dog has fleas.
[linux]$

That's promising... but not convincing. Memory address $0x804a46c is not 246c.

So... apologies for the lengthy post, but can folks tell me if I'm on the right track? Any insight or wisdom is wildly appreciated!

Many thanks! -RAO

Pete
  • 3
    Use `objdump` to see the string at the given address. `strings` gives you file offsets, not virtual addresses. You can of course also translate those, if you look at the section headers, again using `objdump`. – Jester Oct 26 '16 at 16:02
  • 2
    The "mystery opcode" at address `8048bba` is just part of the previous instruction. It has probably been included on the next line because the instruction encoding is so long. Note that the value in the instruction is `0x804a46c`, and the `6c` `a4` and `04` are all listed on the preceding line. – davmac Oct 26 '16 at 19:35
  • 1
    @davmac: yup, exactly right. I usually use `objdump -drwC` to avoid that (`-w` means "wide", and puts all the bytes for an instruction on the same line, regardless of column width). – Peter Cordes Oct 27 '16 at 01:22
  • @davmac Ohhhhhh... that makes a world of sense. Thanks! – Pete Oct 27 '16 at 14:40

2 Answers

3

Unless there is some anti-debug trickery going on, cmp_strings() accepts only two arguments, both of which are set up inside test_string(). Naturally, both of them are strings (pointers to them, of course, not the strings themselves): one is taken from the constant location 0x804a46c, whereas the other is the parameter passed to test_string(). Immediately before the call, the stack looks like this:

     |_______________|
ESP: | <your string> | <-- cmp_strings() 1st arg
+04: |   0x804a46c   | <-- cmp_strings() 2nd arg
+08: |      ...      |
+0C: |      ...      |
+10: |      ...      |
+14: |      ...      |
+18: |      ...      |
+1C: |return address | <-- ESP at the start of test_string()
+20: | <your string> | <-- test_string() 1st arg
+24: |      ...      |

You can check the «secret» string contents directly at runtime using GDB (which, in general, is necessary, as the code not shown here may rewrite it). Just break *0x8048bc2, run and then x/sb 0x804a46c.
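The stack layout above can be reproduced with a small simulation. This is only a sketch: the starting ESP and the input pointer are made-up values, and the return address is simply the address of the call in main plus its 5-byte length. Only the offsets relative to ESP matter.

```python
# Minimal model of the stack around the call to cmp_strings().
# INPUT_PTR and the initial ESP are hypothetical; only offsets matter.

SECRET_PTR = 0x804A46C        # constant taken from the disassembly
INPUT_PTR = 0xDEADBEEF        # hypothetical pointer returned by get_string()
RETURN_ADDR = 0x8048AFE + 5   # instruction following "call test_string" in main

memory = {}                   # address -> 32-bit value
esp = 0xFFFF0000              # hypothetical ESP in main before the call

# main: mov %eax,(%esp)       -> input pointer goes to the top of the stack
memory[esp] = INPUT_PTR
# main: call test_string      -> pushes the return address
esp -= 4
memory[esp] = RETURN_ADDR

# test_string: sub $0x1c,%esp
esp -= 0x1C
# test_string: movl $0x804a46c,0x4(%esp)
memory[esp + 4] = SECRET_PTR
# test_string: mov 0x20(%esp),%eax
eax = memory[esp + 0x20]
# test_string: mov %eax,(%esp)
memory[esp] = eax

first_arg = memory[esp]       # what cmp_strings() sees as its 1st argument
second_arg = memory[esp + 4]  # what cmp_strings() sees as its 2nd argument
print(hex(first_arg), hex(second_arg))  # 0xdeadbeef 0x804a46c
```

Nothing is ever written into ESP itself here; the register only moves when the frame is allocated or a return address is pushed.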

hidefromkgb
  • 1
    Yep, that did it! x/sb 0x804a46c "My dog has fleas." Perfect, thank you! You've also given me a good toehold on the GDB x command, which I foresee using a lot in the near future. Thanks! – Pete Oct 27 '16 at 14:48
  • 1
    @Pete: by the way, GDB can be more interactive than it is by default. Use `gdb --tui`, and when it starts, type `layout asm` to see the assembly. – hidefromkgb Oct 27 '16 at 15:13
2

The next line moves the pointer to %esp, then calls test_string().

mov %eax,(%esp) stores the value in eax to the memory addressed by esp, i.e. at the top of the stack. To copy that pointer into esp itself you would have to do mov %eax,%esp, and that's not a good idea, as ss:esp is used as the stack pointer by the CPU.

movl $0x804a46c,0x4(%esp) // test_string is stored at $0x804a46c; move that pointer into %esp

Again, "into esp" is inaccurate to the point of being completely wrong. This writes the value 0x804a46c into memory at address esp+4, so if you popped values from the stack, it would be the second value popped (right "under" the top of the stack).

mov 0x20(%esp),%eax // Moves test_string ptr to %eax

This loads the input string pointer into eax, i.e. the value that main stored from eax onto the stack just before call <test_string>. You probably meant that and just wrote the wrong comment?

mov %eax,(%esp) // Moves test_string ptr to %esp - not sure why...?

This stores it at the top of the stack, so if you started popping values here, you would first pop the input string pointer and then the 0x804a46c value. See hidefromkgb's answer for the ASCII art of the stack contents.

Then it's very likely that call 8049084 <cmp_strings> picks those two pointers from the stack as its arguments, does something, and returns zero for the correct string (any non-zero return value makes the following je fall through and triggers call <alarm>).

You should probably take a quick look at cmp_strings too, to see if it's ordinary C-like strcmp or how it can return zero.

And as Jester pointed out, it should also be possible to use objdump to dump the contents at that mysterious 0x804a46c address. If this is an early task, it will probably belong to a data section with easily readable string data.
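To relate the file offset printed by strings to the virtual address in the disassembly, the usual arithmetic is sketched below. The section values are hypothetical placeholders, not read from your binary; the real ones are in the "VMA" and "File off" columns of objdump -h CODE.

```python
def file_offset_to_vaddr(offset, sec_file_off, sec_vma):
    """Map a file offset inside a section to its virtual address."""
    return offset - sec_file_off + sec_vma


def vaddr_to_file_offset(vaddr, sec_file_off, sec_vma):
    """Inverse mapping: virtual address back to a file offset."""
    return vaddr - sec_vma + sec_file_off


# Hypothetical section: starts at file offset 0x2000, loaded at 0x804a000.
SEC_FILE_OFF = 0x2000
SEC_VMA = 0x804A000

print(hex(file_offset_to_vaddr(0x246C, SEC_FILE_OFF, SEC_VMA)))     # 0x804a46c
print(hex(vaddr_to_file_offset(0x804A46C, SEC_FILE_OFF, SEC_VMA)))  # 0x246c
```

If the real section headers give the same difference between VMA and file offset (0x8048000 here), then the string at file offset 246c is the one at 0x804a46c.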

If this were a more difficult task, the address could just as well point into the code segment, at fake instructions whose bytes form a string, or even at real ones (although producing meaningful x86 code whose bytes also spell a short string is not trivial). For example, I used to put "PED" at the start of my 256-byte .com intros; it only disturbs the stack a bit and does not affect the rest of the intro. And in one size-coding competition I used xlat pointing into the code to get the bit pattern needed to draw the Greek flag in 51 bytes.

Ped7g
  • Thanks Ped7g, this is great detail. I'm learning slowly and painfully that while IA-32 syntax looks simple, it's really difficult to follow and easy to misinterpret. I've copied all of your comments into my notes. :) – Pete Oct 27 '16 at 14:46
  • @Pete I personally strongly prefer Intel syntax (especially the NASM variant), but I'm afraid you are not completely free to choose. objdump can be configured to produce Intel syntax in the GNU dialect: http://stackoverflow.com/a/10362655/4271923 But make sure it will not bite you later in class, when you will be forced to use AT&T anyway. Also, if you are learning this as your first ASM ever, maybe AT&T will not hurt that much. (I learned x86 after months/years? of coding on another CPU, and Intel syntax looked familiar.) – Ped7g Oct 27 '16 at 14:57
  • Hmmmmmm... good food for thought. Honestly, I'm a complete newbie to IA-32 and only learned about objdump two days ago. I'm working with the syntax that is put in front of me, not really understanding where it comes from. I'll have to put more thought into this. Thanks! – Pete Oct 27 '16 at 15:00