Binary bomb phase 4 assembly

Question

I have a very similar question as "Binary Bomb - Phase 4" but it is still different enough that I'm not entirely sure what to do.

Here is my phase_4 code:

 08048d3e <phase_4>:
 8048d3e:       83 ec 2c                sub    $0x2c,%esp
 8048d41:       8d 44 24 18             lea    0x18(%esp),%eax
 8048d45:       89 44 24 0c             mov    %eax,0xc(%esp)
 8048d49:       8d 44 24 1c             lea    0x1c(%esp),%eax
 8048d4d:       89 44 24 08             mov    %eax,0x8(%esp)
 8048d51:       c7 44 24 04 75 a7 04    movl   $0x804a775,0x4(%esp)
 8048d58:       08
 8048d59:       8b 44 24 30             mov    0x30(%esp),%eax
 8048d5d:       89 04 24                mov    %eax,(%esp)
 8048d60:       e8 6b fb ff ff          call   80488d0 <__isoc99_sscanf@plt>
 8048d65:       83 f8 02                cmp    $0x2,%eax //making sure I have 2 inputs
 8048d68:       75 0e                   jne    8048d78 <phase_4+0x3a> //if not explodes bomb
 8048d6a:       8b 44 24 18             mov    0x18(%esp),%eax
 8048d6e:       83 f8 01                cmp    $0x1,%eax //has to be greater than 1
 8048d71:       7e 05                   jle    8048d78 <phase_4+0x3a> //otherwise jumps to bomb
 8048d73:       83 f8 04                cmp    $0x4,%eax
 8048d76:       7e 05                   jle    8048d7d <phase_4+0x3f> //has to be less than 4 or jumps to bomb
 8048d78:       e8 af 05 00 00          call   804932c <explode_bomb>
 8048d7d:       8b 44 24 18             mov    0x18(%esp),%eax
 8048d81:       89 44 24 04             mov    %eax,0x4(%esp)
 8048d85:       c7 04 24 09 00 00 00    movl   $0x9,(%esp)
 8048d8c:       e8 50 ff ff ff          call   8048ce1 <func4> //calls function 4
 8048d91:       3b 44 24 1c             cmp    0x1c(%esp),%eax
 8048d95:       74 05                   je     8048d9c <phase_4+0x5e> //compares two values and explodes bomb if not equal
 8048d97:       e8 90 05 00 00          call   804932c <explode_bomb>
 8048d9c:       83 c4 2c                add    $0x2c,%esp
 8048d9f:       90                      nop
 8048da0:       c3                      ret

And here is func_4 code:

 08048ce1 <func4>: //not entirely sure what's happening here but it might be a binary search?
 8048ce1:       83 ec 1c                sub    $0x1c,%esp
 8048ce4:       89 5c 24 10             mov    %ebx,0x10(%esp)
 8048ce8:       89 74 24 14             mov    %esi,0x14(%esp)
 8048cec:       89 7c 24 18             mov    %edi,0x18(%esp)
 8048cf0:       8b 74 24 20             mov    0x20(%esp),%esi
 8048cf4:       8b 5c 24 24             mov    0x24(%esp),%ebx
 8048cf8:       85 f6                   test   %esi,%esi
 8048cfa:       7e 2b                   jle    8048d27 <func4+0x46>
 8048cfc:       83 fe 01                cmp    $0x1,%esi
 8048cff:       74 2b                   je     8048d2c <func4+0x4b>
 8048d01:       89 5c 24 04             mov    %ebx,0x4(%esp)
 8048d05:       8d 46 ff                lea    -0x1(%esi),%eax
 8048d08:       89 04 24                mov    %eax,(%esp)
 8048d0b:       e8 d1 ff ff ff          call   8048ce1 <func4>
 8048d10:       8d 3c 18                lea    (%eax,%ebx,1),%edi
 8048d13:       89 5c 24 04             mov    %ebx,0x4(%esp)
 8048d17:       83 ee 02                sub    $0x2,%esi
 8048d1a:       89 34 24                mov    %esi,(%esp)
 8048d1d:       e8 bf ff ff ff          call   8048ce1 <func4>
 8048d22:       8d 1c 07                lea    (%edi,%eax,1),%ebx
 8048d25:       eb 05                   jmp    8048d2c <func4+0x4b>
 8048d27:       bb 00 00 00 00          mov    $0x0,%ebx
 8048d2c:       89 d8                   mov    %ebx,%eax
 8048d2e:       8b 5c 24 10             mov    0x10(%esp),%ebx
 8048d32:       8b 74 24 14             mov    0x14(%esp),%esi
 8048d36:       8b 7c 24 18             mov    0x18(%esp),%edi
 8048d3a:       83 c4 1c                add    $0x1c,%esp
 8048d3d:       c3                      ret

I checked to make sure that the input must be two decimals, and I can also see that at the end two numbers are being compared with one another (line 8048d97, 0x1c(%esp) and %eax). At the beginning of phase_4 I think the code is also indicating that the first number has to be between 1 and 4, and at the end of phase 4, however the number has been modified, it must equal the second number. Please correct me if I'm wrong.

I'm just not sure what the func_4 is doing, and how to determine what the inputs should be. I think it might be the binary search, but not sure how to check how it corresponds with the first input.

Between 1 and 4? Why? I see `jle` there. It would made it lot more easier for me, if you would interleave the code with your comments, what you think those few lines of code do (so I wouldn't not have to count myself what is `0x24(%esp)` in fn4. And search the whole code going up and down for almost every jump address, whether that one leads to explode, or defuse, adding some labels... It would also show more effort on your side. Besides that, did you consider typing this into actual machine and watch with debugger? Or is it considered "cheating" for bomb lab? Then asking at SO must be too. :D — Ped7g, Nov 10 '16 at 13:24
Just put in the comments! Sorry! And I did use the debugger, but no matter how it jumps through function 4, it keeps going to the explode bomb step but I can't follow how because I'm not very good at deciphering Assembly code. — Red Icing, Nov 11 '16 at 01:39
I extended my answer with "sample" comments of `func4`. Try to get trough it, and let me know if it helps. "not good at deciphering" = takes some time, but generally there are only so many instructions of x86 CPU, and each does only very simple thing to it's state, when you limit yourself to 80386 instruction set, it's actually much easier to learn ASM than any high level language (in terms of intructions/keywords). It's the amount of instructions you need to do something useful, which makes programming in ASM hard, not the instructions themselves. Those are easy. — Ped7g, Nov 11 '16 at 02:50

Ped7g · Accepted Answer · 2016-11-11T02:57:49.183

what the func_4 is doing, and how to determine what the inputs should be.

 8048d81:       89 44 24 04             mov    %eax,0x4(%esp)
     ; input value loaded to [esp+4] -> [esp+0x24] inside func4
 8048d85:       c7 04 24 09 00 00 00    movl   $0x9,(%esp)
     ; 9 stored to [esp+0] -> [esp+0x20] inside func4
 8048d8c:       e8 50 ff ff ff          call   8048ce1 <func4>
     ; something calculated
     ; then the result is compared with other input value, they should be equal

I'm not sure, which input is which (you know the format string for sscanf and your ABI, so you can tell better, the one stored into [esp+0x18] and tested to be of value 2, 3 or 4 is used in calculation, the one stored into [esp+0x1c] is just compared. I would guess that input for calculation is second (for sscanf at +0x0c, the other one is at +0x08)? So password is "<func4(9,2-4)> <2-4>"?

As that func4 is clean asm recursive code, you can just run it for all three possible inputs to see what result it produces (in debugger, stepping over instructions, so you manage to catch if some parameter leads to infinite loop or some other nastiness, plus you will get idea how it works).

Then figure out which input is which.

not sure what the func_4 is doing

Hey, that's how assembly works. You rarely take a look on stream of instructions, and recognize the algorithm from them with a quick glance. More often you need to thoroughly emulate CPU state in your head instruction by instruction, paying attention to every detail (like flag change by instruction which looks to be used only for setting up a value, then few instructions later that preserved flag is used in conditional jump), and by keeping some timeline of calculated values it's usually possible to guess the algorithm implemented.

Or if not algorithm, then at least to know what values are calculated.

I checked the code one more time, and it looks like variation on Fibonnaci theme, like func4(0,x) = 0, func4(1,x) = x, and func4(y, x) = (just quick guess, too lazy to verify I got it right) x + func4(y-1, x) + func4(y-2, x).

edit: I'm now sure that formula is not correct, it's missing at least one constant (but maybe it's wrong even more).

So I wonder, are you too lazy to bother to emulate the CPU thoroughly and get to the correct result, or you have problem with some particular instruction, what it exactly does (ie. you are too lazy to read the instruction reference guide)? So it boils down to "are you lazy or lazy?" Because I'm definitely both, so this is my answer to your question. :)

In case you don't understand some particular detail about some instruction, just ask.

edit about comments:

"//has to be less than 4 or jumps to bomb"

No, it's jle, that's mnemonics for "jump signed less/equal than", so the correct comment is "has to be less than or equal 4" and that's valid for values [TYPE_MIN, 4].

For "less than 4" you would need to use jl = "jump signed less".

For "less than 4" unsigned there's jb = "jump below", that covers values [0, 3]. jbe covers [0, 4] values.

If you will check Jcc description, you will see those conditional jumps are based only on few flags in the flag register, nothing more (except jcxz/jecxz, which compares cx register and doesn't touch flags at all).

Also you may notice there are several aliases, so you can write in your code the one which semantically fits your purpose. For example jz and je is the same instruction, first alias stands for "jump when zero (flag)", the other is "jump when equal". So read through it and get used to those abbreviations, it will make the reading of disassembly somewhat easier.

//calls function 4

That's useless comment, that's obvious from the instruction itself. Would you add for example arguments (9, input_2), it would be of more benefit to the reader.

The next "compare" comment is of similar useless nature. What two numbers? If you already did track them down, you should write that result into comment, so something like "compares result of func4 with input_1".

I will try to show you as example the func4:

 8048ce1 <func4>:
  ; allocates 0x1c bytes on stack for local variables
  ; (so return address is at [esp+0x1c] now, was at [esp])
 8048ce1:       83 ec 1c                sub    $0x1c,%esp
  ; stores current ebx, esi and edi in [esp+0x10 / 0x14 / 0x18]
 8048ce4:       89 5c 24 10             mov    %ebx,0x10(%esp)
 8048ce8:       89 74 24 14             mov    %esi,0x14(%esp)
 8048cec:       89 7c 24 18             mov    %edi,0x18(%esp)
  ; esi = [esp+0x20] = first argument of func4(arg1, arg2)
 8048cf0:       8b 74 24 20             mov    0x20(%esp),%esi
  ; ebx = [esp+0x24] = second argument of func4(arg1, arg2)
 8048cf4:       8b 5c 24 24             mov    0x24(%esp),%ebx
  ; when esi(arg1) <= 0, jump to ExitWith0
 8048cf8:       85 f6                   test   %esi,%esi
 8048cfa:       7e 2b                   jle    8048d27 ExitWith0
  ; when esi(arg1) == 1, jump to ExitWithValueFromEbx
 8048cfc:       83 fe 01                cmp    $0x1,%esi
 8048cff:       74 2b                   je     8048d2c ExitWithValueFromEbx
  ; store arg2 in [esp+4]
 8048d01:       89 5c 24 04             mov    %ebx,0x4(%esp)
  ; store (arg1-1) in [esp] (preparing args for recursive call)
 8048d05:       8d 46 ff                lea    -0x1(%esi),%eax
 8048d08:       89 04 24                mov    %eax,(%esp)
  ; recursion: eax = func4(arg1-1, arg2) ; ebx/esi/edi preserved
 8048d0b:       e8 d1 ff ff ff          call   8048ce1 <func4>
  ; edi = eax (result) + ebx*1 (arg2)
 8048d10:       8d 3c 18                lea    (%eax,%ebx,1),%edi
  ; [esp+4] is set to arg2 again
  ; (it's still there, but this asm looks like debug level of C, not very efficient)
 8048d13:       89 5c 24 04             mov    %ebx,0x4(%esp)
  ; esi -= 2 (no more original arg1 in esi), and set [esp+0]
 8048d17:       83 ee 02                sub    $0x2,%esi
 8048d1a:       89 34 24                mov    %esi,(%esp)
  ; second recursive call: func4(arg1-2, arg2)
 8048d1d:       e8 bf ff ff ff          call   8048ce1 <func4>
  ; ebx = edi + eax*1 ; edi was arg2 + f4(arg1-1, arg2)
 8048d22:       8d 1c 07                lea    (%edi,%eax,1),%ebx
  ; so ebx = arg2 + f4(arg1-1, arg2) + f4(arg1-2, arg2), return that value
 8048d25:       eb 05                   jmp    8048d2c ExitWithValueFromEbx
ExitWith0:
 8048d27:       bb 00 00 00 00          mov    $0x0,%ebx
ExitWithValueFromEbx:
 8048d2c:       89 d8                   mov    %ebx,%eax
  ; restore values of ebx/esi/edi (return value is in eax)
 8048d2e:       8b 5c 24 10             mov    0x10(%esp),%ebx
 8048d32:       8b 74 24 14             mov    0x14(%esp),%esi
 8048d36:       8b 7c 24 18             mov    0x18(%esp),%edi
  ; restore esp value, so [esp] is return address, and return
 8048d3a:       83 c4 1c                add    $0x1c,%esp
 8048d3d:       c3                      ret

So I actually guessed in correctly in my answer, those ,1) in lea confused me, I'm not used to AT&T syntax, so at a weaker moment I thought that's +1 to result, but it's *1 to index register (I'm used to Intel syntax, where it would look as lea ebx,[edi+eax]).

But as you can see, once you start to take down the notes, and focus on single instruction only, I did manage to decipher it correctly this time.

At the moment it's important for you to make sure you understand every instruction, I mean every detail, like how that lea works and why it doesn't read memory value, even if the argument is (...), and what are all possible addressing modes (how would it look if you would want to do the f(a, b) = 1 + b + f(a-1,b) + f(a-2,b)? Try to find that one, just one lea has to be modified (any of those two)).

I'm not sure what literature you have for this, as I use Intel syntax everything (I think AT&T syntax is good for C compilers, but not so much for humans ... but overall it's not that bad, mostly annoying if you already know the other one, if you know only AT&T, it may be quite OK).

In worst case ask, although questions about instructions purpose are "low effort", as all the x86 manuals are freely available, but if you are not sure what some wording really means, and running such instruction few times in debugger doesn't help, you have to ask. Just add what you think and what words are confusing you.

Definitely lazy for sure, but I've been trying to step through this code for 3 days now so I'm probably just not understand fundamental guidelines, probably. And this project has been given such that if I emulate a CPU the bomb explodes. I'll try inputting the three different values though, that may be the way to do it. Thanks. — Red Icing, Nov 11 '16 at 01:42
@SnigdaaSethuram I meant to "emulate" it in your head. You have the listing, so you can execute every instruction in your mind (very likely with help of paper to keep the "CPU" state somewhere :)). Although doing the full `func4` recursion for depth `9` would be tedious job. If you actually can run that code in gdb, it's much easier to just put breakpoint after `call` and step into it for few iterations just to take a peek how it works, then let it run till breakpoint and check the result value. — Ped7g, Nov 11 '16 at 02:06
Update: It seems the second input has to be between 2 and 4 inclusive. I've also tested all three values for the second input, but for all three I am getting the value of 9 for %esp at the end, which makes sense because there's a line that moves 9 into that register. What I don't understand then is how the first input is even relevant, and how we'll ever get esp to match eax because none of the 3 inputs give me 9 for the value of %esp at the end. — Red Icing, Nov 11 '16 at 02:52
`mov $9,(%esp)` is not moving `9` into `esp`. It's moving `9` into memory at address, which is pointed to by value in `esp`. Try to find something about stack. Or first about memory read/write, as you didn't notice the `()` around `esp`. That's how AT&T notes memory access with some instructions (not all of them, where only memory access is possible, `()` are omitted!) ... In Intel syntax and in my comment I use `[address]` to mark memory access. — Ped7g, Nov 11 '16 at 03:00
@SnigdaaSethuram dump in gdb register values (take a note of value in `esp`), and then dump memory content starting from `esp`... After that `mov $9` that memory should contain "dword 9, , esp+0x1c (from sscanf call), ...". And `esp` is not ordinary general purpose register, it's used as stack pointer, instructions `call`, `ret`, `push`, `pop`, and others do work with it (and modify it!). — Ped7g, Nov 11 '16 at 03:04
When I use the i r method in gdb and it gives me the values of all the registers, etc., how do I see what esp is pointing to? It give me the hex address to which I apply x/d "address", but that will just give me the memory at the address and not the value I want. — Red Icing, Nov 11 '16 at 03:19
@SnigdaaSethuram I don't use gdb, but your question sounds weird. "memory at address - not value"? I don't understand what you mean. `x/8xw $esp` should display 8 values from stack (in hexadecimal). Like `0xffffccec: 0xf7e0472e 0x00000001 ...` (first "number:" is address of memory (== value of `esp`), followed by values stored in memory (it's content). If I do `i r`, esp is indeed 0xffffccec, like in that memory dump. So `0xffffccec` == `esp`, `0xf7e0472e` == `[esp+0]`, `0x00000001` == `[esp+4]`. — Ped7g, Nov 11 '16 at 05:27
@SnigdaaSethuram this may help you visualize the memory https://youtu.be/vcfQVwtoyHY (sorry for video, can't find any simple text tutorial) - "stack" is just piece of memory pointed at by value in `esp`, it's not special in any way. The `esp` register itself is special - by how some instructions work with it, like for example `ret` is working like: `mov (%esp),%eip` `add $4,%esp` (but it's atomic single opcode) ... thinking about it, simple `ret` is actually alias for `pop %eip` (it's not legal mnemonics, as non-simple `ret` can do more, as explained in inst. reference guide - so it's "ret"). — Ped7g, Nov 11 '16 at 05:54
Thank you! I figured it out, I just grossly missed the point of what exactly esp was. Using input: "176 2" worked. — Red Icing, Nov 11 '16 at 18:59

Binary bomb phase 4 assembly

1 Answers1