How exactly DO lines of code relate to each other in assembly code when jump is involved?

Question

Okay, so I understand what mov means, I understand what the registers are, I understand what the operation commands are. I even understand that the leftmost hexadecimal is the instruction's number. For example, on line 7, the hexadecimal 7f is instruction jg. FINE.

What I don't get is HOW EXACTLY these facts add up, and it's incredibly frustrating.

What I know so far:

For example, on line 1, does 0d get added to the address 804839c? No, it jumps to silly+0x17 (line 8), because 0d is added to the address of the instruction AFTER line 1. If you add 0d to the address 804839e, you get 80483ab. GOOD.

Does this mean that all instructions for the next line are relative to the second 2-digit hexadecimal?

Does that mean the leftmost hexadecimal is the current line's instruction?

I just need a little more direction, I am so close to figuring this out that I can almost taste it.

1 804839c: 7e 0d      jle   80483ab <silly+0x17>
2 804839e: 89 d0      mov   %edx,%eax
3 80483a0: d1 f8      sar   %eax
4 80483a2: 29 c2      sub   %eax,%edx
5 80483a4: 8d 14 52   lea   (%edx,%edx,2),%edx
6 80483a7: 85 d2      test  %edx,%edx
7 80483a9: 7f f3      jg    804839e <silly+0xa>
8 80483ab: 89 d0      mov   %edx,%eax

Yes the offset is [relative to the address immediately after the jump](https://stackoverflow.com/q/14889643/555045) (maybe dupe? but I don't understand the rest of the question) — harold, Oct 20 '18 at 20:18
If 0d is relative to the address AFTER the jump, does that mean 7e is relative to the address the jump is on? And does that rule apply to all operations? — The_Senate, Oct 20 '18 at 20:48
That 7e just means `jle`, it's not relative. The rule that PC-relative offsets are relative to the address after the instruction also extends to for example `call rel32` and 64bit RIP-relative addressing — harold, Oct 20 '18 at 20:58
Right, right. So, this means that on line 2 89 is mov, and d0 is going to the next line, yes? — The_Senate, Oct 20 '18 at 21:14
`89` is a particular `mov` (there's a whole bunch of them that are different), `D0` is the [ModRM byte](http://www.sandpile.org/x86/opc_rm.htm) of that `mov` and encodes the `edx` and `eax` — harold, Oct 20 '18 at 21:17
if that's the case, then does that just mean we move to the next line and d1 is sar while f8 is the ModRM byte of that sar that encodes %eax? So, not all hexes have to do with the next line and we can just jump down to the next line after the previous operand is completed? — The_Senate, Oct 20 '18 at 21:20
Most instructions don't explicitly jump anywhere, so you just go to the next one. The `sar` is a third slightly different case, the `d1` does not mean `sar` exactly but "rotate/shift by 1, not 8bit" and then the R field of the ModRM encodes which kind of shift it is — harold, Oct 20 '18 at 21:33
if you wanna write disassembler there's some old stuff: http://phg.chat.ru/opcode.txt — Алексей Неудачин, Oct 28 '18 at 13:16

paxdiablo · Answer 1 · 2018-10-22T03:01:08.693

The relative offsets used in jump instructions⁽¹⁾ can be best understood as follows: the offset is simply something to add to the program counter to get the new program counter. It's a signed value, so you can jump forwards or backwards within a limited range.

But the important thing to keep in mind here is that the program counter (that you add the offset to) is the location of the instruction after the jump. I always remember this by thinking that the CPU has already advanced the program counter to the next location in anticipation of getting the next instruction⁽²⁾.

That's important. As per your sample code (irrelevant stuff removed):

1   804839c: 7e 0d      jle   80483ab <silly+0x17>
2   804839e: 89 d0      mov   %edx, %eax
3-7                     blah  blah, blah
8   80483ab: 89 d0      mov   %edx, %eax

The offset 0d is added to the location of line two, 804839e, to get the jump target of line eight, 80483ab.
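
As a rough sketch of that rule (Python is used purely for illustration, and `short_jump_target` is a made-up helper name, not part of any real tool), the arithmetic for a two-byte short jump can be checked like this:

```python
def short_jump_target(address: int, disp8: int, size: int = 2) -> int:
    """Return the target of a short (rel8) jump.

    address: address of the jump instruction itself
    disp8:   the raw displacement byte (0..255) as it appears in the encoding
    size:    length of the jump instruction (2 bytes for the 7x short forms)
    """
    # Interpret the byte as a signed two's-complement value (-128..127).
    signed = disp8 - 0x100 if disp8 >= 0x80 else disp8
    # The displacement is added to the address of the NEXT instruction.
    return address + size + signed


# Line 1 of the question's listing: 7e 0d at 804839c (forward jump).
forward = short_jump_target(0x804839C, 0x0D)
# Line 7 of the question's listing: 7f f3 at 80483a9 (backward jump).
backward = short_jump_target(0x80483A9, 0xF3)
print(hex(forward), hex(backward))
```

The second call shows why the offset has to be read as signed: f3 is -13, which takes you from 80483ab back to 804839e.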

⁽¹⁾ Not all jump instructions are relative. It's just that you've chosen a short-form one for your question, the opcode 7e, which takes an 8-bit offset. You could also choose the near form 0f 8e, which takes a wider offset but is still relative. I don't think far-form variants of the conditional jumps exist; instead, you emulate them by reversing the sense of the comparison and jumping over an unconditional jump, such as with:

jle  farPoint    -->          jg   noJump
blah blah, blah               jmp  farPoint
                      noJump: blah blah, blah
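
To make the short-versus-near distinction concrete, here is a minimal sketch of how an assembler might pick between the two relative forms of `jle` in 32-bit mode. The function name `encode_jle` and the far-away target address are hypothetical, and real assemblers do considerably more than this:

```python
import struct


def encode_jle(address: int, target: int) -> bytes:
    """Encode `jle target` for an instruction located at `address` (32-bit mode)."""
    # Short form: opcode 7e plus a signed 8-bit displacement,
    # measured from the end of the 2-byte instruction.
    disp = target - (address + 2)
    if -128 <= disp <= 127:
        return bytes([0x7E]) + struct.pack("<b", disp)
    # Near form: opcode 0f 8e plus a signed 32-bit little-endian displacement,
    # measured from the end of the 6-byte instruction.
    disp = target - (address + 6)
    return bytes([0x0F, 0x8E]) + struct.pack("<i", disp)


# The jump from the question fits in the short form.
short_form = encode_jle(0x804839C, 0x80483AB)
# A hypothetical target more than 127 bytes away needs the near form.
near_form = encode_jle(0x804839C, 0x8048500)
print(short_form.hex(), near_form.hex())
```

In both forms the displacement is counted from the end of the instruction; only the instruction length (2 versus 6 bytes) and the width of the offset change.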

⁽²⁾ Because that's how it was done in the days when I started cutting raw code for CPUs. With today's pipelining, speculative execution, and so on, I'm not so sure.

Yes, RIP=end of this instruction=start of next, during execution of an instruction, for the purposes of RIP-relative addressing and branch displacements. Whether or not there's a physical RIP (there isn't) is irrelevant: logically this is how x86 works, and the out-of-order machinery has to continue to implement that behaviour. Your description of old non-pipelined CPUs is a good way to remember it, because it is the reason for it in the first place. Fortunately it's nice and simple, unlike ARM which exposes PC as one of the general-purpose registers and has PC = 2 instructions later. — Peter Cordes, Oct 28 '18 at 17:50

score 1 · Accepted Answer · edited Jul 02 '23 at 10:23

If you are confused about the opcode you are a long way from understanding this. You need to start with documentation on the instruction set. For x86 this is plentiful; it's not great documentation, but still the opcodes are pretty clear. With instruction sets like this, it's not hard to find a web page with a chart of opcodes and then you click on that to find the rest of the instruction definition.

It is fairly typical for the relative address to be based on the byte after the instruction. If you were working on a team for a brand-new processor, you would just go down to one of the chip folks' cubes and ask (since it wouldn't be well documented yet), but since this is an old design, there are tools available that will simply give you the answer without asking anyone.

Try this:

a0: jle a0
a1: jle a1
a2: jle a2
a3: jle a3
a4: jle a4

b0: jle b1
b1: jle b2
b2: jle b3
b3: jle b4
b4: jle b5
b5: nop

c0: jle c0
c1: jle c0
c2: jle c0
c3: jle c0
c4: jle c0

d0: jle d4
d1: jle d4
d2: jle d4
d3: jle d4
d4: jle d4

Assemble and disassemble:

0000000000000000 <a0>:
   0:   7e fe                   jle    0 <a0>
0000000000000002 <a1>:
   2:   7e fe                   jle    2 <a1>
0000000000000004 <a2>:
   4:   7e fe                   jle    4 <a2>
0000000000000006 <a3>:
   6:   7e fe                   jle    6 <a3>
0000000000000008 <a4>:
   8:   7e fe                   jle    8 <a4>
000000000000000a <b0>:
   a:   7e 00                   jle    c <b1>
000000000000000c <b1>:
   c:   7e 00                   jle    e <b2>
000000000000000e <b2>:
   e:   7e 00                   jle    10 <b3>
0000000000000010 <b3>:
  10:   7e 00                   jle    12 <b4>
0000000000000012 <b4>:
  12:   7e 00                   jle    14 <b5>
0000000000000014 <b5>:
  14:   90                      nop
0000000000000015 <c0>:
  15:   7e fe                   jle    15 <c0>
0000000000000017 <c1>:
  17:   7e fc                   jle    15 <c0>
0000000000000019 <c2>:
  19:   7e fa                   jle    15 <c0>
000000000000001b <c3>:
  1b:   7e f8                   jle    15 <c0>
000000000000001d <c4>:
  1d:   7e f6                   jle    15 <c0>
000000000000001f <d0>:
  1f:   7e 06                   jle    27 <d4>
0000000000000021 <d1>:
  21:   7e 04                   jle    27 <d4>
0000000000000023 <d2>:
  23:   7e 02                   jle    27 <d4>
0000000000000025 <d3>:
  25:   7e 00                   jle    27 <d4>
0000000000000027 <d4>:
  27:   7e fe                   jle    27 <d4>

Without having to look at the documentation it looks pretty clear that 0x7E is an opcode and the byte after is a pc relative offset. The 0xFE on the first items implies that it is a signed offset and relative to the byte after the instruction. The remaining experiments confirm that.
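
The same inference can be checked mechanically. Below is a minimal sketch in Python (chosen only for illustration; `sweep` is a made-up name, and it understands just the two opcodes that appear in this experiment) that walks the assembled bytes and recomputes each target as address + 2 + signed offset:

```python
def sweep(code: bytes, base: int = 0):
    """Decode a byte string containing only `jle rel8` (7e) and `nop` (90).

    Returns a list of (address, mnemonic, target) tuples; target is None for nop.
    """
    out = []
    i = 0
    while i < len(code):
        address = base + i
        opcode = code[i]
        if opcode == 0x7E:
            if i + 1 >= len(code):
                raise ValueError("truncated jle at end of input")
            # Read the offset byte as a signed 8-bit value.
            disp = int.from_bytes(code[i + 1:i + 2], "little", signed=True)
            # The offset is relative to the byte after the 2-byte instruction.
            out.append((address, "jle", address + 2 + disp))
            i += 2
        elif opcode == 0x90:
            out.append((address, "nop", None))
            i += 1
        else:
            raise ValueError(f"unsupported opcode {opcode:#04x} at {address:#x}")
    return out


# The bytes from the disassembly above: blocks a, b, the nop, then c and d.
blob = bytes.fromhex(
    "7efe" * 5 + "7e00" * 5 + "90"
    + "7efe7efc7efa7ef87ef6"
    + "7e067e047e027e007efe"
)
for address, mnemonic, target in sweep(blob):
    print(f"{address:4x}: {mnemonic}" + ("" if target is None else f" {target:x}"))
```

If the "relative to the byte after the instruction" reading is right, the printed targets should match the ones in the disassembler's output.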

This doesn't mean you should assume that all jump/branch instructions in this instruction set work this way. You can do similar experiments for the others, using tools that are known to produce working code.

This is one area where processor documentation is often lacking, and you usually need to: 1) talk to the silicon engineers if you can, 2) look at the chip design (source code), 3) read the documentation, 4) experiment with existing tools, 5) experiment with the hardware.

Most folks don't have access to 1 and 2. Options 3 and 4 are often available, and if you actually have one of these processors for 5, you usually have 3 and probably, but not always, 4. But again, the documentation often leaves the base of the relative address unstated. Usually it is the byte after the instruction, but on some architectures, ARM for example, it is a fixed offset from the address of the instruction itself, which preserves the illusion of a specific pipeline.

804839c: 7e 0d      jle   80483ab <silly+0x17>

Yes, 804839c is the address of the jle instruction, and 80483ab is the address it will branch to if the condition is met. 0xab - 0x9c = 0xf = 0xd + 2, where 2 is the size of the instruction and 0xd is the offset/immediate in the instruction.

I would assume the other conditional branches of this form (notice the jg later in your code) are an opcode byte and a signed offset byte. But you should always check before making your own assembler or disassembler or simulator. Start with the docs, and confirm with any tools you can find that are known to work for that platform.

It's funny how you are literally the first one to actually answer this question, but you arrived far too late. I already figured all of this out. — The_Senate, Nov 25 '18 at 11:56
Also, I disagree. I was not far away from figuring it out, I was asking good questions. The book just throws assembly down with little to no explanation as to what these numbers mean or do, so I went looking for answers. I found them. — The_Senate, Nov 25 '18 at 11:58
