When I use objdump -D to disassemble a binary, a typical jmpq encoding looks like e9 7f fe ff ff, which represents a negative offset. However, addresses on x86-64 are 64 bits wide (48 bits in practice, to my knowledge), so how can the 32-bit value 7f fe ff ff represent a negative offset from a 64-bit absolute address?

Additionally, are there any other instructions like jmp and jmpq that take a 64-bit address displacement? How can I find these instructions in Intel's or AMD's manual (I searched for jmpq but found nothing)?


As I searched, it seems to be called RIP-relative addressing. And it seems that not all instructions do this. Is there 64-bit relative addressing? If it is an indirect jump, the 64-bit absolute address would be in a register or memory, right?

Peter Cordes
WindChaser
  • The jump instruction has a 32-bit offset version. If the offset fits in 32-bits, you can use the 32-bit offset version and save a few bytes (over a 64-bit absolute address or something else). – David Schwartz Nov 16 '14 at 08:49
  • Thank you for the comment. What is the instruction of 64-bit offset version? – WindChaser Nov 16 '14 at 08:51
  • I don't believe there is a 64-bit offset version. – David Schwartz Nov 16 '14 at 09:06
  • `objdump` by default uses AT&T syntax, such as `jmpq`. Intel uses Intel syntax, so the mnemonic of the instruction is `jmp` (Intel syntax uses no suffixes). With `objdump` you can use `-M intel` to get the disassembly in Intel syntax. – nrz Nov 16 '14 at 11:59
  • According to [x86_64](http://www.x86-64.org/documentation/assembly.html): *Immediates*: "Immediate values inside instructions remain 32 bits and their value is sign extended to 64 bits before calculation. (...) **Only exception** from this rule are the moves of constant to registers that have 64bit form". Don't know the reason but possibly because most jumps are in 4GB range – phuclv Nov 16 '14 at 12:55
  • @LưuVĩnhPhúc, good point, 64-bit absolute addresses are only possible with `mov` and then only with AL, AX, EAX or RAX as source or destination. – Z boson Nov 17 '14 at 09:02

3 Answers

As others have noted, the relative jmp instruction on x86-64 is limited to a 32-bit signed displacement, which is added to the program counter (the address of the instruction following the jump).

OP asked why there is no relative jump with a 64-bit offset. I can't speak for the designers at Intel, but it seems pretty clear that this instruction would simply not be very useful, especially with the availability of the 32-bit relative jmp. The only time it would be needed is when your program is 2+ gigabytes in size, so that the 32-bit relative jmp cannot reach all of it from any point within it. Seen any 2 GB object files recently? So the apparent utility of such an instruction seems really small.

Mostly, when programs get really large, they start to be broken into more manageable elements that can evolve at different rates (DLLs are an example of this). Interfacing between such elements is done by more arcane means (jump vectors, etc.) to ensure that the interfaces stay constant in the face of evolution. An extremely long relative jmp could be used to reach from an application to an entry point in another module, but the actual cost of loading an absolute address into a register and doing a register-indirect call is small enough in practice that it isn't worth optimizing. And modern CPU design is all about optimizing where you put your transistors to maximize performance.

Just to be complete, the x86 (in its many flavors) also has very short relative jmp instructions with an 8-bit signed offset. In practice, even the 32-bit relative jmp instructions are rarely needed, especially if you have a good code generator that can rearrange code blocks. Intel arguably could have left these out for the same reason; I suspect their utility is just high enough to justify the transistors.

The question of "big literal operands" shows up in funny ways in many architectures. If you examine the distribution of literal values in code, you'll discover that small values (0, 1, ASCII character codes) cover a pretty good percentage; almost everything else is a memory address. So you kind of don't need "big literal values" in programs, but you do have to handle memory addresses somehow. The SPARC architecture famously has a "load literal value low into register" form (covering the small constants) and a less often used "load literal value high" instruction (to fill the upper bits of a register); the two are combined to make big constants. This keeps the code small, except when you need a big constant; small code means higher effective instruction fetch rates, and that contributes to performance.
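As a rough illustration of the high/low split described above, here is a minimal Python sketch. It assumes the 32-bit SPARC convention where `sethi` supplies the upper 22 bits and a following `or` supplies the lower 10 bits; the function names `split_const` and `rebuild_const` are made up for this example.

```python
def split_const(value: int):
    """Split a 32-bit constant into the parts two instructions would carry."""
    hi22 = (value >> 10) & 0x3FFFFF   # upper 22 bits (the "load high" part)
    lo10 = value & 0x3FF              # lower 10 bits (the "load low" part)
    return hi22, lo10

def rebuild_const(hi22: int, lo10: int) -> int:
    """Recombine the two parts the way the register would end up."""
    return (hi22 << 10) | lo10

hi, lo = split_const(0xDEADBEEF)      # arbitrary example constant
print(hex(rebuild_const(hi, lo)))     # 0xdeadbeef
```

A constant that fits entirely in the low part needs only one instruction, which is the common case the paragraph describes.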

Ira Baxter
  • MIPS has `lui` to load high 16-bit part too. The lower part can be loaded by `ori` or `addiu`. ARM can load 8-bit immediates with even rotation values but newer versions can also load high and low 16-bit parts separately – phuclv Nov 17 '14 at 10:25
  • "The designers at Intel" wouldn't be able to justify the signed 32-bit displacement of near jumps in 64-bit code segments, because the 64-bit extensions originated with AMD64, so the AMD designers decided on this. – ecm Aug 21 '21 at 10:54

The E9 opcode in 64-bit mode takes a 32-bit signed displacement, which is sign-extended to 64 bits:

E9 cd -> JMP rel32 -> Jump near, relative, RIP = RIP + 32-bit displacement sign extended to 64-bits

The FF opcode can be used to jump to a full 64-bit address held in a register or in memory:

FF /4 -> JMP r/m64 -> Jump near, absolute indirect, RIP = 64-Bit offset from register or memory

Quotes taken from the Intel instruction set manual entry for JMP.
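To make the sign extension concrete, here is a minimal Python sketch (not taken from the manual) that decodes the bytes from the question. The instruction address 0x400600 is an assumption chosen only for illustration, and `jmp_rel32_target` is a hypothetical helper name.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def jmp_rel32_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Return the target of an E9 rel32 jump located at insn_addr."""
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE9:
        raise ValueError("expected a 5-byte E9 rel32 jump")
    # The displacement is stored little-endian as a signed 32-bit value.
    # Reading it as signed is the same as sign-extending it to 64 bits.
    disp = int.from_bytes(insn_bytes[1:], "little", signed=True)
    # The displacement is added to the address of the NEXT instruction.
    next_rip = insn_addr + len(insn_bytes)
    return (next_rip + disp) & MASK64

code = bytes.fromhex("e97ffeffff")                     # bytes from the question
print(int.from_bytes(code[1:], "little", signed=True))  # -385
print(hex(jmp_rel32_target(0x400600, code)))            # 0x400484
```

The same 5 bytes work at any address, including ones above 4 GiB, because only the distance to the target is encoded, not the target itself.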

Z boson
Craig S. Anderson
  • What does "/4" mean? And if the first bit is 1 (the first byte ranges from "8" to "f", the offset would be automatically extended, like "ff ff ff ff ff 12 34 56")? – WindChaser Nov 16 '14 at 18:49
  • It's good that your reference the Intel Manual but could you please explain for those that are not familiar with the terminology in the manual what `cd` and `/4` mean. – Z boson Nov 17 '14 at 09:54
  • The /4 does mean "opcode extension". See my post here: https://stackoverflow.com/questions/26824941/opcode-ff-4-which-value-is-give-to-eip/26830328#26830328 – zx485 Nov 17 '14 at 13:56
  • @zx485, okay, what does `/5` mean then? – Z boson Nov 17 '14 at 14:24
  • _cd_ is a 4 byte value that is used as a code offset. A code offset is added to another value (In this case the RIP) to produce a new RIP. – Craig S. Anderson Nov 18 '14 at 03:23
  • It means that REG/OPCODE = 101b = 5 ; /5 – zx485 Nov 22 '14 at 02:39
  • For future reference, Q&As about opcodes that use ModRM's /r field as extra opcode bits: [How to read the Intel Opcode notation](https://stackoverflow.com/a/53976236) / [What does the /4 mean in FF /4?](https://stackoverflow.com/q/24295464) – Peter Cordes Aug 21 '21 at 02:27
The following applies to 64-bit mode.

JMP can be done either directly or indirectly.

Direct jumps are relative to the instruction pointer RIP. There are two types of direct jumps: short and near.

  • Short jumps use opcode EB followed by an 8-bit signed displacement and can therefore reach from RIP - 128 to RIP + 127 bytes, where RIP is the address of the next instruction.
  • Near jumps use opcode E9 followed by a 32-bit signed displacement and can therefore reach from RIP - 2147483648 to RIP + 2147483647 bytes.

Your assembler will use short jumps when it can, since they only need two bytes. But in NASM you can force a near jump using the near keyword, e.g.:

test:
    jmp test         ; eb fe           (displacement -2, back to the start of this 2-byte jump)
    jmp near test    ; e9 f9 ff ff ff  (displacement -7, back over both jumps)
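The size selection described above can be sketched in Python as follows. This is a simplified model, not how NASM is implemented; `encode_jmp` is a hypothetical helper, and the addresses are arbitrary values chosen for illustration.

```python
def encode_jmp(src: int, dst: int, force_near: bool = False) -> bytes:
    """Encode a direct jmp located at address src that lands on dst."""
    short_disp = dst - (src + 2)          # EB cb is 2 bytes long
    if not force_near and -128 <= short_disp <= 127:
        return bytes([0xEB]) + short_disp.to_bytes(1, "little", signed=True)
    near_disp = dst - (src + 5)           # E9 cd is 5 bytes long
    if not -2**31 <= near_disp <= 2**31 - 1:
        raise ValueError("target out of rel32 range; use an indirect jump")
    return bytes([0xE9]) + near_disp.to_bytes(4, "little", signed=True)

print(encode_jmp(0x1000, 0x1010).hex())                   # eb0e
print(encode_jmp(0x1000, 0x1010, force_near=True).hex())  # e90b000000
print(encode_jmp(0x1000, 0x2000).hex())                   # e9fb0f0000
```

Note that the displacement differs between the short and near forms for the same target, because it is measured from the end of the instruction and the two forms have different lengths.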

64-bit addressing modes are: RIP-relative, 32-bit absolute, 64-bit absolute, and relative to a base pointer. The JMP instruction can use all of these except 64-bit absolute. Indirect jumps use Opcode FF. Some examples using the NASM syntax:

jmp [a]                ;ff 24 25 00 00 00 00 - 32-bit absolute 
jmp [rel a]            ;ff 25 e7 ff ff ff    - RIP + 32-bit displacement
jmp [rdi]              ;ff 27                - base pointer
jmp [rdi +4*rsi + a]   ;ff a4 b7 00 00 00 00 - base pointer +4*index + displacement

On OS X, however, 32-bit absolute addressing is not possible because the image is loaded at or above 2^32.
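For readers wondering how the bytes after FF in the indirect-jump examples select an addressing mode, here is a minimal Python sketch that splits the ModRM and SIB bytes into their fields. It is an illustration only: it ignores REX prefixes (so only the eight legacy registers appear), and the helper names `fields` and `describe` are made up for this example.

```python
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def fields(byte: int):
    """Split a ModRM or SIB byte into its 2-bit, 3-bit and 3-bit fields."""
    return byte >> 6, (byte >> 3) & 0b111, byte & 0b111

def describe(operand: bytes) -> str:
    """Describe the operand encoded by the bytes that follow the FF opcode."""
    mod, reg, rm = fields(operand[0])
    if mod == 0b11:
        return f"/{reg} {REG64[rm]}"          # register operand, no memory access
    disp = {0b00: None, 0b01: "disp8", 0b10: "disp32"}[mod]
    parts = []
    if rm == 0b100:                            # rm=100: a SIB byte follows
        scale, index, base = fields(operand[1])
        if mod == 0b00 and base == 0b101:
            disp = "disp32"                    # no base register: absolute disp32
        else:
            parts.append(REG64[base])
        if index != 0b100:                     # index=100 means "no index"
            parts.append(f"{1 << scale}*{REG64[index]}")
    elif mod == 0b00 and rm == 0b101:
        parts.append("rip")                    # RIP-relative in 64-bit mode
        disp = "disp32"
    else:
        parts.append(REG64[rm])
    if disp:
        parts.append(disp)
    return f"/{reg} [{' + '.join(parts)}]"

for hexbytes in ("2425", "25", "27", "a4b7"):
    print(hexbytes, "->", describe(bytes.fromhex(hexbytes)))
# 2425 -> /4 [disp32]
# 25 -> /4 [rip + disp32]
# 27 -> /4 [rdi]
# a4b7 -> /4 [rdi + 4*rsi + disp32]
```

In every case the middle field is 4, which is the "/4" in the manual's `FF /4` notation: those three bits extend the opcode instead of naming a register.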

The only instruction that can use 64-bit absolute addressing is mov, and then either the source or the destination must be AL, AX, EAX or RAX. E.g. in NASM:

mov rax, [qword a]
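For reference, the load form of this instruction with RAX as destination encodes as a REX.W prefix (48), opcode A1, and the full 8-byte address in little-endian order. A minimal Python sketch of that layout follows; the helper name `mov_rax_moffs64` and the example address are assumptions for illustration.

```python
def mov_rax_moffs64(address: int) -> bytes:
    """Build the bytes for: mov rax, [qword address]."""
    # REX.W prefix (0x48) + opcode A1 + 64-bit absolute address, little-endian
    return bytes([0x48, 0xA1]) + address.to_bytes(8, "little")

encoded = mov_rax_moffs64(0x1122334455667788)   # arbitrary example address
print(encoded.hex())                            # 48a18877665544332211
print(len(encoded))                             # 10
```

At 10 bytes this is considerably longer than a RIP-relative load, which is one reason compilers rarely emit it.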
Z boson