6

x86 assembly design has instruction suffixes, such as l (long), w (word), b (byte).
So I assumed that jmpl would be a long jmp.

But I got a strange result when I assembled it:

Test1 jmp: assembly source, and disassembly

main:
  jmp main

eb fe     jmp 0x0804839b <main> 

Test2 jmpl: assembly source, and disassembly

main:
  jmpl main       # added l suffix

ff 25 9b 83 04 08   jmp *0x0804839b

Compared to Test1, the Test2 result is unexpected.
I expected it to assemble the same as Test1.


Question:
Is jmpl a different instruction in the 8086 design?
(According to here, jmpl in SPARC means jump and link. Is it something like this?)

...Or is this just a bug in GNU assembler?

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Jiwon
  • 1,074
  • 1
  • 11
  • 27
  • 4
    In 32-bit code using AT&T `jmp main` is a relative jump to the label main. `jmpl main` is an indirect near jump to the address stored at the label `main`. – Michael Petch Jan 27 '19 at 09:52
  • 1
    You should have gotten a warning in your second test about a missing asterisk when assembling. – fuz Jan 27 '19 at 12:33
  • Related: [What is callq instruction?](//stackoverflow.com/q/46752964), and [What is the difference between retq and ret?](//stackoverflow.com/q/42653095). In those cases, it's just the operand-size suffix. – Peter Cordes Jan 27 '19 at 21:01

2 Answers

6

An l operand-size suffix implies an indirect jmp, unlike with calll main which is still a relative near-call. This inconsistency is pure insanity in AT&T syntax design.

(And since you're using it with an operand like main, it becomes a memory-indirect jump, doing a data load from main and using that as the new EIP value.)
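To make the difference concrete, here is a minimal Python sketch of what the CPU does with each of the question's two encodings. It assumes Test2's main sits at the same address as in Test1 (0x0804839b, as the question's disassembly suggests); the names MAIN, memory, load32 and direct_jmp_target are made up for this illustration.

```python
import struct

MAIN = 0x0804839B  # assumed address of main, taken from the question's disassembly

# Test2's machine code as it sits in memory at main: ff 25 9b 83 04 08
memory = {MAIN + i: b for i, b in enumerate(bytes.fromhex("ff259b830408"))}

def load32(addr: int) -> int:
    """Little-endian 32-bit data load from the toy memory."""
    return struct.unpack("<I", bytes(memory[addr + i] for i in range(4)))[0]

def direct_jmp_target(addr: int, rel8: int) -> int:
    """jmp rel8 (eb cb): the target is relative to the end of the 2-byte instruction."""
    return (addr + 2 + rel8) & 0xFFFFFFFF

print(hex(direct_jmp_target(MAIN, -2)))  # Test1, eb fe: 0x804839b, i.e. back to main
print(hex(load32(MAIN)))                 # Test2: 0x839b25ff, the dword stored at main
```

So in this sketch Test1 loops on main forever, while Test2 would load its own first four instruction bytes as a pointer and set EIP to 0x839b25ff, which is almost certainly not a mapped address.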

You never need to use the jmpl mnemonic; you can and should indicate indirect jumps using * on the operand. Like jmp *%eax to set EIP = EAX, or jmp *4(%edi, %ecx, 4) to index a jump table, or jmp *func_pointer. Using jmpl is optional in all of these.

(You could use jmpw *%ax to truncate EIP to a 16-bit value. That assembles to 66 ff e0 jmpw *%ax.)


Compare "What is callq instruction?" and "What is the difference between retq and ret?": in those cases it's just the operand-size suffix behaving like you expected it would, same as plain call or plain ret. But jmp is different.


semi-related: far jmp or call (to a new CS:[ER]IP) in AT&T syntax is ljmp / lcall. These are very different.


It's also insane that GAS accepts jmpl main as equivalent to jmpl *main. It only warns instead of erroring.

$ gcc -no-pie -fno-pie -m32 jmp.s 
jmp.s: Assembler messages:
jmp.s:3: Warning: indirect jmp without `*'

And then disassembling it to see what we got, with objdump -drwC a.out:

08049156 <main>:                                          # corresponding source line (added by hand)
 8049156:       ff 25 56 91 04 08       jmp    *0x8049156    # jmpl main
 804915c:       ff 25 56 91 04 08       jmp    *0x8049156    # jmp  *main
 8049162:       ff 25 56 91 04 08       jmp    *0x8049156    # jmpl *main

08049168 <foo>:
 8049168:       e8 fb ff ff ff          call   8049168 <foo> # calll foo
 804916d:       ff 15 68 91 04 08       call   *0x8049168    # calll *foo
 8049173:       ff 15 68 91 04 08       call   *0x8049168    # call  *foo

We get the same thing if we replace l with q in the source and build without -m32 (using the default -m64), including the same warning about a missing *. But the disassembly has an explicit jmpq and callq on every instruction. (Except for a relative direct jmp I added, which uses the jmp mnemonic in the disassembly.)

It's like objdump thinks 32-bit is the default operand-size for jmp/call in both 32 and 64-bit mode, so it wants to always use a q suffix in 64-bit, but leaves it implicit in 32-bit mode. Anyway, that's just disassembly choice between implicit / explicit size suffixes, no weirdness for a programmer writing source code.
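The 32-bit listing above can be checked by hand. The following Python sketch (not a full decoder; describe and GROUP5 are names invented here) handles only the encodings that appear in it: e8 is a direct call rel32, while opcode ff uses the reg field of the ModRM byte to select between an indirect call (/2) and an indirect jmp (/4).

```python
import struct

GROUP5 = {2: "call", 4: "jmp"}  # FF /2 = near indirect call, FF /4 = near indirect jmp

def describe(code: bytes, address: int) -> str:
    """Describe one instruction from the 32-bit listing (sketch only)."""
    if code[0] == 0xE8:                                  # call rel32
        rel32 = struct.unpack("<i", code[1:5])[0]
        target = (address + 5 + rel32) & 0xFFFFFFFF      # relative to the next instruction
        return f"call {target:x} (direct, relative)"
    if code[0] == 0xFF:
        modrm = code[1]
        mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
        if reg in GROUP5 and mod == 0 and rm == 5:       # operand is [disp32]
            ptr = struct.unpack("<I", code[2:6])[0]
            return f"{GROUP5[reg]} *0x{ptr:x} (indirect, via memory)"
    raise ValueError("encoding not handled by this sketch")

print(describe(bytes.fromhex("e8fbffffff"), 0x8049168))    # calll foo
print(describe(bytes.fromhex("ff1568910408"), 0x804916D))  # calll *foo
print(describe(bytes.fromhex("ff2556910408"), 0x8049156))  # jmpl main
```

The first line comes out as a direct call back to 0x8049168, the other two as memory-indirect transfers, matching the objdump output: only jmpl main silently changed meaning.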


Other AT&T-syntax assemblers:

  • Clang's built-in assembler does reject jmpl main, requiring jmpl *main.

    $ clang -m32 jmp.s
    jmp.s:3:8: error: invalid operand for instruction
      jmpl main
           ^~~~
    

    calll main is the same as call main. call *main and calll *main are both accepted for indirect calls.

  • YASM's GAS-syntax mode assembles jmpl main to a near relative jmp, like jmp main! So it disagrees with gcc/clang about jmpl implying indirect. (Very few people use YASM in GAS mode; and these days its maintenance hasn't kept up with NASM for new instructions like AVX512. I like YASM's good defaults for long NOPs, but otherwise I'd recommend NASM.)

Peter Cordes
5

You have fallen victim to the awfulness that is AT&T syntax.

x86 assembly design has instruction suffix, such as l(long), w(word), b(byte).

No, it doesn't. The abomination that is AT&T syntax has this.
In the sane Intel syntax there are no such suffixes.

Is jmpl something different.

Yes, this is an indirect jump to an absolute address. A -near- jump to a -long- address.
(ljmp in gnu syntax is a -far- jump, but that's totally different, setting a new CS:EIP.)
The default for a jump is a near jump, to a relative address.
Note that the Intel syntax for this jump is:

jmp dword [ds:0x0804839b]  //note the [] specifying the indirectness.
//or, this is the same
jmp [0x0804839b]
//or
jmp [main]
//or
jmp DWORD PTR ds:0x804839b  //the PTR makes it indirect.

I prefer the [], to highlight the indirectness.

It does not jump to 0x0804839b, but reads a dword from the specified address and then jumps to the address specified in this dword. In the Intel syntax the indirectness is explicit.

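Of course you intended to jump to 0x0804839b (aka main:) directly, which is done by the plain relative jmp main from Test1.

Hm, x86 has no near jump that encodes an absolute target, and most assemblers do not allow absolute far jumps!
So a direct jump to an absolute address cannot be done.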

See also: How to code a far absolute JMP/CALL instruction in MASM?

A near/short relative jump is (almost) always better, because it will still be valid when your code changes; the long jump can become invalid. Also shorter instructions are usually better, because they occupy less space in the instruction cache. The assembler (in Intel mode) will automatically select the correct jmp encoding for you.

SPARC
This is a totally different processor than the x86. From a different manufacturer, using a different paradigm. Obviously the SPARC documentation bears no relation to the x86 docs.

An HTML copy of the official Intel documentation for jmp is here.

https://www.felixcloutier.com/x86/jmp

Note that Intel does not specify different mnemonics for the relative and absolute forms of the jmp. This is because Intel wants the assembler to always use the short (relative) jump, unless the target is too far away, in which case the near jmp rel32 encoding is used. (Or in 16-bit mode, jmp foo could assemble to a far absolute jump to a different CS value, aka segment. In 32-bit mode, a relative jmp rel32 can reach any other EIP value from anywhere.)
The beauty of this is that the assembler automatically uses the proper jump for you.
(In 64-bit mode, jumping more than +-2GiB requires extra instructions or a pointer in memory; there is no 64-bit absolute direct far jump, so the assembler can't do this for you automatically.)
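The short-versus-near choice and the reach limits described above can be sketched in Python. This is only an illustration of the arithmetic, not how any particular assembler is implemented; choose_jmp and mode_bits are hypothetical names, and 16-bit mode is left out.

```python
def choose_jmp(addr: int, target: int, mode_bits: int = 32):
    """Pick a direct relative jmp encoding for a jump at addr.

    Returns (form, displacement), or None when no direct relative jmp
    can reach the target (only possible in 64-bit mode).
    """
    rel8 = target - (addr + 2)           # eb cb is 2 bytes long
    if -128 <= rel8 <= 127:
        return ("jmp rel8", rel8)

    rel32 = target - (addr + 5)          # e9 cd is 5 bytes long
    if mode_bits == 32:
        # EIP wraps modulo 2**32, so fold the distance into signed 32 bits.
        rel32 = (rel32 + 2**31) % 2**32 - 2**31
        return ("jmp rel32", rel32)
    if -2**31 <= rel32 < 2**31:          # 64-bit mode: rel32 is sign-extended
        return ("jmp rel32", rel32)
    return None

print(choose_jmp(0x0804839B, 0x0804839B))         # the question's jmp main
print(choose_jmp(0x1000, 0xFFFFF000, 32))         # far apart, still reachable in 32-bit mode
print(choose_jmp(0x1000, 0x1000 + 2**32, 64))     # more than 2GiB away in 64-bit mode
```

The first call picks the 2-byte short form with displacement -2 (the eb fe from Test1), the second still finds a rel32 thanks to wraparound, and the third returns None, which is the case that needs an indirect jump through a register or memory.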

Forcing gnu back to sanity
You can use

 .intel_syntax noprefix    # put this as the first line in your assembly
 mov eax,[eax+100+ebx*2] 
 ....

This makes gnu use Intel syntax, which puts things back the way they were designed by Intel and away from the PDP7 syntax used by gnu.

Peter Cordes
Johan
  • 74,508
  • 24
  • 191
  • 319
  • 1
    I think it should be worth noting that `jmp *0x0804839b` doesn't jump to memory address 0x0804839b. It retrieves the 32-bit address at memory location ds:0x0804839b, uses that as the jump target. In Intel syntax it is `jmp DWORD ds:0x0804839b`.This is in fact an indirect, absolute **near** jump. The CS used for the JMP is the current value of CS. This is not a FAR JMP. – Michael Petch Jan 27 '19 at 09:54
  • `jmp *main` and `jmpl *main` would also be the same thing since in 32-bit protected mode the operand is 32-bits wide by default. it is in fact legal to do`jmpw *main` as well. In 32-bit protected mode that would read the 16-bit value at `main` and use it as a jump target. the `w` and the `l` suffix denote the size of the offset to be read from the memory location specified when doing indirect jumps. – Michael Petch Jan 27 '19 at 10:21
  • 1
    @PeterCordes : I think there is a reason why GAS gives the warning _Warning: indirect jmp without *'_ when you use JMPL. Even worse is that if you use YASM to assemble that same code (parsed as gas) - there is no warning and it actually encodes `jmpl` as a relative JMP. I consider YASM to have the bug since it isn't conforming to GAS's interpretation. – Michael Petch Jan 27 '19 at 21:22
  • 2
    All this hate for GAS and AT&T syntax seems uncalled for. It actually has many great features (its pseudo-ops are some of the best, such as nop-aligning). It also requires you to be explicit about operand sizes, and doesn't have any of this confusing `DWORD PTR FLAT` stuff. – S.S. Anne Aug 25 '19 at 05:06
  • @S.S.Anne: AT&T syntax has some nice features, but it has some *major* warts, and this inconsistency between `jmpl` and `calll` is one of them. Others include the x87 syntax design bug for some forms of things like `fsub` vs. `fsubr` [described in the GAS manual](https://sourceware.org/binutils/docs/as/i386_002dBugs.html), and that `add $1, (%rdi)` defaults to 32-bit operand-size instead of erroring about ambiguity; GAS only errors for `mov`, and only recently started warning for other instructions. (Although clang's assembler does error on any ambiguity.) – Peter Cordes Oct 06 '21 at 18:39
  • @S.S.Anne: I agree GAS has nice `.p2align` directives, but the actual AT&T *instruction* syntax has some big downsides for historical reasons. NASM syntax is a nice clean design, although NASM needs a macro package to have non-terrible NOPs, and even then it's not conditional like `.p2align 4,,10`. But those are kind of assembler directive features, separate from the syntax design. Unfortunately there's no best-of-both-worlds. – Peter Cordes Oct 06 '21 at 18:43
  • @Johan: In 32-bit code, `jmp foo` will never assemble to an absolute jump. You had a paragraph at the end suggesting that it would pick near vs. far, rather than short vs. near. i.e. mixing up `jmp rel32` for non-short with `ljmp cs:eip` far absolute. In 32-bit mode, rel32 can reach any other EIP. Maybe you meant a 16-bit segmented model? If not, you might want to rewrite or remove that paragraph. I edited it some to be consistent with the rest of the answer, but it seems like a big tangent to bring up far jumps at all when they're not really relevant here. – Peter Cordes Oct 06 '21 at 19:01
  • On not hating unix as syntax, one of its most important features is that it is compatible with *cpp*, which masm et al are not. In systems programming, this is a huge win for clarity and consistency. Also, for what it is worth, masm syntax was *not* intels. Intel still had the operands backwards, but the original syntax is preserved in the assembly notation from QNXv1,2; desmet and aztec tools. Intel adopted masm when the PC came out. – mevets Oct 06 '21 at 19:34
  • @mevets: Intel's 8086 manual from Oct 1979 has examples like `add al, 5` as register <- immediate. e.g. Figure 2-62 ASM-86 Addressing Mode Examples. http://matthieu.benoit.free.fr/cross/data_sheets/Intel_8086_users_manual.htm / http://matthieu.benoit.free.fr/cross/data_sheets/1979%20Intel%20The%208086%20Family%20Users%20Manual%20197910%20%5B760%5D.pdf . (And yes, that is an official Intel manual.) IBM-PC didn't release until a couple years later, Aug 1981. It's possible Intel had some other software assembler with different conventions while the chip was in development? – Peter Cordes May 31 '22 at 05:17
  • Interesting. Gord Bell ( not the DEC, MSoft one ) wrote the assembler for QNX in the late seventies. I asked him about the syntax, and he said he took it from the intel 8086 manual. He said that Intel changed syntax ( I presumed inspired by MS ) when they later released the 8088. QNX left its since its C and Waterloo Fortran, Cobol, ... compilers all produced it. The QNX one had the odd `dst, src` ordering, but none of the baroque annotations. I'll try to remember to ask him next time I see him. – mevets May 31 '22 at 12:54