Why would movw di, [ebp+4] be illegal?

Question

I'm trying to compile a 6502-based emulator for an Intel Atom system, but I get these sorts of errors for this file: https://github.com/littlefluffytoys/Beebdroid/blob/master/app/src/main/jni/6502asm_x86.S

    jni/6502asm_x86.S:163:5: error: invalid instruction mnemonic 'movb'
        movb ch, [ ebp+9] # ch = r10 = S
        ^~~~
    jni/6502asm_x86.S:181:2: error: invalid instruction mnemonic 'pushw'
        pushw 0xfffa
        ^~~~~

Is this a 32-bit vs. 64-bit issue? I'm familiar with assembly, but not with x86 or x86-64, and I'm finding it hard to track down what's going on. I understand that movq wouldn't be available in 32-bit code, but I can't think why a byte-sized movb wouldn't be available at all.

I had to remove all the % signs from the file (my version of cc, 4.8.4, didn't seem to like them), but then I ran into this mov issue.

What's particularly confusing is that earlier instances of movw and movb aren't producing errors, like

    movw  di,  [ ebp+4]     #  di = r6  = PC
    movb  cl,  [ ebp+6]     #  cl = r7  = A

(Although I notice these are in macro definitions, so perhaps they're not parsed yet.)

I read in some Intel documentation that mov sometimes looks as follows, but I don't know enough about this format to try rewriting the dozens of errors:

    MOV     ECX, dword ptr table[RBX][RDI]

Any help would be appreciated!

Assembler macros are purely text substitutions. If you don't use a macro, its contents don't have to be valid. Anyway, this looks like trying to use AT&T operand-size suffixes in `.intel_syntax noprefix` mode. Maybe it worked with some assemblers? Are you building this on a Mac (using clang's built-in assembler?) Or maybe it was only tested that way on a Mac, and GNU binutils `as` (as invoked by GCC) is rejecting it. — Peter Cordes, Sep 29 '19 at 01:34

Peter Cordes · Accepted Answer · 2019-09-30T00:28:31.360

Assembler macros are purely text substitutions. If you don't use a macro, its contents don't have to be valid. And if it is used, it's only assembled at the place where it's used. (It's not like an inline function; it's like a C preprocessor macro.)
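The C preprocessor analogy can be checked directly with a small C sketch. It uses C instead of GAS only because C is easy to run, and the macro and function names are made up for illustration. A macro whose body would never compile is harmless as long as nothing expands it. An expanded macro is just pasted text, so nothing protects precedence inside the argument.

```c
#include <assert.h>

/* Never expanded below, so the compiler never checks its body. Expanding it
   would produce text that is not valid C, just as an unused GAS .macro full
   of bad instructions assembles fine until something invokes it. */
#define NEVER_USED(x) movw x, [ebp+4]

/* Pure text substitution: the argument text is pasted in unparenthesized. */
#define DOUBLE_UNSAFE(x) x * 2
#define DOUBLE_SAFE(x) ((x) * 2)

int double_unsafe(int a, int b) { return DOUBLE_UNSAFE(a + b); } /* a + b * 2 */
int double_safe(int a, int b) { return DOUBLE_SAFE(a + b); }     /* ((a + b) * 2) */

void macro_demo(void) {
    assert(double_unsafe(1, 2) == 5); /* 1 + 2 * 2 */
    assert(double_safe(1, 2) == 6);   /* (1 + 2) * 2 */
}
```

The same holds in the .S file: errors inside a macro only show up at the line that invokes it.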

The original file uses .intel_syntax noprefix at the top, but then is full of insane code like mov %ebx, [%ebx + %eax*4] and movb %al,[%esi+%edi]. That code still decorates register names with % despite noprefix and, more importantly, still uses AT&T-style operand-size suffixes.

It's a mutant hybrid of Intel and AT&T syntax, so it's no wonder some assemblers reject it.

See https://stackoverflow.com/tags/intel-syntax/info vs. https://stackoverflow.com/tags/att/info

On my Linux desktop, the original file assembles just fine with GNU Binutils as, which I invoke via gcc -m32 -c 6502asm_x86.S. (Since I'm on Linux, this is real GCC: gcc --version reports gcc (GCC) 9.1.0. It uses as, and as --version reports "GNU assembler (GNU Binutils) 2.32".)

I suspect you're on a Mac with Apple Clang. Your "cc (4.8.4)" looks more like a gcc version number, but GCC doesn't contain an assembler. It always uses an external one. And on a Mac, that may still be Clang/LLVM, not GNU Binutils.

On my Linux desktop, clang 8.0.1 rejects this file. Clang is much stricter about not accepting AT&T-isms in Intel mode. It also doesn't support .intel_syntax prefix at all, only Intel syntax with noprefix or AT&T syntax with prefix. After removing all the % characters in the file, clang -m32 -c 6502asm_x86.S gives the same error messages you showed:

    6502asm_x86.S:121:5: error: invalid instruction mnemonic 'movw'
        movw di, [ebp+4] # di = r6 = PC
        ^~~~

Fixing this mess:

If possible, use as (aka gas) from GNU Binutils. But I don't know whether it supports Mach-O object files, so that might not be an option for you. (Update: apparently you're on Linux, using an Android NDK toolchain. That's also clang, but it's probably creating ELF objects, so you could probably just run as manually.)

To actually fix the source, remove all the operand-size suffixes, too, and let the register operand(s) imply the size.

That file does correctly use GAS .intel_syntax operand-size overrides in cases like mov dword ptr [ebp+20], 0 when neither operand is a register so it needs the dword ptr.

But you can't just remove the last character of every mnemonic: some instructions already omit it. (It looks like that file does so for dword operand-size, but redundantly specifies it for every instruction using byte or word operand-size.)

There are a few instructions that can still use (and sometimes need) a size suffix in Intel syntax, for example pushw immediate. Some assemblers like NASM use push word 123, but GAS .intel_syntax noprefix uses pushw 123. If there's a register or memory operand, though, that can imply the size. e.g. push di is a word push, pop word ptr [ecx] is a word pop. You also have suffixes on "string" instructions like movsb/w/d / lodsb/w/d, and so on.

For example, this part of the original file:

    do_interrupt:
        PUSHWORD di                     # push(cpu->pc)
        movzx eax, byte ptr [ebp+10]
        or  eax, 0x20                   # uint8_t temp = cpu->p | 0x20;
        PUSH_BYTE al                    # push(temp);
        popw ax
        movw di, [esi+eax]              # cpu->pc=*(uint16_t*)&(cpu->mem[0xfffe]);
        or byte ptr [ebp+10], 4         # cpu->p |= FLAG_I;
        movw [ebp+4],di                 # Remove when C-only
        movb [ebp+9],ch                 # Remove when C-only
        pop eax
        add eax,7                       # c += 7;
        push eax

becomes

    do_interrupt:
        PUSHWORD di                     # push(cpu->pc)
        movzx eax, byte ptr [ebp+10]
        or    eax, 0x20                 # uint8_t temp = cpu->p | 0x20;
        PUSH_BYTE al                    # push(temp);
        pop   ax
        mov   di, [esi+eax]             # cpu->pc=*(uint16_t*)&(cpu->mem[0xfffe]);
        or    byte ptr [ebp+10], 4      # cpu->p |= FLAG_I;
        mov   [ebp+4],di                # Remove when C-only
        mov   [ebp+9],ch                # Remove when C-only

        # pop eax; add eax,7 ; push eax   # optimize into one instruction:
        add   dword ptr [esp], 7        # c += 7;
        # or address it relative to EBP if we know where ESP is relative to EBP

You'll also need to make the same changes in the macro definitions.
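With dozens of these to fix, the suffix rule can be partly mechanized. Below is a hypothetical C helper; `strip_size_suffix` and its short allow-list are my own illustration, not part of the original file or any existing tool. It drops a trailing b/w only when the rest is a known base mnemonic and the caller says a register operand implies the size. Everything else is copied unchanged for a human to check. That includes pushw with an immediate, string instructions like movsb/lodsw, and memory-only forms that need byte ptr / word ptr.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allow-list: base mnemonics whose b/w suffix is redundant in
   .intel_syntax noprefix when a register operand implies the size.
   Deliberately short; not every x86 instruction. */
static const char *const sized_bases[] = {
    "mov", "add", "sub", "and", "or", "xor", "cmp", "test",
    "inc", "dec", "push", "pop",
};

/* Copies `mnemonic` to `out`, dropping a trailing 'b' or 'w' only if the
   remainder is an allow-listed base AND a register operand implies the
   size. Everything else is copied unchanged for manual review. */
void strip_size_suffix(const char *mnemonic, int has_register_operand,
                       char *out, size_t out_len) {
    size_t n = strlen(mnemonic);
    assert(out_len > n); /* room for the copy plus its terminator */
    memcpy(out, mnemonic, n + 1);

    char last = n > 0 ? mnemonic[n - 1] : '\0';
    if (!has_register_operand || (last != 'b' && last != 'w'))
        return; /* e.g. pushw 0xfffa: nothing else implies the size */

    for (size_t i = 0; i < sizeof sized_bases / sizeof sized_bases[0]; i++) {
        const char *base = sized_bases[i];
        if (strlen(base) == n - 1 && strncmp(mnemonic, base, n - 1) == 0) {
            out[n - 1] = '\0'; /* movw -> mov, popw -> pop */
            return;
        }
    }
    /* Not a known base (e.g. movsb, lodsw, or plain sub): leave it alone. */
}
```

With this, movw di, ... becomes mov and popw ax becomes pop ax, while pushw 0xfffa and movsb are left as they are.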

This doesn't look like the most efficient code ever; it could keep more in registers. But that's beside the point: I only made one small peephole optimization, folding pop/add/push into a memory-destination add, and didn't try to optimize the rest.

There are other obvious inefficiencies, like

    movb %dl,  [%ebp+7]     #  dl = r8  = X
    movb %dh,  [%ebp+8]     #  dh = r9  = Y

which could be a single word load into DX = DH:DL (x86 is little-endian and has very efficient unaligned loads, if this happens to be unaligned).
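The same equivalence in C, as a sketch. The `cpu` pointer stands in for EBP and the function names are hypothetical. Offsets 7 (X) and 8 (Y) come from the comments in the snippet above. Two byte loads combined as low:high give the same 16-bit value as one word load, but only on a little-endian machine like x86.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Original pattern: movb dl, [ebp+7] (X) and movb dh, [ebp+8] (Y),
   so X ends up in the low byte and Y in the high byte of DX. */
uint16_t load_xy_two_bytes(const uint8_t *cpu) {
    assert(cpu != NULL);
    return (uint16_t)(cpu[7] | (cpu[8] << 8));
}

/* One 16-bit load, like mov dx, [ebp+7]. memcpy is the portable C way to
   write a possibly unaligned load; on x86, compilers typically emit a
   single load for it. Matches the function above only on little-endian hosts. */
uint16_t load_xy_one_word(const uint8_t *cpu) {
    assert(cpu != NULL);
    uint16_t dx;
    memcpy(&dx, cpu + 7, sizeof dx);
    return dx;
}
```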

So I wouldn't recommend using this code as an example to learn x86!

Thanks, that looks really helpful! It's probably the original author on a Mac: I'm looking to make this compile on my out-of-date Ubuntu laptop, to get the : key working from my Bluetooth keyboard into the emulated host running on my Android tablet, not making it more efficient or learning x86. :) I'd use the original .so file from the APK, but it has a text relocation that Android prohibits in apps built after version 21 for security. — android.weasel, Sep 29 '19 at 22:37
Maybe some of the inefficiency - like the movb pair you cite above - could be because the x86 was a direct translation from ARM? — android.weasel, Sep 29 '19 at 22:50
@android.weasel: It works fine for me on Arch Linux, but I haven't tried with an older GAS. I assumed you were on a Mac because Apple uses clang instead of GCC and GNU Binutils. It will *not* build with clang. — Peter Cordes, Sep 29 '19 at 22:51
It's building with AndroidNDK, which... `/home/tim/lib/android-sdk-linux/ndk/20.0.5594570/toolchains/llvm/prebuilt/linux-x86_64/bin/clang` is `Android (5220042 based on r346389c) clang version 8.0.7 (https://android.googlesource.com/toolchain/clang b55f2d4ebfd35bf643d27dbca1bb228957008617) (https://android.googlesource.com/toolchain/llvm 3c393fe7a7e13b0fba4ac75a01aa683d7a5b11cd) (based on LLVM 8.0.7svn)`. What precludes clang so completely? I've tried removing the 'w' and 'b' suffixes, but there's a couple of 'pushw 0' in WRITEBYTED that look like they'd need width qualifiers. — android.weasel, Sep 30 '19 at 00:03
@android.weasel: Clang's built-in assembler in Intel-syntax mode does not accept `%eax` as a register name, or `movw` as a synonym for `mov`. (See [How to set gcc to use intel syntax permanently?](//stackoverflow.com/a/58154963) for an example (using inline asm in C, but same problem).) GAS does. Oh, and yes `pushw` is an Intel mnemonic; a few instructions *do* use suffixes in Intel syntax. I'll update my answer for that. — Peter Cordes, Sep 30 '19 at 00:18
This is really helpful! It's assembling now. One (hopefully) last thing: the main reason I'm trying to assemble this is because the .so from the original package needs text relocation, now prohibited. I presume this relates to the error I'm currently seeing from the NDK: `shared library text segment is not shareable` - I've tried adding -fPIC to the CFLAGS but the message persists. The last fns_asm part of the .S file looks like a lookup table to dispatch opcodes; I presume this is the relocation problem. I don't suppose I could just cook that into a C array of method pointers? — android.weasel, Sep 30 '19 at 00:54
@android.weasel: `-fPIC` is a compile-time code-gen option. It affects translation from C to asm. It can't affect asm source; the human that writes the asm has to make it position-independent manually. — Peter Cordes, Sep 30 '19 at 00:56
@android.weasel: to actually fix it, instead of a table of absolute addresses, consider using a table of relative offsets that you add to a known address. E.g., have a look at what GCC does when creating jump tables for `switch` with `-fPIE` https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011 even on x86-64 ELF/Linux where text relocations *are* allowed. Note that PIC/PIE in 32-bit code is generally inefficient because you don't have RIP-relative addressing modes. Every `mov reg, [symbol]` uses an absolute address. (related: [this](https://stackoverflow.com/questions/43367427)) — Peter Cordes, Sep 30 '19 at 01:01
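Here is a minimal C sketch of the relative-offset idea. To stay portable it uses a hypothetical table of names instead of code addresses. The table holds only small offsets from the start of one blob, so its contents need no relocations no matter where the library is loaded; only the single base address has to be found at run time. In 32-bit asm that base would have to come from something like a call/pop get-PC sequence, since there's no RIP-relative addressing. That part isn't shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical data: every entry lives in one contiguous blob. */
static const char names[] = "BRK\0ORA\0NOP\0LDA";

/* Offsets from the start of `names`, not absolute addresses, so the
   table's contents are identical wherever the library gets loaded. */
static const uint8_t name_offsets[] = { 0, 4, 8, 12 };

/* Known base address + relative offset from the table. */
const char *entry_name(size_t index) {
    assert(index < sizeof name_offsets / sizeof name_offsets[0]);
    return names + name_offsets[index];
}
```

A dispatch table of handler addresses in asm would follow the same pattern: store `handler - base` per entry, then add the runtime address of `base` before jumping.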
