Stack address size on 80486, using gcc

Question

GCC manual says that -m32 "generates code that runs on any i386 system". Assume I want to write a function that swaps the bytes of a 32 bit number:

    .text
    .globl  SwapBytes
    .type   SwapBytes, @function
SwapBytes:
    pushl %ebp
    movl %esp, %ebp
    movl 8(%ebp), %eax
    bswapl %eax
    popl %ebp
    ret

However, the BSWAP instruction is not supported on the 80386; it was introduced with the 80486. So I assemble with gcc SwapBytes.s -march=i486 -c -o SwapBytes.elf, as the cited manual suggests. But then I get an error:

Assembler messages:
Error: invalid instruction suffix for `push'
Error: invalid instruction suffix for `pop'

But why? I thought the 32-bit stack was introduced with the 80386. OK, let's try -march=pentium3 in case I missed something about the 32-bit stack, but I get the same result. However, if I assemble with just the -m32 option, gcc does not complain.

I thought maybe gcc ignores the -march=... options when assembling (and assumes -m64?), but does not ignore the -m32 option? But what options can I use to specify the CPU I want to assemble for?

`-m32` just switches to 32 bit operation mode (commonly called i386 mode). It does not switch to "only support 80386 instructions". — fuz, Jun 26 '22 at 14:24
The 386 obviously does support 32-bit stack operations. The "invalid suffix for push" message indicates that you are assembling for 64-bit mode, where only `pushq` (or obscurely `pushw`) would be legal. — Nate Eldredge, Jun 26 '22 at 14:25

score 3 · Accepted Answer · answered Jun 26 '22 at 14:36

It appears that gcc passes the -m32 and -m64 options to the assembler (translating them to --32 and --64), but it does not pass along the -march= options. I think this is because the assembler recognizes a different set of architectures than the compiler does. For instance, -march=skylake is accepted by gcc but would be an error to the assembler.

But you can pass it explicitly using the -Wa option. So for instance gcc -m32 -Wa,-march=i386 bswap.s will give

Error: `bswap' is not supported on `i386'

and gcc -m32 -Wa,-march=i486 bswap.s will run successfully.

Note that the default architecture depends on how you configured your binutils installation. But it is most likely something much more modern than i386, which is why the assembler doesn't complain when you use bswap without an -march option.

Nate Eldredge

The other key point here is that `-march` and `-m32` / `-m64` are orthogonal. `-march=i486` doesn't imply `-m32`, so `-m32` is still necessary. (However, for old enough arch args, like `gcc -c -Wa,-march=i386 foo.s`, you get `Fatal error: 64bit mode not supported on 'i386'.`) – Peter Cordes Jun 26 '22 at 19:36
For GAS, if you don't specify an arch (via command line or `.arch generic32.sse4.1` [directive](https://sourceware.org/binutils/docs/as/i386_002dArch.html#i386_002dArch)), it allows any instruction it knows about. This is unlike GCC, where the usual default for 64-bit mode is `-march=x86-64` (only baseline x86-64). For `-m32` mode, GCC usually gets configured to target i686 (cmov and so on), and sometimes also SSE1/2. Again, that only matters when compiling C; it doesn't affect what GCC passes to GAS. – Peter Cordes Jun 26 '22 at 19:40

Peter Cordes · Answer 2 · 2022-06-26T20:47:27.700

-march and -m32 / -m64 gcc options are orthogonal. 64-bit mode doesn't support pushl.

gcc -march=i486 doesn't imply -m32, so gcc -m32 is still needed to make it invoke as --32.

Also, GCC doesn't pass on its -march= option to GAS, using it only for C->asm compilation.

By default, GAS accepts any instructions it knows about. So gcc -m32 -c bswap.s works, and would also accept AVX512VBMI instructions like vpmultishiftqb (%ecx){1to8}, %zmm0, %zmm1 (broadcast-load and bitfield-extract) without further options.

This is basically the opposite of how GCC works when compiling C to asm, where it has a default target baseline (e.g. for 32-bit mode, often i686 or i686 + SSE2, allowing instructions like CMOV).

This makes some sense because in asm, instruction choice is governed by the source. If you don't want to use new instructions for compat with old CPUs, that's up to you. But for GCC, where a machine is generating asm, you might want portable binaries that can run on any CPU, or any CPU newer than some baseline. Or a binary that will use everything your CPU has (-march=native), avoiding instructions your CPU doesn't support.
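
One way to see which mode and which baseline a given gcc invocation targets when compiling C is to test the compiler's predefined macros. The following is only a minimal sketch, assuming GCC or Clang: `__i386__`, `__x86_64__`, `__SSE2__` and `__AVX2__` are real predefined macros, but the function names (`target_mode`, `baseline_has_sse2`, `baseline_has_avx2`, `describe_target`) are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Selected by -m32 / -m64, independent of -march=. */
const char *target_mode(void) {
#if defined(__x86_64__)
    return "64-bit (x86-64)";
#elif defined(__i386__)
    return "32-bit (i386 mode)";
#else
    return "not x86";
#endif
}

/* Selected by -march= (or -msse2): may the compiler emit SSE2 on its own? */
int baseline_has_sse2(void) {
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* Selected by -march= (or -mavx2): may the compiler emit AVX2 on its own? */
int baseline_has_avx2(void) {
#ifdef __AVX2__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: formats a one-line summary into buf and
   returns snprintf's result. */
int describe_target(char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "mode=%s sse2=%d avx2=%d",
                    target_mode(), baseline_has_sse2(), baseline_has_avx2());
}
```

Built with gcc -m32 -march=i486, this should report 32-bit mode with neither extension; a plain gcc on an x86-64 system would normally report 64-bit mode with SSE2, since SSE2 is part of baseline x86-64. None of this changes what GAS accepts in a hand-written .s file.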

If you use new instructions via inline asm, you can still compile with gcc without a -march option. (But normally it's better to use intrinsics to have GCC emit those instructions itself, so it knows what's going on.)
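
As a rough illustration of the intrinsics route, here is a minimal C sketch of the byte swap from the question. `__builtin_bswap32` is a real GCC/Clang builtin; the names `swap_bytes`, `swap_bytes_portable` and `swap_bytes_self_check` are made up for this example. The idea is that the compiler, not the source, picks instructions that are legal for the `-march=` target it was given.

```c
#include <assert.h>
#include <stdint.h>

/* Byte swap via the GCC/Clang builtin: the compiler chooses the
   instruction sequence that fits its -march= target. */
uint32_t swap_bytes(uint32_t x) {
    return __builtin_bswap32(x);
}

/* Plain-C fallback with the same result, for compilers without the builtin. */
uint32_t swap_bytes_portable(uint32_t x) {
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Hypothetical self-check: both versions must agree. */
void swap_bytes_self_check(void) {
    assert(swap_bytes(0x11223344u) == 0x44332211u);
    assert(swap_bytes_portable(0x11223344u) == swap_bytes(0x11223344u));
}
```

Compiling this with gcc -m32 -march=i486 -O2 -S and reading the generated asm is a quick way to check which instructions GCC actually chose for a given target.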

If you want GAS to impose limits, e.g. to catch mistakes like accidentally using cmov or cmpxchg8b when you intended your code to run on a 486, use the as -march=i486 option or a .arch i486 directive in the source.

(See the GAS manual; the microarchitecture names are similar to what gcc -march= accepts, except for recent Intel where GCC accepts skylake, but GAS would need corei7.avx2.fma.movbe.bmi2 or something, and that's still incomplete.)

To get GCC to run as --32 -march=i486, you use
gcc -c -m32 -Wa,-march=i486 foo.s

If you omit the -m32, you get Assembler messages:
Fatal error: 64bit mode not supported on 'i486'.

Fun fact: GAS has lots of other x86 options that GCC doesn't set. I'm showing the gcc -Wa,gas-option form; if you were running as directly, you'd just pass the option itself, e.g. as --32 -Os.

gcc -Wa,-Os - optimize your asm for size, e.g. shortening mov $1, %rax to mov $1, %eax because that's architecturally equivalent, or test $1, %eax (5 bytes) to test $1, %al (2 bytes).
gcc -Wa,-mbranches-within-32B-boundaries - align branches to work around the Intel JCC erratum; see the Q&A "How can I mitigate the impact of the Intel jcc erratum on gcc?"
gcc -Wa,-msse2avx - encode SSE instructions with VEX prefix.
gcc -Wa,-muse-unaligned-vector-move - translate movaps to movups and so on. (But it can't transparently turn paddb (%ecx), %xmm0 into something that doesn't require alignment, so it's probably only useful with AVX, if you want to relax the alignment requirements for a function. In AVX, only vmovaps/vmovdqa load/store do alignment enforcement, memory source operands for ALU instructions are like vmovups)

I've never really wanted to use any of these options (except the workaround for Skylake's JCC-erratum performance pothole), but it's neat that they exist.
