Difference in ARM and x86 assembly code generated by GCC

Question

Let us take a simple piece of C code that sets a memory-mapped register:

int main()
{
    int *a = (int*)111111;
    *a = 0x1000;
    return 0;
}

When I compile this code for ARM (arm-none-eabi-gcc) with level 1 optimization, the assembly code is something like:

mov     r2, #4096
mov     r3, #110592
str     r2, [r3, #519]
mov     r0, #0
bx      lr

It looks like the address 111111 was rounded down to the nearest 4K boundary (110592) and moved into r3, and then the value 4096 (0x1000) was stored at offset 519 from that base (110592 + 519 = 111111). Why does this happen?

In x86, the assembly is straightforward:

movl    $4096, 111111
movl    $0, %eax
ret

I want to caution you not to evaluate these architectures based on the number of assembly instructions generated. The number of distinct assembly instructions doesn't tell you how many _bytes_ of code are used, nor does it tell you anything about execution time. The x86 ISA was designed in a time when humans still did a lot of assembly coding so it was important to have "straightforward" instructions even if they were inefficient in other ways. — , Feb 23 '14 at 14:56
x86 uses variable word length instructions so you can have a single instruction with any 32 or 64 bit immediate you want. Arm is fixed instruction length so you cannot have any 32 or 64 bit immediate you want, it takes multiple instructions and/or it takes a pc relative load of a location that has the value. The compiler will pick one option or the other. — old_timer, Feb 23 '14 at 23:01

Aki Suihkonen · Answer 1 · 2014-02-23T14:50:54.100

The reason behind this difference is that x86 has variable-sized instructions, from 1 byte up to 15 bytes (the architectural limit, prefixes included).

An ARM instruction is 32 bits wide (not counting Thumb modes), which means that it's simply not possible to encode every 32-bit constant (immediate) in a single instruction.

Fixed-size architectures typically use a few methods to load large constants:

1)  movi  #r1, Imm8          ; // Here Imm8 or ImmX is simply the X least significant bits
2)  movhi #r1, Imm16         ; // Here Imm16 loads the 16 MSBs of the register
3)  load  #r1, (PC + ImmX)   ; // use a PC-relative address to load a constant stored in the code
4)  movn  #r1, Imm8          ; // load the inverse of Imm8 (for signed constants)
5)  mov(i/n) #r1, Imm8 << N  ; // where N=0,8,16,24
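
As a rough illustration of method 5, the C sketch below splits a 32-bit constant into pieces of the form Imm8 << N with N = 0, 8, 16, 24. The helper name `split_into_byte_pieces` is hypothetical and only mirrors the pseudo-instructions listed above; a real compiler would usually look for a shorter sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split `value` into its non-zero pieces of the form
 * Imm8 << N, with N in {0, 8, 16, 24}. The pieces are written to out[0..3]
 * (lowest byte first) and the number of pieces is returned. OR-ing the
 * pieces together gives back the original value. */
size_t split_into_byte_pieces(uint32_t value, uint32_t out[4])
{
    size_t count = 0;

    assert(out != NULL);
    for (unsigned shift = 0; shift < 32; shift += 8) {
        uint32_t piece = value & (UINT32_C(0xFF) << shift);
        if (piece != 0)
            out[count++] = piece;
    }
    return count;
}
```

For 111111 (0x0001B207) this gives three pieces, 0x07, 0xB200 and 0x10000, so this naive scheme would need one move followed by two OR-type instructions.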

Variable-size architectures, on the other hand, can put all the constants in a single instruction:

xx xx xx 00 10 00 00 11 11 11 00 ; // assuming that it takes 3 bytes to encode
                                 ; // the instruction and the addressing mode,
                                 ; // plus 4 bytes to encode the 4096
                                 ; // and 4 bytes to encode 0x00111111

`mov [dword], dword` is `C7 05 dword dword`, by the way (in 32bit mode) — harold, Feb 23 '14 at 17:49
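
To make the `C7 05 dword dword` form from the comment above concrete, here is a minimal C sketch that writes out the bytes of `movl $imm, addr` for 32-bit mode. The function names are made up for illustration; the layout assumed is opcode C7, ModRM 05 (absolute 32-bit address), then the address and the immediate, each as a little-endian dword.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v & 0xFFu);
    dst[1] = (uint8_t)((v >> 8) & 0xFFu);
    dst[2] = (uint8_t)((v >> 16) & 0xFFu);
    dst[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Hypothetical encoder for `movl $imm, addr` in 32-bit mode.
 * Writes 10 bytes to `out` and returns that length. */
size_t encode_mov_abs32_imm32(uint8_t out[10], uint32_t addr, uint32_t imm)
{
    out[0] = 0xC7;            /* opcode: MOV r/m32, imm32              */
    out[1] = 0x05;            /* ModRM: no base register, 32-bit address */
    put_le32(out + 2, addr);  /* destination address                   */
    put_le32(out + 6, imm);   /* value to store                        */
    return 10;
}
```

For the question's `movl $4096, 111111` this produces `C7 05 07 B2 01 00 00 10 00 00`: ten bytes for one instruction, which a fixed 32-bit encoding cannot hold.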

score 3 · Answer 2 · answered Feb 25 '14 at 15:34

The address had to be split in two parts because this specific constant cannot be loaded into a register with a single instruction.

The ARM documentation specifies limitations for the immediate constants allowed in some instructions (such as MOV):

In ARM instructions, constant can have any value that can be produced by rotating an 8-bit value right by any even number of bits within a 32-bit word.

In 32-bit Thumb-2 instructions, constant can be:

Any constant that can be produced by shifting an 8-bit value left by any number of bits within a 32-bit word.

Any constant of the form 0x00XY00XY.
Any constant of the form 0xXY00XY00.
Any constant of the form 0xXYXYXYXY.

The value 111111 (0x1B207) can't be represented as any of the above, so the compiler had to split it.

110592 is 0x1B000, so it fulfills the first condition (the 8-bit value 0x1B rotated left by 12 bits, i.e. right by 20) and can be loaded using a single MOV instruction.
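
The rules quoted above can be checked mechanically. The following C sketch follows the wording of the quoted documentation literally; the function names are hypothetical and it is not taken from GCC or from any ARM tool.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n taken modulo 32). */
static uint32_t rotate_left32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n == 0 ? v : (v << n) | (v >> (32u - n));
}

/* ARM mode: an 8-bit value rotated right by an even number of bits. */
bool is_arm_mode_immediate(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Rotating left undoes a rotate-right by `rot`;
         * what remains must fit in 8 bits. */
        if (rotate_left32(value, rot) <= 0xFFu)
            return true;
    }
    return false;
}

/* 32-bit Thumb-2: a shifted 8-bit value or one of the repeated patterns. */
bool is_thumb2_immediate(uint32_t value)
{
    /* 8-bit value shifted left: all set bits fit in one 8-bit window. */
    for (unsigned shift = 0; shift <= 24; ++shift) {
        if ((value & ~(UINT32_C(0xFF) << shift)) == 0)
            return true;
    }

    uint32_t lo = value & 0xFFu;         /* XY taken from bits 0..7  */
    uint32_t hi = (value >> 8) & 0xFFu;  /* XY taken from bits 8..15 */

    return value == (lo | (lo << 16))             /* 0x00XY00XY */
        || value == ((hi << 8) | (hi << 24))      /* 0xXY00XY00 */
        || value == lo * UINT32_C(0x01010101);    /* 0xXYXYXYXY */
}
```

Both checks accept 110592 (0x1B000) and reject 111111 (0x1B207), whose set bits span 17 bit positions and so cannot fit in any 8-bit window.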

The STR instruction, on the other hand, has a different set of limitations for the offsets used. In particular, 519 (0x207) falls into the -4095 to 4095 range allowed for word loads and stores in ARM mode.
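
Putting the two limits together, the split can be sketched in C as a simple search: find a base that MOV can encode and an offset that STR can encode. This is only an assumption about how such a split could be found, not GCC's actual algorithm, and the names `fits_arm_immediate` and `split_address` are made up for the example. Non-negative offsets are tried first, which reproduces the base seen in the question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ARM-mode MOV rule: an 8-bit value rotated right by an even amount. */
static bool fits_arm_immediate(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Rotate left by `rot` to undo the rotate-right. */
        uint32_t undone = rot == 0 ? value
                                   : (value << rot) | (value >> (32u - rot));
        if (undone <= 0xFFu)
            return true;
    }
    return false;
}

/* Hypothetical search: find base and offset with base + offset == addr
 * (modulo 2^32), where base is a valid MOV immediate and offset lies in
 * the -4095..4095 range of a word STR. Returns false if no pair exists. */
bool split_address(uint32_t addr, uint32_t *base, int32_t *offset)
{
    assert(base != NULL && offset != NULL);

    /* Prefer a base at or below the address (non-negative offset). */
    for (int32_t off = 0; off <= 4095; ++off) {
        uint32_t candidate = addr - (uint32_t)off;
        if (fits_arm_immediate(candidate)) {
            *base = candidate;
            *offset = off;
            return true;
        }
    }
    /* Otherwise try a base above the address (negative offset). */
    for (int32_t off = 1; off <= 4095; ++off) {
        uint32_t candidate = addr + (uint32_t)off;
        if (fits_arm_immediate(candidate)) {
            *base = candidate;
            *offset = -off;
            return true;
        }
    }
    return false;
}
```

For 111111 the search stops at base 110592 with offset 519, matching the `mov r3, #110592` / `str r2, [r3, #519]` pair. For 0xABCDEF78 it finds nothing, which is why that constant needs MOVW/MOVT or a literal pool.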

In this specific case the compiler managed to split the constant into only two parts. If your immediate has more significant bits, it may have to generate even more instructions, or use a literal pool load. For example, if I use 0xABCDEF78, I get this (for ARMv7):

movw    r3, #61439
movt    r3, 43981
mov     r2, #4096
str     r2, [r3, #-135]
mov     r0, #0
bx      lr
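
The arithmetic behind this listing can be checked with a small C sketch. The two helpers are hypothetical models of what the instructions compute (MOVW sets the low 16 bits, MOVT the high 16 bits, and STR adds its signed offset), not an emulator.

```c
#include <stdint.h>

/* Model of a MOVW/MOVT pair: `low` fills bits 0..15, `high` fills bits 16..31. */
uint32_t movw_movt(uint16_t low, uint16_t high)
{
    return ((uint32_t)high << 16) | low;
}

/* Model of the address used by `str rX, [base, #offset]`;
 * the addition wraps modulo 2^32. */
uint32_t effective_address(uint32_t base, int32_t offset)
{
    return base + (uint32_t)offset;
}
```

Here `movw_movt(61439, 43981)` is 0xABCDEFFF, and `effective_address(0xABCDEFFF, -135)` is 0xABCDEF78, the address that was requested.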

For architectures without MOVW/MOVT (e.g. ARMv4), GCC seems to fall back to a literal pool:

    mov     r2, #4096
    ldr     r3, .L2
    str     r2, [r3, #-135]
    mov     r0, #0
    bx      lr
.L3:
    .align  2
.L2:
    .word   -1412567041
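
The `.word` value is printed as a signed decimal, which hides the fact that it is the same 0xABCDEFFF base as before. A minimal C sketch of the conversion, with hypothetical helper names:

```c
#include <stdint.h>

/* Reinterpret the signed literal-pool word as an unsigned 32-bit address. */
uint32_t literal_as_address(int32_t word)
{
    return (uint32_t)word;
}

/* Address written by `str r2, [r3, #offset]` once r3 holds the literal. */
uint32_t store_target(int32_t word, int32_t offset)
{
    return (uint32_t)word + (uint32_t)offset;
}
```

With the values from the listing, `literal_as_address(-1412567041)` is 0xABCDEFFF and `store_target(-1412567041, -135)` is 0xABCDEF78.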

score 1 · Answer 3 · answered Feb 23 '14 at 13:37

The compiler is probably taking advantage of ARM immediate value encoding to reduce code size. Basically 110592 is 0x1B << 12 and this enables some simplifications. Take a look at the output from arm-none-eabi-objdump -d of your program to check the length of each instruction.

Balau

This isn't reducing code size: The alternative would be to load the address from memory. The compiler is choosing here to use immediate constants to form the address in order to save a load - which could potentially miss cache and will likely take longer than the single cycle used to load an immediate constant. – marko Feb 23 '14 at 18:46
That immediate constant uses 12 bits, not 32. It enables the encoding of the constant inside a full 32 bit instruction. So a single 32bit instruction substitutes 32bits of memory containing the constant plus 16 bits (or even 32 bits in ARM mode) of instruction. – Balau Feb 23 '14 at 21:18
