gcc optimisation with LEA

Question

I'm fiddling with gcc's optimisation options and found that these lines:

int bla(int moo) {
  return moo * 384;
}

are translated to these:

0:   8d 04 7f                lea    (%rdi,%rdi,2),%eax
3:   c1 e0 07                shl    $0x7,%eax
6:   c3                      retq

I understand that the shift represents a multiplication by 2^7, so the first line must be a multiplication by 3.

So I am utterly perplexed by the "lea" line. Isn't lea supposed to load an address?

For completeness: the syntax for an address operand is ±d(A,B,C), which translates to A ± d + B * C — Banyoghurt, May 02 '13 at 14:04
By the way, the only modern CPU that uses the AGU for `lea` is Intel Atom. On all other modern CPUs it goes to an ALU. It's still useful, however, because it combines several operations, can write to an arbitrary output register, and does not change the flags. Also, this form (64-bit address, 32-bit result) is the shortest encoding of `lea` in 64-bit mode. — harold, May 02 '13 at 18:19

score 6 · Accepted Answer · edited May 02 '13 at 14:23

lea (%ebx, %esi, 2), %edi does nothing more than computing ebx + esi*2 and storing the result in edi.
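As a rough illustration of that arithmetic, here is a minimal C sketch of what lea computes for an operand of the form disp(base,index,scale). The function name `lea_model` and its parameters are hypothetical, chosen only for this example; real hardware only accepts a scale of 1, 2, 4 or 8, which the sketch does not enforce.

```c
#include <stdint.h>

/* Hypothetical model of lea for the operand disp(base,index,scale):
 * the result is base + index*scale + disp, and no memory is accessed.
 * Unsigned arithmetic is used so that wraparound is well defined in C. */
uint32_t lea_model(uint32_t base, uint32_t index, uint32_t scale, int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}
```

With this model, `lea_model(ebx, esi, 2, 0)` corresponds to `lea (%ebx, %esi, 2), %edi`.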

Even though lea is designed to compute and store an effective address, it can be, and often is, used as an optimization trick to perform calculations on values that are not memory addresses.

lea    (%rdi,%rdi,2),%eax
shl    $0x7,%eax

is equivalent to:

eax = rdi + rdi*2;
eax = eax * 128;

And since moo is in rdi, it stores moo*384 in eax.
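The two steps can also be written out in C to check the arithmetic (3 * 128 = 384). This is only a sketch: the name `bla_model` is made up for the example, and it uses unsigned integers, unlike the `int` in the question, so that overflow wraps instead of being undefined behaviour.

```c
#include <stdint.h>

/* Mirrors the two generated instructions one statement at a time. */
uint32_t bla_model(uint32_t moo)
{
    uint32_t eax = moo + moo * 2;  /* lea (%rdi,%rdi,2),%eax  ->  moo * 3   */
    eax <<= 7;                     /* shl $0x7,%eax           ->  eax * 128 */
    return eax;                    /* moo * 3 * 128 == moo * 384            */
}
```

For any 32-bit input, `bla_model(x)` should give the same value as `x * 384u`.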

edited May 02 '13 at 14:23 · Stephen Canon
answered May 02 '13 at 13:50 · zakinster

Thank you very much! I knew you could do cheaty stuff with lea without changing flags and such, but this... – Banyoghurt May 02 '13 at 13:59

score 4 · Answer 2 · answered May 02 '13 at 14:00

It is a standard optimization trick on x86 cores. The AGU (Address Generation Unit), the part of the processor that generates addresses, is capable of simple arithmetic. It is not a full-blown ALU, but it has enough transistors to calculate indexed and scaled addresses, which means adds and shifts. LEA (Load Effective Address) is a way to invoke the logic in the AGU and get it to calculate simple expressions.

The optimization opportunity here is that the AGU operates independently from the ALU. So you can get superscalar execution, two instructions executing at the same time.

That doesn't actually happen visibly in your code snippet, but it could happen if a calculation that required the ALU were being done before the instructions shown. It was a trick that only really paid off on simpler CPU cores of 486 and Pentium vintage. Modern processors have multiple ALUs, so they don't really need this trick anymore.

Also note that shifts are typically faster than `IMUL`; and replacing a "multiply by constant" with shifts is also a common optimisation for many CPUs. — Brendan, May 02 '13 at 16:35
What is the precise GCC optimization flag that enables it (e.g. `-fuse-lea`, implied by `-O3`)? — Ciro Santilli OurBigBook.com, May 31 '15 at 20:02
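Brendan's point about replacing a multiply by a constant with shifts can be sketched with a different constant. The example below is an illustration only, with a hypothetical function name; it multiplies by 10 using 10 = 8 + 2, and it is not a claim about what gcc emits for this case.

```c
#include <stdint.h>

/* Multiply by 10 without a multiply instruction: 10*x == 8*x + 2*x.
 * Unsigned arithmetic keeps overflow well defined. */
uint32_t times10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

The same idea applies to 384 = 3 * 128 in the question, where lea supplies the factor of 3 and the shift supplies the factor of 128.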
