Why can assembly instructions contain multiplications in the "lea" instruction?

Question

I am working on a very low level part of the application in which performance is critical.

While investigating the generated assembly, I noticed the following instruction:

lea eax,[edx*8+8]

I am used to seeing additions when using memory references (e.g. [edx+4]), but this is the first time I see a multiplication.

Does this mean that the x86 processor can perform simple multiplications in the lea instruction?
Does this multiplication have an impact on the number of cycles needed to execute the instruction?
Is the multiplication limited to powers of 2 (I would assume this is the case)?

Thanks in advance.

See also [What's the purpose of the LEA instruction?](https://stackoverflow.com/questions/1658294/whats-the-purpose-of-the-lea-instruction) for more general stuff about using it for things other than strictly address calculations. — Peter Cordes, Aug 19 '17 at 16:18

Mysticial · Accepted Answer · 2012-05-09T08:30:05.127

13

To expand on my comment and to answer the rest of the question...

Yes, it's limited to powers of two. (2, 4, and 8 specifically) So no multiplier is needed since it's just a shift. The point of it is to quickly generate an address from an index variable and a pointer - where the datatype is a simple 2, 4, or 8 byte word. (Though it's often abused for other uses as well.)

As for the number of cycles that are needed: According to Agner Fog's tables it looks like the lea instruction is constant on some machines and variable on others.

On Sandy Bridge there's a 2-cycle penalty if it's "complex or rip relative". But it doesn't say what "complex" means... So we can only guess unless you do a benchmark.

edited May 09 '12 at 08:30

answered May 09 '12 at 08:17

Mysticial

464,885
45
335
332

Thanks. Also liked the reference to Agner Fog (although the link in your answer doesnt' seem to be correct). – Patrick May 09 '12 at 08:29
Oops, bad copy+paste. Fixed now. – Mysticial May 09 '12 at 08:30
@Mysticial - AFAIK, this is not `lea` specific. Check out my answer :) – ArjunShankar May 09 '12 at 08:39
4

Note that while technically the multiplication is by 1,2,4 or 8, if you also consider the base register you can get multiplication by other constants, e.g: `lea eax,[edx*4+edx]` is equivalent to a multiplication by 5. This is very useful for strength reduction, e.g: the compiler can generate some smart code for a multiply by 1000. – ninjalj May 09 '12 at 21:22
@ninjalj Yeah, I'm aware of that. I just hid it under the "often abused for other uses as well" phrase. I've also seen multi-byte NOPs done with `lea`. Though I mostly see it used for out-of-place arithmetic. – Mysticial May 09 '12 at 21:24
I think this needs be mentioned as well: http://en.wikipedia.org/wiki/Barrel_shifter even the 3 bit shift (*8) is instantaneous. – v.oddou Mar 20 '14 at 02:52
2

Complex means 3 operands or a shift > 1. – Johan Apr 04 '16 at 15:26

ArjunShankar · Answer 2 · 2012-05-10T08:11:34.310

9

Actually, this is not something specific to the lea instruction.

This type of addressing is called Scaled Addressing Mode. The multiplication is achieved by a bit shift, which is trivial:

A Left Shift

You could do 'scaled addressing' with a mov too, for example (note that this is not the same operation, the only similarity is the fact that ebx*4 represents an address multiplication):

 mov edx, [esi+4*ebx]

(source: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html#memory)

For a more complete listing, see this Intel document. Table 2-3 shows that a scaling of 2, 4, or 8 is allowed. Nothing else.

Latency (in terms of number of cycles): I don't think this should be affected at all. A shift is a matter of connections, and selecting between three possible shifts is the matter of 1 multiplexer worth of delay.

edited May 10 '12 at 08:11

answered May 09 '12 at 08:38

ArjunShankar

23,020
5
61
83

1

That `mov` gets the **contents** of the memory at the generated "address", while `lea` gets the **generated "address"** – ninjalj May 09 '12 at 21:31
@ninjalj - The point is *not* what the mov does, but the fact that 'scaled addressing' (i.e. pointer math involving multiplying a pointer by 2,4,8) is not specific to `lea`. – ArjunShankar May 10 '12 at 08:08
@ninjalj - And since it appears to me that it was the wording of that part of my answer which was somewhat vague, I have explained the `mov` somewhat better now. Thanks. – ArjunShankar May 10 '12 at 08:12
1

the multiplexer thing is called a barrel shifter I believe. – v.oddou Mar 20 '14 at 02:53

score 8 · Answer 3 · answered May 09 '12 at 21:37

To expand on your last question:

Is the multiplication limited to powers of 2 (I would assume this is the case)?

Note that you get the result of base + scale * index, so while scale has to be 1, 2, 4 or 8 (the size of x86 integer datatypes), you can get the equivalent of a multiplication by some different constants by using the same register as base and index, e.g.:

lea eax, [eax*4 + eax]   ; multiply by 5

This is used by the compiler to do strength reduction, e.g: for a multiplication by 100, depending on compiler options (target CPU model, optimization options), you may get:

lea    (%edx,%edx,4),%eax   ; eax = orig_edx * 5
lea    (%eax,%eax,4),%eax   ; eax = eax * 5 = orig_edx * 25
shl    $0x2,%eax            ; eax = eax * 4 = orig_edx * 100

See this SO question: http://stackoverflow.com/questions/6120207/imul-or-shift-instruction for why you should leave strength reduction to the compiler. — ninjalj, May 09 '12 at 21:39

Why can assembly instructions contain multiplications in the "lea" instruction?

3 Answers3

Linked