14

I am working on a very low level part of the application in which performance is critical.

While investigating the generated assembly, I noticed the following instruction:

lea eax,[edx*8+8]

I am used to seeing additions when using memory references (e.g. [edx+4]), but this is the first time I see a multiplication.

  • Does this mean that the x86 processor can perform simple multiplications in the lea instruction?
  • Does this multiplication have an impact on the number of cycles needed to execute the instruction?
  • Is the multiplication limited to powers of 2 (I would assume this is the case)?

Thanks in advance.

Mysticial
  • 464,885
  • 45
  • 335
  • 332
Patrick
  • 23,217
  • 12
  • 67
  • 130
  • 5
    Notice how it's a multiply by a power of two. – Mysticial May 09 '12 at 08:04
  • 1
    See also [What's the purpose of the LEA instruction?](https://stackoverflow.com/questions/1658294/whats-the-purpose-of-the-lea-instruction) for more general stuff about using it for things other than strictly address calculations. – Peter Cordes Aug 19 '17 at 16:18

3 Answers3

13

To expand on my comment and to answer the rest of the question...

Yes, it's limited to powers of two. (2, 4, and 8 specifically) So no multiplier is needed since it's just a shift. The point of it is to quickly generate an address from an index variable and a pointer - where the datatype is a simple 2, 4, or 8 byte word. (Though it's often abused for other uses as well.)

As for the number of cycles that are needed: According to Agner Fog's tables it looks like the lea instruction is constant on some machines and variable on others.

On Sandy Bridge there's a 2-cycle penalty if it's "complex or rip relative". But it doesn't say what "complex" means... So we can only guess unless you do a benchmark.

Mysticial
  • 464,885
  • 45
  • 335
  • 332
  • Thanks. Also liked the reference to Agner Fog (although the link in your answer doesnt' seem to be correct). – Patrick May 09 '12 at 08:29
  • Oops, bad copy+paste. Fixed now. – Mysticial May 09 '12 at 08:30
  • @Mysticial - AFAIK, this is not `lea` specific. Check out my answer :) – ArjunShankar May 09 '12 at 08:39
  • 4
    Note that while technically the multiplication is by 1,2,4 or 8, if you also consider the base register you can get multiplication by other constants, e.g: `lea eax,[edx*4+edx]` is equivalent to a multiplication by 5. This is very useful for strength reduction, e.g: the compiler can generate some smart code for a multiply by 1000. – ninjalj May 09 '12 at 21:22
  • @ninjalj Yeah, I'm aware of that. I just hid it under the "often abused for other uses as well" phrase. I've also seen multi-byte NOPs done with `lea`. Though I mostly see it used for out-of-place arithmetic. – Mysticial May 09 '12 at 21:24
  • I think this needs be mentioned as well: http://en.wikipedia.org/wiki/Barrel_shifter even the 3 bit shift (*8) is instantaneous. – v.oddou Mar 20 '14 at 02:52
  • 2
    Complex means 3 operands or a shift > 1. – Johan Apr 04 '16 at 15:26
9

Actually, this is not something specific to the lea instruction.

This type of addressing is called Scaled Addressing Mode. The multiplication is achieved by a bit shift, which is trivial:

A Left Shift

You could do 'scaled addressing' with a mov too, for example (note that this is not the same operation, the only similarity is the fact that ebx*4 represents an address multiplication):

 mov edx, [esi+4*ebx]

(source: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html#memory)

For a more complete listing, see this Intel document. Table 2-3 shows that a scaling of 2, 4, or 8 is allowed. Nothing else.

Latency (in terms of number of cycles): I don't think this should be affected at all. A shift is a matter of connections, and selecting between three possible shifts is the matter of 1 multiplexer worth of delay.

ArjunShankar
  • 23,020
  • 5
  • 61
  • 83
  • 1
    That `mov` gets the **contents** of the memory at the generated "address", while `lea` gets the **generated "address"** – ninjalj May 09 '12 at 21:31
  • @ninjalj - The point is *not* what the mov does, but the fact that 'scaled addressing' (i.e. pointer math involving multiplying a pointer by 2,4,8) is not specific to `lea`. – ArjunShankar May 10 '12 at 08:08
  • @ninjalj - And since it appears to me that it was the wording of that part of my answer which was somewhat vague, I have explained the `mov` somewhat better now. Thanks. – ArjunShankar May 10 '12 at 08:12
  • 1
    the multiplexer thing is called a barrel shifter I believe. – v.oddou Mar 20 '14 at 02:53
8

To expand on your last question:

Is the multiplication limited to powers of 2 (I would assume this is the case)?

Note that you get the result of base + scale * index, so while scale has to be 1, 2, 4 or 8 (the size of x86 integer datatypes), you can get the equivalent of a multiplication by some different constants by using the same register as base and index, e.g.:

lea eax, [eax*4 + eax]   ; multiply by 5

This is used by the compiler to do strength reduction, e.g: for a multiplication by 100, depending on compiler options (target CPU model, optimization options), you may get:

lea    (%edx,%edx,4),%eax   ; eax = orig_edx * 5
lea    (%eax,%eax,4),%eax   ; eax = eax * 5 = orig_edx * 25
shl    $0x2,%eax            ; eax = eax * 4 = orig_edx * 100
ninjalj
  • 42,493
  • 9
  • 106
  • 148
  • 2
    See this SO question: http://stackoverflow.com/questions/6120207/imul-or-shift-instruction for why you should leave strength reduction to the compiler. – ninjalj May 09 '12 at 21:39