8

I read somewhere that effective addresses (as in the LEA instruction) in x86 instructions are calculated by the "EU." What is the EU? What is involved exactly in calculating an effective address?

I've only learned about the MC68k instruction set (UC Boulder teaches this first) and I can't find a good x86 webpage by searching the web.

Peter Cordes
Tony R
  • See the [x86 tag wiki](http://stackoverflow.com/tags/x86/info) for good links to x86 hardware details, especially http://agner.org/optimize/ – Peter Cordes Aug 04 '16 at 04:10

4 Answers

7

Intel's own Software Developer's Manuals are a good source of information on the x86, though they may be a bit of overkill (and are more reference-like than tutorial-like).

The EU (Execution Unit) reference was most likely in contrast to ALU (Arithmetic Logic Unit) which is usually the part of the processor responsible for arithmetic and logic instructions. However, the EU has (or had) some arithmetic capabilities as well, for calculating memory addresses. The x86 LEA instruction conveys these capabilities to the assembly programmer.

Normally you can supply some pretty complex memory addresses to an x86 instruction:

sub eax, [eax + ebx*4 + 0042]

and while the ALU handles the arithmetic subtraction, the EU is responsible for generating the address.

With LEA, you can use the limited address-generating capabilities for other purposes:

lea ebx, [eax + ebx*4 + 0042]

Compare with:

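imul ebx, ebx, 4    ; ebx = ebx*4 (MUL takes no immediate operand; shl ebx, 2 also works)
add ebx, eax        ; ebx = ebx + eax
add ebx, 0042       ; ebx = ebx + displacement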
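To make the equivalence concrete, here is a minimal Python sketch that models both versions on plain integers. Python is only being used as a calculator here, not as x86; the helper names `lea` and `three_instruction_version` are made up for illustration, and the sketch assumes 32-bit registers and that `0042` is decimal 42.

```python
MASK32 = 0xFFFFFFFF  # 32-bit registers: results wrap modulo 2**32


def lea(base, index, scale, disp):
    """Model of: lea dst, [base + index*scale + disp] (hypothetical helper)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 addressing only encodes scale factors 1, 2, 4, 8")
    return (base + index * scale + disp) & MASK32


def three_instruction_version(eax, ebx):
    """Model of the multiply/add/add sequence, one step per instruction."""
    ebx = (ebx * 4) & MASK32    # multiply ebx by 4
    ebx = (ebx + eax) & MASK32  # add ebx, eax
    ebx = (ebx + 42) & MASK32   # add ebx, 0042
    return ebx


eax, ebx = 0x1000, 3
print(hex(lea(eax, ebx, 4, 42)))                 # single-instruction form
print(hex(three_instruction_version(eax, ebx)))  # same value, three steps
```

Both calls produce the same number. One difference the model does not show: the real arithmetic instructions modify the flags register, while LEA leaves flags untouched.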

"Volume 1" on the page I've linked has a section "3.7.5" discussing addressing modes - what kind of memory addresses you can supply to an instruction expecting a memory operand (of which LEA is one), reflecting what kind of arithmetic the EU (or whatever the memory interface part is called) is capable of.

"Volume 2" is the instruction set reference and has definitive information on all instructions, including LEA.

aib
  • 1
    I'm wondering which is faster/more efficient; the lea instruction or the mul, add, add combo, since they are being processed by different units (EU/ALU). – Tony R Apr 27 '09 at 01:15
  • 2
    It's really hard to tell with all the multi-stage pipelines, multi-cores, multi-units of today. The EU can be free for such a calculation while the ALUs are busy, and vice versa. Heck, I'm not even sure the EU/ALU distinction exists anymore. – aib Apr 27 '09 at 23:08
  • An ALU is one type of EU (the kind that can run add and shift instructions). Other kinds being a load unit, or a store unit, that can execute those uops. [Krazy Glew's answer](http://stackoverflow.com/a/11389785/224132) on this question explains more details. (Andy Glew was one of the architects of Intel's P6 design. His explanation of Intel's terminology is correct, and @TonyR should accept that answer). And using `lea` is always a win if you can replace more than one other instruction. It's a huge win if you can replace all 4 (shift, `add` and `add`-immediate, and `mov`). – Peter Cordes Aug 04 '16 at 04:06
  • This answer confuses EU with AGU. Everything it says about the "EU" should actually be replaced with "AGU". (Note that of modern x86 designs, only in-order Atom runs LEA on the actual AGU hardware, instead of as just another ALU instruction. Other CPUs use their AGUs only for actual loads/stores/prefetches.) – Peter Cordes Aug 04 '16 at 04:09
5

"EU" is the generic term for Execution Unit. The ALU is one example of an execution unit. FADD and FMUL, i.e. the floating point adder and multiplier, are other examples, as for that matter is the memory unit, for loads and stores.

The EUs relevant to LEA instructions are the ALU (add, subtract, AND/OR, etc.) and the AGU (Address Generation Unit). The AGU is coupled to the memory pipelines, TLB, data cache, etc.

A typical Intel x86 CPU back when I wrote the first codegen guide had 2 ALUs, 1 load pipeline tied to an AGU, a store address pipeline tied to a second AGU, and a store data pipeline. As of 2016 most have 3 or 4 ALUs and more than one load pipe.

LEA is a 3-input instruction: BaseReg+IndexReg*Scale+Offset. This is just like the memory addressing mode of x86, which actually has a 4th input, the segment base, that is not part of the LEA calculation. 3 inputs necessarily cost more than the 2 inputs needed for ADD.
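As a rough illustration of the 3-input versus 4-input distinction, the following Python sketch models what LEA returns against what an actual memory access would use. The function names are hypothetical, and it assumes 32-bit protected mode, where the linear address is the segment base plus the effective address, wrapping at 32 bits.

```python
MASK32 = 0xFFFFFFFF  # assume 32-bit addresses that wrap modulo 2**32


def effective_address(base, index, scale, offset):
    """The 3-input sum that LEA writes to its destination register."""
    return (base + index * scale + offset) & MASK32


def linear_address(segment_base, base, index, scale, offset):
    """The 4-input sum used by a real load or store (illustrative model)."""
    ea = effective_address(base, index, scale, offset)
    return (segment_base + ea) & MASK32


# With a zero segment base (the usual flat model) the two values coincide.
print(hex(effective_address(0x1000, 2, 8, 4)))
print(hex(linear_address(0x0, 0x1000, 2, 8, 4)))

# With a non-zero segment base, only the memory access sees the extra term.
print(hex(linear_address(0x7000_0000, 0x1000, 2, 8, 4)))
```

In a flat memory model most segment bases are zero, so the effective address and the linear address are usually the same number, which is part of why the distinction is easy to overlook.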

On some machines, the ALU can only do 2-input operations. LEA therefore can only execute on an AGU, specifically the AGU used for loads (because the store AGU doesn't write a register). This may mean that you cannot do an LEA at the same time as a load, or two LEAs at the same time, whereas you can do two ADDs and a load in the same cycle.

On other machines, LEA can be done by one, or two or three, of the ALUs, possibly instead of the AGU, possibly as well as the AGU. This provides more flexibility.

Or, the simple LEAs, e.g. reg*scale+offset, can be done on the ALUs, whereas the biggest LEAs, e.g. breg+ireg*scale+offset, may be restricted, or possibly even broken into two uops.

So, the question comes down to: which EU (Execution Unit) handles which LEAs? The ALU or the AGU? The answer depends on the machine.

Generic text in an optimization guide may simply say "EU" rather than "AGU or ALU, depending on the model" or "whichever EU is capable of handling that particular LEA".

Krazy Glew
  • Also, "typical" x86 CPUs have 3 ALU ports / pipes and 2 load ports these days, unless you're looking at low-power designs like Silvermont. Haswell+ has 4 ALU ports. Only AMD Bulldozer-family still has only 2 integer ALU ports per integer core, and that's sort of a fixed-partitioning SMT. K8/K10 had a throughput of 3 ADDs per clock. And I notice you didn't try to get into the complexity of 2 ALU *ports*, but many specialized ALUs (e.g. scalar integer mul unit + vector FP mul unit + other stuff on port 0 of many Intel P6 / SnB-family uarches). – Peter Cordes Aug 04 '16 at 04:25
  • 2
    No, Peter, I did not get into the complexity of groups of specialized EUs sharing start ports and completion ports, let alone RF read and write ports, flexible latencies, etc. It was hard enough to explain those issues in the Intel compiler writer's guide, when I wrote the first version for P6 circa 1994. Too hard to fit into stackoverflow's format and primitive formatting. – Krazy Glew Aug 04 '16 at 04:53
  • Yup, you have to draw the line somewhere on how much detail to put into an answer. I mostly just mentioned ports + specialized EUs as a footnote for keen readers. Nice update; more accurate + correct while still being nice and short. – Peter Cordes Aug 04 '16 at 15:35
3

EU = Execution Unit?

Effective Address is the address that would have been accessed if the LEA instruction had been an instruction that actually performed some sort of arithmetic or other data access. Its 'intended' use is to calculate the resulting pointer from a pointer arithmetic or array indexing operation. However, because it can perform some combination of multiply and add, it's also used to optimize some regular calculations.
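Both uses can be sketched in a few lines of Python, treating LEA as a pure function on integers. The `lea` helper, the array base `0x2000`, and the 4-byte element size are assumptions chosen for illustration only.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers


def lea(base=0, index=0, scale=1, disp=0):
    """Hypothetical model of: lea dst, [base + index*scale + disp]."""
    assert scale in (1, 2, 4, 8), "only these scale factors are encodable"
    return (base + index * scale + disp) & MASK32


# Intended use: address of arr[i] for an array of 4-byte elements.
arr_base = 0x2000  # assumed array start address
i = 5
addr_of_element = lea(base=arr_base, index=i, scale=4)
print(hex(addr_of_element))

# Repurposed use: multiply by 5 as x + x*4, like lea eax, [eax + eax*4].
x = 7
times_five = lea(base=x, index=x, scale=4)
print(times_five)
```

No memory is touched in either case; the instruction only computes the number that would have been used as an address.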

Michael Burr
0

The internals of processors inside a single family have changed a lot over the years, so that "EU" reference would need to be clarified with the exact CPU model. As an analogy to your m68k experience, the instruction sets for the 68000, 010, 020, 030, 040 and 060 are mostly the same but their internals are really different, so any reference to an internal name needs to come with its part number.

winden