
According to "Computer Architecture and Organization" by Miles Murdocca and Vincent Heuring:

> CISC instructions do not fit pipelined architectures very well. For pipelining to work effectively, each instruction needs to have similarities to other instructions, at least in terms of relative instruction complexity.

Why is this true? What is meant by an instruction's complexity? Don't all instructions take one clock cycle to begin execution? If the instruction reads from or writes to memory, it would take longer, but RISC processors (of course) read from and write to memory too, don't they?

Jonas
Celeritas
  • The best way to find out what the authors of the book meant is to ask them directly. Concerning the other parts of your question, on modern processors simple instructions (ADD/SUB/MOV, logical instructions, shifts) typically execute in 1 cycle, integer multiplication executes in 3-4 cycles, floating-point multiplication in 3-6, floating-point addition in 2-5. – Marat Dukhan Jun 25 '13 at 03:19
  • @MaratDukhan but what is a cycle? Even a [Google search](https://www.google.com/search?q=define%3Acycle&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a#client=firefox-a&hs=Mig&rls=org.mozilla:en-US:official&q=clock+cycle&tbs=dfn:1&tbo=u&sa=X&ei=9hfJUZvXLrD1igLztYHYDw&ved=0CC0QkQ4&bav=on.2,or.r_qf.&bvm=bv.48340889,d.cGE&fp=a6e1eebee3fff02f&biw=1920&bih=956) either said it's the most fundamental unit of time or the amount of time it takes to complete one instruction, which gives a circular definition. – Celeritas Jun 25 '13 at 04:10
  • 1
    An assembly line that produces one widget per minute does not in any way mean that the widget takes a minute to produce, it can take hours or days per widget start to finish. that production line though likely has very few if any variations per widget, so the assembly line can move smoothly, minute after minute forever. Instructions obviously take a number of clock cycles start to finish on modern computers or old. They strive to average one (or more) instruction(s) completed per clock cycle for bursts of instructions, then you get a stall, and try again. – old_timer Jun 25 '13 at 05:02
  • Their statement includes the answer to your question: each instruction needs to have similarities to other instructions. If you want the instructions to go through the same assembly line/pipeline, they need to break down into similar steps in the same order. CISC traditionally doesn't; RISC traditionally does. – old_timer Jun 25 '13 at 05:05
  • Write an instruction set simulator for, say, the PDP-11 instruction set; don't finish it, just start. Then write one for the older PICs, a PIC 12 or 14, which should take all of half an hour to an hour to completely finish and debug. Compare the complexity of what it takes to completely parse and execute each instruction as the instruction set defines it. Even if you only implement enough of each instruction set to add, compare, and branch if not equal (enough to execute a loop for a while), that should explain what they are talking about. – old_timer Jun 25 '13 at 05:08
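
old_timer's assembly-line picture can be put into numbers. The following is a minimal Python sketch under idealized assumptions (a k-stage pipeline, one cycle per stage, no stalls or hazards; the function names are illustrative only): every instruction still spends k cycles in flight, but once the pipeline is full, one instruction finishes every cycle. That is the sense in which a pipelined CPU "executes one instruction per cycle".

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction runs through all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Idealized pipeline (assumption: no stalls): the first instruction
    needs n_stages cycles to fill the pipe, then one completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def completion_cycles(n_instructions: int, n_stages: int) -> list:
    """1-based cycle in which each instruction leaves the last stage.
    Instruction i enters stage 1 in cycle i + 1 and leaves stage k in cycle i + k."""
    return [n_stages + i for i in range(n_instructions)]


if __name__ == "__main__":
    k, n = 5, 100
    print(unpipelined_cycles(n, k))   # 500
    print(pipelined_cycles(n, k))     # 104
    print(completion_cycles(4, k))    # [5, 6, 7, 8]
```

With 5 stages and 100 instructions, the idealized pipeline finishes in 104 cycles instead of 500, even though each instruction still has a 5-cycle latency. An instruction that needs extra cycles in some stage breaks this even flow for everything behind it.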

2 Answers


The "complexity" of the instructions is related to how much their size and format can vary. Take the x86 IA-32 (Intel 32-bit) architecture, for instance, which is CISC. Instructions can range from 1 to 15 bytes in size, and their format varies a lot too (the format being how many bits are used for each field, where those bits are located, and so on).

This means that you only know where an instruction ends once you start decoding it. Some instructions will take only one cycle to fetch, others more, and this complicates the pipeline.

All instructions in the classic 32-bit ARM instruction set (a RISC architecture), on the other hand, are exactly 4 bytes long (ignoring the separate, compressed Thumb encoding). So once you fetch 4 bytes, you know you can send those bytes to the decode stage of the pipeline and immediately start fetching the next instruction.
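
To see why this matters for the fetch/decode stages, here is a minimal Python sketch using a made-up toy encoding (the opcode values and lengths in `TOY_OPERAND_BYTES` are hypothetical, not real x86 or ARM encodings). With variable-length instructions, the start of each instruction is only known after decoding the length of the previous one; with fixed 4-byte instructions, the boundaries are simply multiples of 4.

```python
from __future__ import annotations

# Hypothetical toy encoding, NOT real x86: the opcode byte alone decides how
# many operand bytes follow. Real x86 length decoding must also examine
# prefixes, ModRM/SIB bytes, displacements and immediates.
TOY_OPERAND_BYTES = {0x01: 0, 0x02: 1, 0x03: 4, 0x04: 8}


def variable_length_boundaries(code: bytes) -> list[int]:
    """Return the start offset of every instruction in `code`.

    The start of instruction i+1 is only known after decoding the length of
    instruction i, so this loop is inherently sequential.
    """
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        opcode = code[pc]
        # Must look inside the instruction to learn its length
        # (raises KeyError for an opcode the toy table does not define).
        pc += 1 + TOY_OPERAND_BYTES[opcode]
    return starts


def fixed_length_boundaries(code: bytes, width: int = 4) -> list[int]:
    """With fixed-width instructions, every start offset is known up front
    without reading any instruction bytes."""
    return list(range(0, len(code), width))


if __name__ == "__main__":
    toy = bytes([0x01, 0x02, 0xAA, 0x03, 0, 0, 0, 0, 0x01])
    print(variable_length_boundaries(toy))     # [0, 1, 3, 8]
    print(fixed_length_boundaries(bytes(12)))  # [0, 4, 8]
```

The fixed-width version can hand each 4-byte word to the decoder as soon as it arrives; the variable-length version has to find each boundary first, which hardware can only speed up by spending extra logic on it.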

Sasha
Daniel Scocco
  • Variable-length encoding is one source of complexity, but definitely not the only. Memory-source or especially memory-*destination* (RMW) ALU instructions are a dependent chain of operations. There's a reason RISC machines are always load-store machines. – Peter Cordes Jul 12 '19 at 07:22
  • Modern x86 CPUs can and do fetch in 16 or 32-byte chunks and decode in parallel; it just costs more transistors / power to find instruction boundaries. Unless you have an L1i cache miss, no instruction takes more than 2 cycles to fetch (if it's split across a 16-byte boundary), and the frontend is pipelined (with buffers between stages in newer CPUs to hide bubbles). See https://www.realworldtech.com/sandy-bridge/4/ for Intel Sandybridge's front-end fetch/decode stages. Yes it's much more complicated / expensive than on a RISC, that's part of the x86 "tax" that superscalar x86 CPUs must pay. – Peter Cordes Jul 12 '19 at 07:33
  • 1
    @PeterCordes - yeah I don't think the OP's quote is actually talking about decode complexity at all, but rather how instructions should have short, fixed latencies to fit cleanly into e.g., a 5-stage pipeline model where each instruction writes back its result at a fixed stage, and also has a small fixed number of inputs, etc. – BeeOnRope Jul 12 '19 at 22:00
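
The point about memory-destination (read-modify-write) instructions can be illustrated by breaking one apart. This is a minimal Python sketch, assuming a made-up micro-op format (the `Uop` class and the op names are hypothetical, not any real CPU's internal representation): a single CISC-style `add [rbx], rcx` becomes a load, an add, and a store, each of which fits a simple load-store pipeline, but each depends on the result of the one before.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Uop:
    op: str              # hypothetical micro-op name: "load", "add" or "store"
    dst: str | None      # destination register, or None for a store
    srcs: tuple[str, ...]


def crack_add_mem_reg(addr_reg: str, src_reg: str) -> list[Uop]:
    """Split `add [addr_reg], src_reg` (memory destination) into three
    load-store steps; each step consumes the previous step's result."""
    return [
        Uop("load", "tmp", (addr_reg,)),        # tmp = mem[addr_reg]
        Uop("add", "tmp", ("tmp", src_reg)),    # tmp = tmp + src_reg
        Uop("store", None, (addr_reg, "tmp")),  # mem[addr_reg] = tmp
    ]


def run(uops: list[Uop], regs: dict, mem: dict) -> None:
    """Execute the micro-ops sequentially, mutating regs and mem in place."""
    for u in uops:
        if u.op == "load":
            regs[u.dst] = mem[regs[u.srcs[0]]]
        elif u.op == "add":
            regs[u.dst] = regs[u.srcs[0]] + regs[u.srcs[1]]
        elif u.op == "store":
            mem[regs[u.srcs[0]]] = regs[u.srcs[1]]
        else:
            raise ValueError(f"unknown micro-op {u.op!r}")


if __name__ == "__main__":
    regs = {"rbx": 0x10, "rcx": 5}
    mem = {0x10: 37}
    run(crack_add_mem_reg("rbx", "rcx"), regs, mem)
    print(mem[0x10])  # 42
```

On a load-store RISC, the programmer (or compiler) writes these three steps as three separate instructions; a CISC design has to do the splitting itself somewhere in its pipeline.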

What is meant by this is that CISC architectures typically have instructions that take considerably longer to execute than RISC instructions, so scheduling is trickier. CISC code often mixes simpler instructions with more complicated ones that take longer, and in a pipeline such mixes create situations called hazards that interfere with smooth pipelining. For example, x86 floating-point instructions take longer than x86 loads or stores.
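
A minimal Python sketch of that effect, assuming a hypothetical single-issue, in-order pipeline with full forwarding and made-up latencies (1 cycle for simple operations, 5 for a long-latency one): when a dependent instruction follows a long-latency one, it must stall until the result is ready, which is a read-after-write data hazard. The sketch ignores structural hazards, write-back conflicts and branches.

```python
def simulate_in_order(program):
    """program: list of (dst, srcs, latency) tuples.

    Issues at most one instruction per cycle, in program order. An
    instruction waits (stalls) until all of its source registers have been
    produced. Returns (cycle in which the last result is ready, stall cycles).
    """
    ready_at = {}  # register -> cycle its value becomes available
    cycle = 0      # earliest cycle the next instruction may issue
    stalls = 0
    finish = 0
    for dst, srcs, latency in program:
        # Registers never written are assumed ready from cycle 0.
        earliest = max([cycle] + [ready_at.get(s, 0) for s in srcs])
        stalls += earliest - cycle
        ready_at[dst] = earliest + latency
        finish = max(finish, ready_at[dst])
        cycle = earliest + 1
    return finish, stalls


if __name__ == "__main__":
    # A dependent chain where every instruction has the same 1-cycle latency.
    uniform = [("r1", (), 1), ("r2", ("r1",), 1), ("r3", ("r2",), 1)]
    # The same chain, but the first instruction is a 5-cycle operation.
    mixed = [("f1", (), 5), ("r2", ("f1",), 1), ("r3", ("r2",), 1)]
    print(simulate_in_order(uniform))  # (3, 0)
    print(simulate_in_order(mixed))    # (7, 4)
```

The uniform chain flows without a bubble, while the single long-latency instruction costs four stall cycles for the instruction that depends on it.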

BillH
  • So really simply what you're saying is with CISC there are more pipeline hazards? – Celeritas Jun 25 '13 at 04:11
  • Yes, the more regular or similar your instruction set is, the easier it is to schedule. Instructions can vary not only in length but also in complexity. – BillH Jun 25 '13 at 12:13
  • Please look at Instruction Scheduling on Wikipedia. – BillH Jun 26 '13 at 03:00
  • 1
    FPU instructions are a poor example because RISC CPUs have the same problem: longer latency for FP mul than integer add. Unless you're talking about x87 instructions like `fsin` or [`fyl2x`](https://www.felixcloutier.com/x86/FYL2X.html) that are internally microcoded as *many* simple operations including probably a lookup table, and take upwards of 100 cycles vs. 3 to 5 cycles (fully pipelined) for `fmul`. (https://agner.org/optimize/ has numbers for CPUs as old as in-order pipelined P5 Pentium, which is pipelined *without* decoding most complex instructions to multiple uops) – Peter Cordes Jul 12 '19 at 07:27