Multiply is both a good and a bad example. First off, multiply is an expensive instruction; some processors don't have one, for good reason. You can make it take many clocks or one clock, and x86 and others have done both. A one-clock multiply takes a (relatively) large amount of chip real estate (as Dani mentioned, likely a dedicated block of logic just for the multiply). There is absolutely no reason why one designer would make the same choices as another, be it within the same company (one x86 compared to another) or across architectures (x86 vs arm vs mips, etc). Every designer knows that the result of a multiply has twice as many bits as the operands, so do you give programmers the full answer for all combinations of operands (a result of a different size than the operands), or do you clip the result to the operand size? If you clip, do you give them an overflow flag or an exception, or do you let them keep running without knowing the result is wrong? Do you force them to add wrappers around all mul and div instructions so that the overflow can be detected, costing performance?
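That width/clipping trade-off can be sketched in a few lines of Python. This is just an illustration of the design choice; the 8-bit width and the function names are invented here, not taken from any real ISA:

```python
# Two designer choices for an N-bit multiply: return the full 2N-bit
# product, or clip to N bits and report whether precision was lost.
def mul_full(a, b, bits=8):
    """Full-width result: an N-bit x N-bit product needs 2N bits."""
    return (a * b) & ((1 << (2 * bits)) - 1)

def mul_clipped(a, b, bits=8):
    """Clip the product to N bits and flag overflow, like a status bit."""
    full = a * b
    clipped = full & ((1 << bits) - 1)
    overflow = full != clipped
    return clipped, overflow

# 0xFF * 0xFF = 0xFE01 needs all 16 bits, so the clipped form must flag it.
```

Without the overflow flag (the third option above), the clipped call would silently return 0x01 for 0xFF * 0xFF, which is exactly the "keep on running without knowing the result is wrong" case.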
x86 is an incredibly bad architecture to learn first or to use as a reference for others; it leads to a lot of bad assumptions. Not all processors are microcoded, and not all CISC processors are microcoded. There is no reason why a RISC processor can't be microcoded: you can microcode either CISC or RISC, or leave either one unmicrocoded. It is a design choice, not a rule.
RISC does not mean the smallest number of steps. Even a simple register-to-register move is two steps minimum (fetch source, store result), which can take two clocks to execute the way processors are sometimes implemented (with sram banks for register files, which are not necessarily dual ported). An alu instruction is three steps and can take three clocks on a RISC processor; the RISC will AVERAGE one clock per instruction, but so can a CISC. You can go superscalar and exceed one instruction per clock, at least in bursts when processor bound. The complications of going superscalar are the same for CISC and RISC.
I suggest writing an instruction set simulator, or at least starting one; if nothing else, write a disassembler. Even better, take 100 programmers and have them perform the same programming task, but in isolation from each other. Even if all were taught at the same school by the same teachers, you are going to get somewhere between 3 and 100 different designs for that iss or disassembler. Make the task a text editor: first the programming-language choices will differ, then the design of the program will vary. Hardware design very much resembles software design: you use programming languages, and have a compiler, something like a linker, etc. Take a room full of hardware designers, give them the same task, and you get different designs. It has less to do with CISC vs RISC and a lot more to do with the design team and their choices. Intel has different design goals, backward compatibility for example, and this is a very expensive choice.
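An instruction set simulator can start out tiny. Here is a minimal sketch for a made-up three-instruction ISA (the opcodes, register count, and program encoding are all invented for illustration; nothing here corresponds to a real architecture):

```python
# Minimal instruction set simulator for an invented 3-instruction ISA.
# Programs are lists of tuples: (opcode, operands...).
def run(program):
    regs = [0] * 4  # four general-purpose registers, all start at zero
    for op, *args in program:
        if op == "mov":          # mov rd, imm   : load an immediate
            rd, imm = args
            regs[rd] = imm
        elif op == "add":        # add rd, rs1, rs2
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "mul":        # mul rd, rs1, rs2
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] * regs[rs2]
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs

# Example: mov r0,6 ; mov r1,7 ; mul r2,r0,r1  leaves 42 in r2.
```

Even at this scale the design choices pile up immediately: register count, how to encode operands, what to do on an unknown opcode. Hand the same spec to 100 programmers and you will see very different answers to each.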
BOTH CISC and RISC convert each instruction into smaller, digestible/divisible steps based on the design of the processor. Replace multiply with add, and compare CISC vs RISC at the asm level, then deeper. With x86 you can use memory as an operand; with arm, for example, you can't. So
register = memory + register
is
load register from memory
register = register + register
You have one extra instruction on the RISC side. But both break down into the same sequence of steps:
resolve memory address
start memory cycle
wait for memory cycle to end
fetch register from register file
send operands to alu
take alu output and store in register file
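That shared breakdown can be modeled directly. In this sketch a hypothetical CISC memory-operand add and the equivalent RISC load/add pair expand to the same micro-steps; the step names and instruction spellings are invented for illustration:

```python
# Both forms expand to essentially the same sequence of steps; the RISC
# pair adds one extra register write for the loaded value (see below).
MEM_ADD_STEPS = [
    "resolve memory address",
    "start memory cycle",
    "wait for memory cycle to end",
    "fetch register from register file",
    "send operands to alu",
    "store alu output in register file",
]

def expand_cisc(instr):
    """CISC: one instruction carries the whole sequence."""
    assert instr == "add r0, [r1]"
    return list(MEM_ADD_STEPS)

def expand_risc(instrs):
    """RISC: the load and the add each contribute part of the sequence."""
    steps = []
    for instr in instrs:
        if instr == "ldr r3, [r1]":
            steps += MEM_ADD_STEPS[:3] + ["store loaded value in register file"]
        elif instr == "add r0, r0, r3":
            steps += MEM_ADD_STEPS[3:]
    return steps
```

Counting the expansions shows the RISC side needs seven steps to the CISC's six, because the loaded value has to land in a register of its own first.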
Now the CISC is actually slightly faster, because the RISC, to execute the instructions properly, would need to store the value read from memory in an extra register (from the asm perspective the CISC uses two registers; the RISC uses three, or two with one reused).
If the value being read from memory is not aligned, then the CISC wins on a technicality (assuming the RISC does not allow unaligned transfers, which is common). It takes the CISC processor the same number of memory cycles to fetch the unaligned data, all things held equal (it takes both processors two memory cycles; the CISC is punished just as the RISC is). But comparing asm instructions to asm instructions, if the memory operand were unaligned the RISC would have to do this:
read memory to register a
read memory to register b
shift a
shift b
or/add
where the CISC does:
read memory to register (takes two memory cycles)
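The two-reads-plus-shifts reconstruction looks like this in practice. This sketch assumes a little-endian 32-bit machine with 4-byte alignment; the function names are invented for illustration:

```python
# Reconstructing an unaligned 32-bit load from two aligned reads,
# the way a RISC without unaligned support has to do it in software.
def read_aligned32(mem, addr):
    """Aligned 32-bit little-endian read; addr must be a multiple of 4."""
    assert addr % 4 == 0
    return int.from_bytes(mem[addr:addr + 4], "little")

def read_unaligned32(mem, addr):
    base = addr & ~3            # round down to the aligned word
    shift = (addr - base) * 8   # byte offset within that word, in bits
    lo = read_aligned32(mem, base)       # "read memory to register a"
    hi = read_aligned32(mem, base + 4)   # "read memory to register b"
    # shift a, shift b, or the pieces together
    return ((lo >> shift) | (hi << (32 - shift))) & 0xFFFFFFFF

mem = bytes(range(16))  # 00 01 02 03 04 05 ... 0f
```

Reading at address 1 pulls bytes 01 02 03 04 out of two aligned words, which is the five-instruction sequence above collapsed into one function.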
You also have instruction size. Popular RISC processors like arm and mips lean toward a fixed instruction length, where x86 is variable: x86 can do in one byte what takes another processor four. Yes, your fetch and decode are more complicated (more logic, more power, etc), but you can fit more instructions in the same size cache.
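A back-of-the-envelope comparison makes the cache-density point concrete. The 3-byte average for x86 is an assumption chosen for illustration, not a measured figure:

```python
# How many instructions fit in a 32 KiB instruction cache, comparing an
# assumed ~3-byte average x86 encoding to a fixed 4-byte RISC encoding.
CACHE_BYTES = 32 * 1024

avg_x86_bytes = 3     # illustrative assumption for variable-length x86
fixed_risc_bytes = 4  # fixed 4-byte RISC instruction word

x86_fit = CACHE_BYTES // avg_x86_bytes
risc_fit = CACHE_BYTES // fixed_risc_bytes
```

Under those assumptions the variable-length encoding fits roughly a third more instructions into the same cache, which is the possible performance boost mentioned again at the end.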
Microcoding does more than break one instruction set into another (the other being something likely quite painful that you would never want to program natively). Microcoding can help you get to market faster, assuming the lower-level system is quicker to implement with fewer bugs. An assumption being that you can ramp up production sooner because you can fix some of the bugs after the fact, and can patch in the field down the road. Not always perfect, not always a success, but compare that to a non-microcoded processor, where you would have to get the compiler folks to work around the bug, or recall the processor, or take a black eye as a company and hope to win some customers back, etc...
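The field-patching idea can be sketched as a writable control store: the decoder indexes a table of micro-op lists, and fixing a bug means rewriting an entry rather than respinning silicon. Every name here is invented for illustration:

```python
# A writable control store: each opcode maps to its micro-op sequence.
# Shipping with a buggy micro-op and patching it later is the point.
control_store = {
    "mul": ["read_operands", "alu_mul_buggy", "write_result"],
}

def patch(opcode, step_index, fixed_step):
    """Field update: overwrite one micro-op in the control store."""
    control_store[opcode][step_index] = fixed_step

# A microcode update delivered after the chip ships:
patch("mul", 1, "alu_mul_fixed")
```

A non-microcoded design has no equivalent of `patch`; the fix has to come from the compiler, a recall, or the next silicon revision.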
So the answer is NO. Both RISC and CISC turn an individual instruction into a sequence of steps that can be microcoded or not. Think of them simply as states in a state machine, implemented however you like. CISC may pack more steps into one instruction, but that means fewer instruction fetches. And knowing the whole CISC instruction up front, the steps may naturally be implemented more efficiently in the chip, where a RISC processor might have to examine a series of instructions and optimize on the fly to get the same number of steps (ldr r0,[r1]; add r0,r0,r2). CISC can look for the same kinds of optimizations if it examines groups of instructions instead of focusing on one. Both use pipelines and parallel execution. CISC often implies x86, and RISC implies something with a more modern and cleaner architecture; cleaner in the sense that it is easier for humans to program and to implement, which doesn't automatically mean faster, since there are more steps to do the same job. x86, being variable word length with a history going back to single-byte instructions, compared to say a 4-byte fixed instruction length, can possibly pack more instructions into a cache than a fixed-length RISC, giving x86 a possible performance boost. Why doesn't RISC just convert many instructions into a single smaller instruction that moves through the cache and pipeline faster?