The architecture has little to do with it. On ARM, one of the more significant differences is that memory ordering can be quite relaxed (possibly under the control of the user). Even an in-order, 3-stage-pipeline Cortex-M has scenarios which necessitate the use of ISB and DSB.
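As a rough illustration of why ordering needs explicit barriers, here is a minimal C11 sketch. The release/acquire fences below map to DMB on ARM; ISB and DSB are stronger operations (pipeline flush and full-system synchronisation) with no portable C equivalent. The publish/consume helpers are invented for the example, not taken from any real API.

```c
#include <stdatomic.h>

/* Shared state: a plain payload and an atomic flag that publishes it. */
static int data;
static atomic_int flag;

void publish(int value) {
    data = value;                               /* plain store            */
    atomic_thread_fence(memory_order_release);  /* a DMB on ARM: the data */
                                                /* store is ordered before the flag store */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

int consume(void) {
    if (atomic_load_explicit(&flag, memory_order_relaxed)) {
        atomic_thread_fence(memory_order_acquire);  /* DMB again: the data  */
                                                    /* load cannot move above the flag load */
        return data;
    }
    return -1;  /* nothing published yet */
}
```

Without the fences, a relaxed ARM memory system is free to let the flag store become visible before the data store, so another observer could see the flag set but read stale data.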
Executes instructions in sequential order
This is the view presented to the programmer at all times, so it doesn't really describe much.
Until current instruction is completed, it will not execute next
instruction.
Incorrect. All modern processors are pipelined: fetch, decode and branch prediction can all occur in an in-order machine whilst earlier instructions are still in flight, and there are likely to be places where state is cached in case it needs to be reverted.
Have slower execution speed.
Not guaranteed. A wide in-order machine can have a higher IPC (instructions per cycle) than an out-of-order machine, although it won't necessarily make sense to build one.
Executes instructions in non-sequential order
This is called 'out of order dispatch' (often conflated with 'speculative execution', which is a different thing, working at a higher level). In actual ARM cores, 'out of order completion' is more common: the loads and stores are computed, then issued to a set of buffers. Even a single-issue machine with a single memory interface can have multiple store buffers to permit stores to queue up whilst ALU operations continue in the processor. With more than one memory interface (or a bus like AXI), a slow load can be in progress whilst any number of other transactions complete.

Out of order completion is much simpler to implement than any form of out of order dispatch, and is facilitated in the ARM architecture by 'precise aborts' (occurring at the logical place in the program order) and 'imprecise aborts' (occurring late, when the memory system finally fails to resolve a transaction).
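The store-buffering behaviour described above is exactly what the classic 'SB' litmus test probes. Here is a sketch using C11 atomics and POSIX threads (all names invented for the example): with the default seq_cst operations, each thread's store is ordered before the opposing load, so both threads reading 0 is forbidden. Weaken the operations to memory_order_relaxed and that outcome becomes observable on ARM, precisely because each store can wait in a store buffer while the later load overtakes it.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stddef.h>

static atomic_int x, y;   /* the two shared locations */
static int r1, r2;        /* what each thread observed */

static void *thread_a(void *arg) {
    (void)arg;
    atomic_store(&x, 1);   /* seq_cst store */
    r1 = atomic_load(&y);  /* seq_cst load  */
    return NULL;
}

static void *thread_b(void *arg) {
    (void)arg;
    atomic_store(&y, 1);
    r2 = atomic_load(&x);
    return NULL;
}

/* Runs one iteration of the litmus test. With seq_cst ordering,
 * r1 == 0 && r2 == 0 can never happen; returns 1 when the
 * forbidden outcome did not occur. */
int run_once(void) {
    pthread_t a, b;
    atomic_store(&x, 0);
    atomic_store(&y, 0);
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return !(r1 == 0 && r2 == 0);
}
```

On ARM the seq_cst operations compile down to loads/stores with barriers; the relaxed variants compile to plain LDR/STR, which is when the buffered-store reordering shows up.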
A further example of ordering is a scenario where there are two integer pipelines and one floating-point pipeline. Not only are the pipelines potentially of different lengths, but there is nothing to say that they must map onto incoming instructions in a set order, provided the dependencies are handled.
Even if current instruction is NOT completed, it will execute next
instruction. (This is done only if the next instruction does not
depend on the result of current instruction)
This is generally true of all pipelined processors. Any stage could stall when it depends on some earlier instruction making progress.
Faster execution speed.
Maybe, depending on the constraints. Significantly, an in-order machine relies on the compiler understanding the optimum ordering, so it can make a difference whether a binary needs to be optimum for a single target device or to perform well across a wide range of devices.