The answer is right there in the question: between 1 and 3 cycles, depending on things. Even on something as relatively simple as the Cortex-M4 there are enough factors that it's not necessarily possible (or useful) to specify a hard-and-fast rule. However, that's not to say we can't do a bit of reasoning given the available information:
depending on the alignment and width of the target instruction
Instruction fetches are 32 bits wide, so it's fairly safe to assume that the 3-cycle worst case involves a halfword-aligned 32-bit target instruction, which needs 2 instruction fetches before the whole instruction can be decoded. Chances are, then, that a 16-bit target instruction, or a word-aligned 32-bit one, covered by a single instruction fetch would be reached in one fewer cycle.
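Just to make that classification concrete, here's a rough C sketch (the helper name `target_needs_two_fetches` is mine, not anything official): it peeks at the first halfword of the target instruction, applies the Thumb-2 encoding rule that a first halfword of 0xE800 or above means a 32-bit instruction, and flags the halfword-aligned 32-bit case that would straddle two fetch words. It assumes the code is readable at its execution address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rough sketch: decide whether a branch target needs one or two 32-bit
 * instruction fetches. A Thumb-2 instruction is 32 bits wide when the top
 * five bits of its first halfword are 0b11101, 0b11110 or 0b11111, i.e.
 * the halfword is 0xE800 or above; anything else is a 16-bit instruction.
 */
static bool target_needs_two_fetches(uintptr_t target)
{
    target &= ~(uintptr_t)1;                         /* strip the Thumb bit   */

    uint16_t first_hw = *(const uint16_t *)target;   /* first halfword        */
    bool is_32bit = (first_hw & 0xF800u) >= 0xE800u;
    bool halfword_aligned = (target & 2u) != 0;      /* offset 2 within word  */

    /* Only a halfword-aligned 32-bit instruction straddles two fetch words,
     * so only that combination should need the extra fetch cycle.           */
    return is_32bit && halfword_aligned;
}
```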
and whether the processor manages to speculate the address early
Given the above, it seems reasonable that the difference between a successful branch-target prefetch and an unsuccessful one accounts for the other of the 2 cycles between best case and worst case. There doesn't seem to be much information available about the branch predictor, but I'd assume it's a simple static predictor in the decode stage of the pipeline, in which case register branches (including PC writes) and conditional forward branches are probably not predicted, while unconditional immediate branches and conditional backward branches are.
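To put those categories in more familiar terms, here's a made-up C function annotated with the kind of branch each construct typically compiles to in Thumb-2 - the exact instructions depend on the compiler and optimisation level, so treat the comments as typical rather than guaranteed.

```c
#include <stdint.h>

/* Illustrative only: which branch category each C construct usually lands in. */
static int32_t sum(const int32_t *data, uint32_t n, int32_t (*fixup)(int32_t))
{
    int32_t total = 0;

    for (uint32_t i = 0; i < n; i++) {    /* loop back-edge: conditional backward
                                             branch, likely predicted           */
        int32_t v = data[i];

        if (v < 0)                        /* conditional forward branch,
                                             likely not predicted               */
            v = fixup(v);                 /* call through a register (BLX Rm),
                                             likely not predicted               */

        total += v;                       /* fall-through: no branch            */
    }

    return total;                         /* return is a PC write via a register
                                             (BX LR), likely not predicted      */
}
```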
Now, this is just educated guessing - I don't know the secrets of ARM's microarchitectures, so there may be more subtleties than I've imagined here, but it's already complicated enough. I doubt anyone would care to pick through disassembled code, cross-referencing against all the possible branch/target combinations, just to account for 2 cycles here and there - if you really need to know how many cycles a piece of code executes in, then the best thing to do is just execute it and count the cycles.
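On a Cortex-M4 the usual way to do that counting is the DWT cycle counter. The sketch below assumes the CMSIS-Core definitions (normally pulled in via your device header rather than core_cm4.h directly) and that your particular part actually implements the DWT - it's optional, though most M4 devices have it.

```c
#include <stdint.h>
#include "core_cm4.h"   /* CMSIS-Core definitions; usually included via the device header */

/* Count how many cycles a snippet takes using the DWT cycle counter. */
static uint32_t measure_cycles(void (*snippet)(void))
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;  /* enable trace/DWT      */
    DWT->CYCCNT = 0;                                 /* reset the counter     */
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;             /* start counting cycles */

    uint32_t start = DWT->CYCCNT;
    snippet();                                       /* code under test       */
    uint32_t end = DWT->CYCCNT;

    return end - start;   /* includes the call and the two counter reads */
}
```

Run it once with an empty snippet first if you want to subtract the call/read overhead from the result.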