In the book Computer Systems: A Programmer's Perspective, the Exercise 5.5 shows a piece of code to compute the value of a polynomial
double poly(double a[], double x, int degree)
{
long int i;
double result = a[0];
double xpwr = x;
for (i = 1; i <= degree; i++) {
result += a[i] * xpwr;
xpwr = x * xpwr;
}
return result;
}
The exercise assumes that the clock cycles that are needed by double-precision floating-point addition and multiplication are 3 and 5 respectively. The reader is asked to explain why the measured CPE (Cycles Per Element) value is 5.
According to the exercise answer, in each iteration, we need to update the variables xpwr
and result
, and the operations we need is a floating-point addition (for result
) and a floating-point multiplication (for xpwr
), so the latter dominates the latency, causing the ultimate CPE to be 5.
But I think the data flow should be something like this:
xpwr result
| |
+-----+ +--[load] |
| | | |
[mul] [mul] |
| | |
| +---+ +-----+
| | |
| [add]
| |
| +------+
| |
xpwr result
So the longest path is from the previous value of xpwr
to the new value of result
, going through the execution units [mul]
and [add]
. Therefore the longest time should be 8 cycles.
I want to ask
- What is exactly the meaning of a critical path? And how to determine it?
- Which answer (mine and the book) is more reasonable?
Any explanation about the CPU, architecture, execution units, pipeline, floating-point unit will be appreciated.