I understand (vaguely) what Cycles Per Instruction (CPI) and Instructions Per Cycle (IPC) mean. CPI is the number of clock cycles required to execute a program divided by the number of instructions executed while running it. IPC, on the other hand, is the reciprocal: the number of instructions executed while running a program divided by the number of clock cycles required to execute it.
However, I am having trouble understanding what Cycles Per Element (CPE) means when associated with loops.
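To check my understanding of the two ratios, here is a minimal sketch in C. The function names `cpi` and `ipc` and the counts in the usage note are made up for illustration; they do not come from any real measurement.

```c
/* Sketch only: cpi() and ipc() are hypothetical helper names. */

/* CPI: clock cycles divided by instructions executed. */
double cpi(double cycles, double instructions) {
    return cycles / instructions;
}

/* IPC: instructions executed divided by clock cycles (the reciprocal of CPI). */
double ipc(double cycles, double instructions) {
    return instructions / cycles;
}
```

With these definitions, a hypothetical run that takes 2000 cycles to execute 1000 instructions has `cpi(2000.0, 1000.0)` equal to 2.0 and `ipc(2000.0, 1000.0)` equal to 0.5.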
For example, in the following code,
void combine4(vec_ptr v, data_t *dest) {
    long i;
    long length = vec_length(v);
    data_t *d = get_vec_start(v);
    data_t t = IDENT;
    for (i = 0; i < length; i++)
        t = t OP d[i];
    *dest = t;
}
We can make several optimizations by restructuring the for loop.
One method is known as loop unrolling:
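long limit = length - 1;  /* stop early so d[i + 1] stays in bounds */
data_t x = IDENT;         /* accumulator, like t in combine4 */
/* Combine 2 elements at a time */
for (i = 0; i < limit; i += 2) {
    x = (x OP d[i]) OP d[i + 1];
}
/* Finish any remaining elements */
for (; i < length; i++) {
    x = x OP d[i];
}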
To improve it even more, we can reassociate the operation by putting parentheses around the two array accesses:
    x = x OP (d[i] OP d[i + 1]);
I believe the processor can start computing d[i] OP d[i + 1] for the next iteration early, because that part does not depend on x. How does the term CPE apply to this optimization? Would CPE decrease because it takes fewer cycles to run through all the elements?
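To make the question concrete, here is a minimal, self-contained sketch of the three loop variants. The function names are mine, and I am assuming `data_t` is `long`, `OP` is `+`, and `IDENT` is `0` so that reassociation cannot change the result (with floating point it could). The `cycles_per_element` helper reflects my working reading of CPE as total cycles divided by the number of elements processed, which may be an oversimplification.

```c
/* Assumptions for this sketch: integer data, addition as the operation. */
typedef long data_t;
#define IDENT 0
#define OP +

/* Baseline: one element per iteration. */
data_t combine_simple(const data_t *d, long length) {
    data_t x = IDENT;
    for (long i = 0; i < length; i++)
        x = x OP d[i];
    return x;
}

/* 2x unrolling: both operations in an iteration still depend on x. */
data_t combine_unroll2(const data_t *d, long length) {
    long i;
    long limit = length - 1;
    data_t x = IDENT;
    for (i = 0; i < limit; i += 2)
        x = (x OP d[i]) OP d[i + 1];
    for (; i < length; i++)   /* finish any remaining element */
        x = x OP d[i];
    return x;
}

/* 2x unrolling with reassociation: d[i] OP d[i + 1] does not depend on x. */
data_t combine_unroll2_reassoc(const data_t *d, long length) {
    long i;
    long limit = length - 1;
    data_t x = IDENT;
    for (i = 0; i < limit; i += 2)
        x = x OP (d[i] OP d[i + 1]);
    for (; i < length; i++)   /* finish any remaining element */
        x = x OP d[i];
    return x;
}

/* My working reading of CPE: total cycles over elements processed. */
double cycles_per_element(double cycles, long n) {
    return cycles / (double)n;
}
```

All three functions return the same value for the same input; what I am asking about is only how the cycle count per element differs between them. The cycle counts themselves would have to come from an actual measurement, which this sketch does not perform.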