I have this code:
for (i = 0; i < size; i++)
{
    d[i] = d[i-1] + v[i];
}
When I pipeline this loop to process iterations in parallel, the data dependency on d[i-1] forces the initiation interval to 2, meaning I have:
Initiation interval: 2
| load v[i-1] | load d[i-2] | add       | store d[i-1] |
|             |             | load v[i] | load d[i-1]  | add | store d[i] |
I do not want the stall between iterations. What I want is:
Initiation interval: 1
| load v[i-1] | load d[i-2] | add         | store d[i-1] |
|             | load v[i]   | load d[i-1] | add          | store d[i] |
This is not possible, since d[i-1] has not been stored yet when the next iteration tries to load it.
How can I change the code to get an initiation interval of 1?
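
For reference, here is a minimal sketch of one rewrite commonly suggested for this pattern: carry the running sum in a local scalar, so that each iteration only loads v[i] and never has to read d back from memory. The function name prefix_sum, the int element type, and the starting value d[0] = v[0] are assumptions for illustration and are not taken from the code above. Whether a particular tool then reaches an initiation interval of 1 still depends on the adder latency and the target.

```c
#include <stddef.h>

/* Hypothetical helper (name and types are assumptions).
 * Computes d[i] = v[0] + ... + v[i], keeping the carried value in a
 * scalar so no iteration has to load d[i-1] from memory. */
void prefix_sum(const int *v, int *d, size_t size)
{
    if (size == 0)
        return;

    int acc = v[0];          /* assumed starting value: d[0] = v[0] */
    d[0] = acc;

    for (size_t i = 1; i < size; i++) {
        acc += v[i];         /* dependency is now register-to-register */
        d[i] = acc;          /* store only; d is never read in the loop */
    }
}
```

With this form, the only value carried between iterations is acc, so the load of d[i-1] disappears from the schedule and the store of d[i] is no longer on the dependency path.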