This is an extension of the question "How can I resolve data dependency in pointer arrays?"
I'll quote that question's description first:
If we have an array of integer pointers that all point to the same int, and we loop over it applying ++ through each pointer, it is 100% slower (it takes twice as long) than when the pointers point to two different ints.
Here is a new version of the example code:
```cpp
#include <vector>
using std::vector;

// Make sure it spans at least two cache lines (512 bytes), so no two counters share a line
struct Cacheline {
    int c[128]{};
};

int main() {
    Cacheline d[4];
    vector<int*> f;
    f.resize(100000000);
    // case 1: counting over the same location
    {
        for (auto i = 0ul; i < f.size(); ++i) {
            f[i] = d[i % 1].c;
        }
        // this takes 200ms
        for (auto i = 0ul; i < f.size(); ++i) {
            ++*f[i];
        }
    }
    // case 2: two locations interleaved
    {
        for (auto i = 0ul; i < f.size(); ++i) {
            f[i] = d[i % 2].c;
        }
        // this takes 100ms
        for (auto i = 0ul; i < f.size(); ++i) {
            ++*f[i];
        }
    }
    // ....
    // three locations takes 90ms and four locations takes 85ms
}
```
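I understand that the performance gain in case 2 comes from out-of-order (OoO) execution hiding the latency of the data dependency: each `++*f[i]` is a load, an add and a store, so consecutive increments of the same int form one serial chain through memory, whereas two interleaved locations give two independent chains that can overlap. I'm trying to find a general way to optimize this by exploiting OoO execution. Any such method should have negligible preprocessing cost, because my use case involves dynamic workloads.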