I have a strange problem and maybe one of you has an idea what is going on there.
The code I'm working on is a longish, complex simulation code. It has a function matrixSetup which is called at the beginning and whose runtime I measure. After setting up my matrix and doing a lot of other work, I run my solver and so on.
Now I changed something in my solver code, and this should not influence the runtime of the matrix setup. However, I see an increase there from 90 to 150 seconds, without touching that piece of code. Why? How?
This time difference is fully reproducible. Undoing the change in the solver gives back the fast matrixSetup. Other changes in the solver may or may not lead to the same increase in runtime, all reproducibly. The runs have been carried out in an isolated way on an empty compute node, so there is no influence from other jobs.
When using vTune to find out where the increase in runtime occurs, I end up at a simple loop (in a loop nest):
for (l = 0; l < nrConnects; l++) {
    if (connectedPartitions[l] == otherParti) {
        nrCommonCouplNodes[l]++;
        pos = l;
        break;
    }
}
Does anybody have an idea what is going on there? The compiler-generated instructions are exactly the same according to vTune. I'm using the Intel compiler, version 19.0.1.
I was playing around with compiler flags a little bit. When specifying -fpic (which determines whether the compiler generates position-independent code), the increase in runtime is gone. But I assume this just produces slightly different instructions and hence does not cure the real problem I'm facing.
With Clang, I do not see this behaviour (at least here)...
Any ideas on the reason for the increased runtime? I'm very curious...
Cheers Michael