I'm running into an odd issue where reducing the size of the input to a basic operation inside a set of nested for loops causes an increase in runtime. Playing with it further, I've found it seems to be due to a notable increase in runtime when the input size is at, or near, a multiple of 128.
The function:
#include <cstddef>
#include <ctime>

float strangeIssue(size_t patchLength)
{
    using LabelType = unsigned char;
    LabelType* emptyPatch = new LabelType[patchLength * patchLength * patchLength];
    LabelType numClasses = 133;
    size_t arraySize = numClasses * patchLength * patchLength * patchLength;
    float* modelPred = new float[arraySize]; // deliberately left uninitialized, see below
    clock_t begin_time = clock();
    size_t predInd = 0;
    float currVal, maxVal;
    LabelType label;
    size_t sliceSize = patchLength * patchLength;
    size_t volumeSize = patchLength * patchLength * patchLength;
    for (size_t x = 0; x < patchLength; x++) {
        for (size_t y = 0; y < patchLength; y++) {
            for (size_t z = 0; z < patchLength; z++) {
                predInd = x * sliceSize + y * patchLength + z;
                maxVal = modelPred[predInd];
                label = 0;
                // argmax over classes: reads spaced volumeSize floats apart
                for (LabelType classInd = 1; classInd < numClasses; classInd++) {
                    size_t voxelClassInd = predInd + classInd * volumeSize;
                    currVal = modelPred[voxelClassInd];
                    if (currVal > maxVal) {
                        label = classInd;
                        maxVal = currVal;
                    }
                }
                emptyPatch[predInd] = label;
            }
        }
    }
    float totalTime = (float)(clock() - begin_time);
    delete[] modelPred;
    delete[] emptyPatch;
    return totalTime / CLOCKS_PER_SEC;
}
Calling it with:
std::vector<size_t> patchLengths({104, 112, 120, 127, 128, 129, 136, 144, 152, 160, 168, 176, 184, 240, 248, 255, 256, 257, 264, 272});
for (size_t patchLength : patchLengths) {
    std::cout << "patchLength " << patchLength << " time in secs " << strangeIssue(patchLength) << std::endl;
}
gives the following output (my comments added):
patchLength 104 time in secs 0.638
patchLength 112 time in secs 0.776
patchLength 120 time in secs 0.791
patchLength 127 time in secs 1.639 <--- not ~0.8xx?
patchLength 128 time in secs 2.175 <--- really?
patchLength 129 time in secs 1.596 <--- still pretty long
patchLength 136 time in secs 1.053 <--- getting back to expected
patchLength 144 time in secs 1.339
patchLength 152 time in secs 1.454
patchLength 160 time in secs 1.9
patchLength 168 time in secs 1.958
patchLength 176 time in secs 2.435
patchLength 184 time in secs 2.599
patchLength 240 time in secs 6.263
patchLength 248 time in secs 6.458
patchLength 255 time in secs 13.274 <--- why?
patchLength 256 time in secs 26.321 <--- wow!
patchLength 257 time in secs 13.764 <--- long
patchLength 264 time in secs 7.86 <--- ok
patchLength 272 time in secs 9.151
So the time increases roughly monotonically until around 128, yet 128 takes longer than 168, while 129 and even 136 take much less. Around 256, both 255 and 257 take longer than expected, though still about half as long as 256. Wall-clock time (i.e. std::chrono::high_resolution_clock) shows a similar pattern.
If I understand the standard right, the contents of modelPred are indeterminate, but when running this I've found the condition (currVal > maxVal) is never triggered. However, if I comment out the if statement, the compiler seems to be smart enough to skip the innermost for loop entirely and all times drop to ~0.
I'm seeing this with Release builds from VS 2017 and VS 2019 on Windows, and with gcc on Ubuntu, on a couple of different machines.