I've run into a strange situation.
In my program I have a loop that combines a bunch of data together in a giant vector. I was trying to figure out why it was running so slowly, even though it seemed like I was doing everything right to allocate memory in an efficient manner on the go.
In my program it is difficult to determine how big the final vector of combined data should be, but the size of each piece of data is known as it is processed. So instead of reserving and resizing the combined data vector in one go, I was reserving enough space for each data chunk as it is added to the larger vector. That's when I ran into this issue that is repeatable using the simple snippet below:
std::vector<float> arr1;
std::vector<float> arr2;
std::vector<float> arr3;
std::vector<float> arr4;
int numLoops = 10000;
int numSubloops = 50;
{
// Test 1
// Naive test where no pre-allocation occurs
for (int q = 0; q < numLoops; q++)
{
for (int g = 0; g < numSubloops; g++)
{
arr1.push_back(q * g);
}
}
}
{
// Test 2
// Ideal situation where total amount of data is reserved beforehand
arr2.reserve(numLoops * numSubloops);
for (int q = 0; q < numLoops; q++)
{
for (int g = 0; g < numSubloops; g++)
{
arr2.push_back(q * g);
}
}
}
{
// Test 3
// Total data is not known beforehand, so allocations made for each
// data chunk as they are processed using 'resize' method
int arrInx = 0;
for (int q = 0; q < numLoops; q++)
{
arr3.resize(arr3.size() + numSubloops);
for (int g = 0; g < numSubloops; g++)
{
arr3[arrInx++] = q * g;
}
}
}
{
// Test 4
// Total data is not known beforehand, so allocations are made for each
// data chunk as they are processed using the 'reserve' method
for (int q = 0; q < numLoops; q++)
{
arr4.reserve(arr4.size() + numSubloops);
for (int g = 0; g < numSubloops; g++)
{
arr4.push_back(q * g);
}
}
}
The results of this test, after compilation in Visual Studio 2017, are as follows:
Test 1: 7 ms
Test 2: 3 ms
Test 3: 4 ms
Test 4: 4000 ms
Why is there the huge discrepancy in running times?
Why does calling reserve
a bunch of times, followed by push_back
take 1000x times longer than calling resize
a bunch of times, followed by direct index access?
How does it make any sense that it could take 500x longer than the naive approach which includes no pre-allocations at all?