So I've been reading this question and playing with its code. It has been a couple of years since it was originally posted, and I was curious about how newer compilers handle it. However, what I'm finding with g++ 4.9.1 is leaving me extremely perplexed.
I have this code:
#include <cstdlib>
#include <vector>
#include <iostream>
#include <string>
#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>
constexpr int outerLoopBound = 1000;
class TestTimer {
public:
    TestTimer(const std::string & name) : name(name),
        start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time()) {}
    ~TestTimer() {
        using namespace std;
        using namespace boost;
        posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
        posix_time::time_duration d = now - start;
        cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
            " seconds" << endl;
    }
private:
    std::string name;
    boost::posix_time::ptime start;
};
struct Pixel {
    Pixel() {}
    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b) {}
    unsigned char r, g, b;
};
double UseVector() {
    TestTimer t("UseVector");
    double sum = 0.0;
    for(int i = 0; i < outerLoopBound; ++i) {
        int dimension = 999;
        std::vector<Pixel> pixels;
        pixels.resize(dimension * dimension);
        for(int i = 0; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
        sum += pixels[0].b;
    }
    return sum;
}
double UseVector2() {
    TestTimer t("UseVector2");
    double sum = 0.0;
    for(int i = 0; i < outerLoopBound; ++i) {
        int dimension = 999;
        std::vector<Pixel> pixels(dimension*dimension, Pixel(255,0,0));
        sum += pixels[0].b;
    }
    return sum;
}
double UseVector3() {
    TestTimer t("UseVector3");
    double sum = 0.0;
    for(int i = 0; i < outerLoopBound; ++i) {
        int dimension = 999;
        Pixel p(255, 0, 0);
        std::vector<Pixel> pixels(dimension*dimension, p);
        sum += pixels[0].b;
    }
    return sum;
}
double UseVectorPushBack() {
    TestTimer t("UseVectorPushBack");
    double sum = 0.0;
    for(int i = 0; i < outerLoopBound; ++i) {
        int dimension = 999;
        std::vector<Pixel> pixels;
        pixels.reserve(dimension * dimension);
        for(int i = 0; i < dimension * dimension; ++i)
            pixels.push_back(Pixel(255, 0, 0));
        sum += pixels[0].b;
    }
    return sum;
}
void UseVectorEmplaceBack() {
    TestTimer t("UseVectorEmplaceBack");
    for(int i = 0; i < outerLoopBound; ++i) {
        int dimension = 999;
        std::vector<Pixel> pixels;
        pixels.reserve(dimension * dimension);
        for(int i = 0; i < dimension * dimension; ++i)
            pixels.emplace_back(Pixel(255, 0, 0));
    }
}
double UseArray() {
    TestTimer t("UseArray");
    double sum = 0.0;
    for(int i = 0; i < outerLoopBound; ++i) {
        int dimension = 999;
        Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);
        for(int i = 0 ; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
        sum += pixels[0].b;
        free(pixels);
    }
    return sum;
}
int main()
{
    TestTimer t1("The whole thing");
    double result = 0.0;
    result += UseArray();
    result += UseVector();
    result += UseVector2();
    result += UseVector3();
    result += UseVectorPushBack();
    std::cout << "Result is: " << result << '\n';
    return 0;
}
I've modified the original code a bit, hoping to keep the compiler from optimizing everything away. So we have:
- UseVector: Creates an empty vector, uses resize, loops and sets all Pixels.
- UseVector2: Creates the vector directly at the size needed, and instantiates Pixels from a temporary.
- UseVector3: Creates the vector directly at the size needed, and instantiates Pixels from a single lvalue.
- UseVectorPushBack: Creates an empty vector, uses reserve, adds Pixels with push_back.
- UseVectorEmplaceBack: Creates an empty vector, uses reserve, adds Pixels with emplace_back.
- UseArray: mallocs an array, loops setting all Pixels, deallocates.
In addition, all of the tested functions accumulate values in a sum variable which is returned, so as to prevent the compiler from eliminating the loops. In the main function I test all of these functions except UseVectorEmplaceBack. This is important for later.
So I compile with the following flags: g++ -O3 -march=native -std=c++11 main.cpp. I'm on an FX-8350.
Round One
As is, the code produces for me:
UseArray completed in 0.248 seconds
UseVector completed in 0.245 seconds
UseVector2 completed in 0.872 seconds
UseVector3 completed in 0.981 seconds
UseVectorPushBack completed in 4.779 seconds
Result is: 0
The whole thing completed in 7.126 seconds
So the first thing I notice is that UseVector, which in the original question was slower, is now on par with the C array, even though in theory it should be doing two passes over the data.
On the other hand, UseVector2 and UseVector3 are roughly 3.5 to 4 times as slow as UseVector. This seems strange to me. Why would this happen?
Round Two
OK, so we have a UseVectorEmplaceBack function, but we're not really testing it. Why not comment it out? So we comment it out and run the code again:
UseArray completed in 0.246 seconds
UseVector completed in 0.245 seconds
UseVector2 completed in 0.984 seconds
UseVector3 completed in 0.8 seconds
UseVectorPushBack completed in 4.771 seconds
Result is: 0
The whole thing completed in 7.047 seconds
OK, so apparently, while UseVector2 was a bit faster than UseVector3 before, the situation has now reversed. This result keeps happening even after running the code multiple times. So we changed the running time of two functions by commenting out an unused function. Wait, what?
Round Three
Continuing from here, at this point we think that UseVector3 is, for some reason, the faster one. We want to make it even faster, and to do so we comment out the following line in UseVector3 to reduce its workload:
// sum += pixels[0].b;
Now the function will be faster, thanks to our incredible coding ability! Let's test it:
UseArray completed in 0.245 seconds
UseVector completed in 0.244 seconds
UseVector2 completed in 0.81 seconds
UseVector3 completed in 0.867 seconds
UseVectorPushBack completed in 4.778 seconds
Result is: 0
The whole thing completed in 6.946 seconds
OK, so we removed an operation from UseVector3, which slowed it down, while UseVector2, untouched, became the faster of the two.
Conclusions
In addition to this, there are a bunch of other weird behaviours which are too numerous to mention. It seems that every random edit to this code produces weird effects overall. For now I'm mainly curious about the three things which I showed here:
- Why is UseVector faster than both UseVector2 and UseVector3?
- Why does commenting out an unused function alter the time of two other functions, but not the rest?
- Why does removing an operation from a function slow it down, but accelerate another?