0

So I've been reading this question and playing with its code. It has been a couple of years since it was originally posted and I was curious on how new compilers handle it. However, what I'm finding out on g++4.9.1 is leaving me extremely perplexed.

I have this code:

#include <cstdlib>
#include <vector>

#include <iostream>
#include <string>

#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>

constexpr int outerLoopBound = 1000;

class TestTimer {
    public:
        TestTimer(const std::string & name) : name(name),
        start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time()) {}

        ~TestTimer() {
            using namespace std;
            using namespace boost;

            posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
            posix_time::time_duration d = now - start;

            cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
            " seconds" << endl;
        }

    private:
        std::string name;
        boost::posix_time::ptime start;
};

struct Pixel {
    Pixel() {}

    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b) {}

    unsigned char r, g, b;
};

double UseVector() {
    TestTimer t("UseVector");
    double sum = 0.0;

    for(int i = 0; i < outerLoopBound; ++i) {
        int dimension = 999;

        std::vector<Pixel> pixels;
        pixels.resize(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
        sum += pixels[0].b;
    }
    return sum;
}

double UseVector2() {
    TestTimer t("UseVector2");
    double sum = 0.0;
    for(int i = 0; i < outerLoopBound; ++i) {
        int dimension = 999;

        std::vector<Pixel> pixels(dimension*dimension, Pixel(255,0,0));
        sum += pixels[0].b;
    }
    return sum;
}

double UseVector3() {
    TestTimer t("UseVector3");
    double sum = 0.0;
    for(int i = 0; i < outerLoopBound; ++i) {
        int dimension = 999;
        Pixel p(255, 0, 0);

        std::vector<Pixel> pixels(dimension*dimension, p);
        sum += pixels[0].b;
    }
    return sum;
}

double UseVectorPushBack() {
    TestTimer t("UseVectorPushBack");

    double sum = 0.0;
    for(int i = 0; i < outerLoopBound; ++i) {
        int dimension = 999;

        std::vector<Pixel> pixels;
        pixels.reserve(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
            pixels.push_back(Pixel(255, 0, 0));

        sum += pixels[0].b;
    }
    return sum;
}

void UseVectorEmplaceBack() {
    TestTimer t("UseVectorPushBack");

    for(int i = 0; i < outerLoopBound; ++i) {
        int dimension = 999;

        std::vector<Pixel> pixels;
        pixels.reserve(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
            pixels.emplace_back(Pixel(255, 0, 0));
    }
}

double UseArray() {
    TestTimer t("UseArray");

    double sum = 0.0;
    for(int i = 0; i < outerLoopBound; ++i) {
        int dimension = 999;

        Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);

        for(int i = 0 ; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }

        sum += pixels[0].b;
        free(pixels);
    }
    return sum;
}

int main()
{
    TestTimer t1("The whole thing");

    double result = 0.0;
    result += UseArray();
    result += UseVector();
    result += UseVector2();
    result += UseVector3();
    result += UseVectorPushBack();
    std::cout << "Result is: " << result << '\n';

    return 0;
}

I've basically modified a bit the original code in the hope to avoid the compiler from nullifying everything. So we have:

  • UseVector: Creates empty vector, uses resize, loops and sets all Pixels.
  • UseVector2: Creates vector directly of the size needed, and instantiates Pixels from temporary.
  • UseVector3: Creates vector directly of the size needed, and instantiates Pixels from a single lvalue.
  • UseVectorPushBack: Creates empty vector, uses reserve, adds Pixels with push_back.
  • UseVectorEmplaceBack: Creates empty vector, uses reserve, adds Pixels with emplace_back.
  • UseArray: mallocs an array, loops setting all Pixels, deallocates.

In addition all these functions accumulate values in a sum variable which is returned so to prevent the compiler from eliminating the loops. In the main function I test all of these functions but UseVectorEmplaceBack. This is important for later.

So I compile with the following flags: g++ -O3 -march=native -std=c++11 main.cpp. I'm on an FX-8350.

Round One

As is, the code produces for me:

UseArray completed in 0.248 seconds
UseVector completed in 0.245 seconds
UseVector2 completed in 0.872 seconds
UseVector3 completed in 0.981 seconds
UseVectorPushBack completed in 4.779 seconds
Result is: 0
The whole thing completed in 7.126 seconds

So first thing I notice is that UseVector, which in the original question was slower, is now on par as a C array, even though in theory it should be doing two passes on the data.

On the other hand, UseVector2 and UseVector3 are 4 times as slow as UseVector. This seems strange to me, why would this happen?.

Round Two

Ok, so we have a UseVectorEmplaceBack function, but we're not really testing it. Why not comment it? So we comment it, and we try again the code:

UseArray completed in 0.246 seconds
UseVector completed in 0.245 seconds
UseVector2 completed in 0.984 seconds
UseVector3 completed in 0.8 seconds
UseVectorPushBack completed in 4.771 seconds
Result is: 0
The whole thing completed in 7.047 seconds

Ok, so apparently while before UseVector2 was a bit faster than UseVector3, now the situation has reversed. This result keeps happening even after running the code multiple times. So we changed the running time of two functions by commenting an unused function. Wait what?

Round Three

Continuing from here, at this point we think that UseVector3 for some reason is the faster one. We want to make it even faster, and to do so we comment the following line in UseVector3 to reduce its workload:

// sum += pixels[0].b;

Now the function will be faster, thanks to our incredible coding ability! Let's test it:

UseArray completed in 0.245 seconds
UseVector completed in 0.244 seconds
UseVector2 completed in 0.81 seconds
UseVector3 completed in 0.867 seconds
UseVectorPushBack completed in 4.778 seconds
Result is: 0
The whole thing completed in 6.946 seconds

Ok so we removed an operation from UseVector3, which slowed down, while UseVector2, untouched, became faster than the other.

Conclusions

In addition to this there are a bunch of other weird behaviours which are too numerous to mention. It seems that every random edit on this code produces weird effects overall. For now I'm just mainly curious about the three things which I showed here:

  • Why UseVector is faster than both UseVector2 and UseVector3?
  • Why commenting an unused function alters the time of two other functions, but not the rest?
  • Why removing an operation from a function slows it down, but accelerates another?
Community
  • 1
  • 1
Svalorzen
  • 5,353
  • 3
  • 30
  • 54
  • 5
    One question per question please – Lightness Races in Orbit Oct 26 '14 at 01:51
  • Just for fun, try using LTO: https://gcc.gnu.org/wiki/LinkTimeOptimization – John Zwinck Oct 26 '14 at 01:55
  • @MattMcNabb: is that really UB? What's the basis for that? I'm not saying it's not, but I am surprised. – John Zwinck Oct 26 '14 at 01:56
  • @JohnZwinck I thought the presence of the user-defined constructor made it non-POD but maybe not, I always forget the details surrounding that – M.M Oct 26 '14 at 01:58
  • Are you really sure the compiler is not smart enough to figure out that `0.0 + 0 + 0 + ... + 0 == 0` ? – M.M Oct 26 '14 at 02:01
  • 3
    `UseVector2` and `UseVector3` [produce exactly the same assembly](http://goo.gl/oI6tl2) so either there are effects you aren't accounting for, or the margin of error is larger than you think. – M.M Oct 26 '14 at 02:08
  • You're really going to need to inspect whatever assembly your system produces, to progress on this. For example on my system I get `22.889` for `UseVector`, and `13.`x for both `UseVector2` and `UseVector3`. Your system is clearly doing some good optimization on `UseVector`. – M.M Oct 26 '14 at 02:14
  • 1
    Congrats, you avoided sin 1: failing to optimize. You have failed at sin 2: failure to repeat and measure variance. You have failed at sin 3: failure to compare apples to apples. Always run a test by itself, never after/before other tests you are comparing it to, because the previous test could impact the test's performance. You have failed at sin 4: you included code you did not use for your test (emplace stuff) and sin 5: and have at least some typos (the strings printed by your emplace stuff), I don't know about logic errors. Oh, and `Pixel(){}` does not initialize memory. – Yakk - Adam Nevraumont Oct 26 '14 at 02:24
  • 1
    @Yakk I've repeated the tests multiple times, and the results always came out the same, otherwise I wouldn't have been impressed. I've made the changes I show in any order, compiled to separate binaries and all that, and I could see the difference every time. About the including code I did not use, it's because I was testing it, but it was not interesting. Thus I commented it, and I discover that it was influencing the rest anyway - that's why it is there. Also I don't see any typos, this is exactly the code I'm using, copy-pasted. – Svalorzen Oct 26 '14 at 02:29
  • @MattMcNabb, You're right. A user-defined default constructor makes it non-trivial, meaning the class is not trivial, meaning it's not POD. – chris Oct 26 '14 at 02:31
  • @MattMcNabb I get similar results under VC2013, `UseVector2` and `3` produce same asm and runtimes. Commenting `UseVectorEmplaceBack` changes nothing, commenting `sum +=...` changes the asm, but run times are marginally affected, variance increases but both versions are still extremely close to each other. – user2802841 Oct 26 '14 at 02:46
  • @MattMcNabb Yes I get same assembly too. I've tried then to swap the calls between `UseVector2` and `UseVector3`; it seems that the second being called is consistently slower by about a tenth of a second. – Svalorzen Oct 26 '14 at 02:48
  • While when I decomment `UseVectorEmplaceBack`, it's the first which is consistently slower. Every single time I call the output, they differ. – Svalorzen Oct 26 '14 at 02:50

0 Answers0