17

I've been working on a Genetic Algorithm which I'd previously been compiling using g++ 4.8.1 with the arguments

CCFLAGS=-c -Wall  -Ofast -fopenmp -mfpmath=sse -march=native -std=gnu++11 

I wasn't using many of the features of c++11 and have a reasonable profiling system so I replaced literally 3-4 lines of code and had it compile without -std=gnu++11

CCFLAGS=-c -Wall  -Ofast -fopenmp -mfpmath=sse -march=native

When I ran my profiler again, I noticed that I could see ~5% performance improvement almost everywhere, except for my sort function, which was now taking about twice as long. (It's an overloaded operator< on the object)

My questions are:

What performance differences are known between the two versions, and is it expected that c++11 would be faster in newer compilers?

I also suspect that my use of -Ofast is playing a role. Am I right in that assumption?

UPDATE:

As suggested in the comments, I ran the tests again with and without -march=native

// Fast sort, slightly slower in other tests
CCFLAGS=-c -Wall  -Ofast -fopenmp -mfpmath=sse -march=native -std=gnu++11  

// Fast sort, slower in other tests
CCFLAGS=-c -Wall  -Ofast -fopenmp -mfpmath=sse -std=gnu++11  

// Slow sort, slower in other tests
CCFLAGS=-c -Wall  -Ofast -fopenmp -mfpmath=sse                     

// Slow sort, fastest in other tests
CCFLAGS=-c -Wall  -Ofast -fopenmp -mfpmath=sse  -march=native

The conclusion seems to be the same: -std=gnu++11 speeds up the sort drastically, with a slight penalty almost everywhere else. -march=native speeds up the program whenever it is used.

Given that sort is only called once per generation, I'll take the speed benefit of not compiling with -std=gnu++11, but I'm still very interested in what is causing these results.

I'm using the std::sort provided by #include <algorithm>

joeButler
  • As for the performance improvements: move semantics in the standard library containers and algorithms. However, it would be nice to see the code involved in the sort function, the 2x slowdown is suspicious. By the way: `-std=gnu++11` also enables GNU extensions; you probably want `-std=c++11` which doesn't. – Ali Feb 26 '14 at 20:38
  • Possibly relevant: -Ofast enables optimisations that are not valid according to the C++ standard. – R. Martinho Fernandes Feb 26 '14 at 20:43
  • @R.MartinhoFernandes Yes, but it is used in both cases, so the difference comes from C++11 features or some GNU extension. However, without seeing the actual code, we cannot know why the sort is 2x slower. – Ali Feb 26 '14 at 21:13
  • @Ali True. I mentioned it because I wonder how it interacts with the flag that sets a certain level of conformance (`-std=`). – R. Martinho Fernandes Feb 26 '14 at 21:42
  • @R.MartinhoFernandes Yes. And I am not sure how `-Ofast` and `-mfpmath=sse` interact. After reading [Enabling strict floating point mode in GCC](http://stackoverflow.com/q/7295861/341970), I would think that `-mfpmath=sse` enables strict fp mode, however `-Ofast` is in conflict with it. I have no idea how that one is resolved. In any case, we would have to see the code to be able to say more. – Ali Feb 26 '14 at 22:08
  • Please post an [SSCCE](http://sscce.org/) that shows the 2x slowdown in your sorting algorithm. – Ali Feb 26 '14 at 22:10
  • It's interesting that the rest of the code is faster when compiled without the gnu++11 flag. Does that mean gnu++11 mode is slowing down the code? – Jagannath Feb 26 '14 at 23:30
  • Yeah, this was what really got me interested. I'll be able to post the code containing the sort shortly. It's just an operator< with a comparison of a double the object contains – joeButler Feb 26 '14 at 23:35
  • @joeButler, can you state the machine you are running on (processor type)? Does it have AVX? – Z boson Feb 27 '14 at 09:08
  • @Zboson - It's an i7-3770 (x86_64). It does have AVX support. Are you thinking this is not well optimised yet for -std=gnu++11? – joeButler Feb 27 '14 at 09:55
  • @joeButler, can you remove the `march=native` and see what happens? Is the sorting still twice as slow? To do this right you will have to compare both version without `march=native`. – Z boson Feb 27 '14 at 10:00
  • @Zboson - Added updated settings, seems march is not related to the sort, but does have positive effect on speed. – joeButler Feb 27 '14 at 17:35
  • @joeButler, glad you checked this. That eliminates one of my guesses as to the cause. – Z boson Feb 27 '14 at 18:33
  • @joeButler, are you using a custom sort function or one from a library? – Z boson Feb 27 '14 at 18:35
  • I'm wondering if you see the same difference with `-std=c++11`? – KillianDS Feb 27 '14 at 22:07
  • Why all the guessing? Take some [*stackshots*](http://stackoverflow.com/a/378024/23771). See for yourself where the time goes. Then if you want better insight, examine or step through the assembly code. Performance is not a big mystery. – Mike Dunlavey Jul 17 '14 at 12:19
  • Voting to close as too broad: there are too many possible differences. Do some profiling, decompile the hotspots, and make a minimal example that shows the problem. Then we can help, and maybe improve GCC :-) – Ciro Santilli OurBigBook.com Jun 02 '15 at 16:40

4 Answers

1

I am not certain why using --std=gnu++11 would make parts of the code slower. I do not use that personally (instead, I use --std=c++11). Perhaps the extra GNU features are slowing something down? More likely, the optimization hasn't caught up with the new language features yet.

As for why the sort part is faster, I have a plausible explanation:

You've enabled move semantics. Even if you don't explicitly write them yourself, if your classes are reasonably constructed, they will be generated. The "sort" algorithm probably takes advantage of them.

Admittedly, the class you've listed above doesn't seem to hold much storage. Still, it does not have a "swap" method, so without C++11 move semantics the sort routine must do more work. You might look at this question and answers for more about sort and move semantics and interactions with compiler options.
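One way to check whether moves are actually involved is to count them. The sketch below is only an illustration: `Probe` and `sort_probes` are hypothetical names, not code from the question, and it must be built in C++11 mode or later. Note that for a class whose members are just a `bool` and a `double`, the implicitly generated move does the same work as a copy, so a probe like this mainly tells you which operations `std::sort` asks for rather than what they cost.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical probe type (not from the question): counts how often
// an element is copied versus moved.
struct Probe {
    double score;
    static std::size_t copies;
    static std::size_t moves;

    explicit Probe(double s) : score(s) {}
    Probe(const Probe& o) : score(o.score) { ++copies; }
    Probe(Probe&& o) noexcept : score(o.score) { ++moves; }
    Probe& operator=(const Probe& o) { score = o.score; ++copies; return *this; }
    Probe& operator=(Probe&& o) noexcept { score = o.score; ++moves; return *this; }

    bool operator<(const Probe& rhs) const { return score < rhs.score; }
};

std::size_t Probe::copies = 0;
std::size_t Probe::moves = 0;

// Builds n probes in descending score order, resets the counters, then
// sorts ascending. The tallies are left in Probe::copies / Probe::moves.
std::vector<Probe> sort_probes(std::size_t n) {
    std::vector<Probe> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        v.emplace_back(static_cast<double>(n - i));  // constructed in place
    Probe::copies = 0;
    Probe::moves = 0;
    std::sort(v.begin(), v.end());
    return v;
}
```

Calling `sort_probes(1000)` and then printing `Probe::copies` and `Probe::moves` shows which operations the library's sort used for this input.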

Community
cshelton
0

There has been a lot of interest in why the sort method had such a performance drop.

I'm more interested in why the remaining code saw a good improvement, but to help the conversation, below is the only part of my code that was quicker under -std=gnu++11.

It's just a comparison of a double member of the objects held in a vector.

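#include <algorithm>
#include <vector>
#include <pthread.h>

// Defined first so the classes below can hold it by value.
class PopulationMember {
public:
    bool hasChanged;
    double score;

    // Orders members by score; this is the comparison std::sort uses.
    inline bool operator< (const PopulationMember& rhs) const {
        return this->score < rhs.score;
    }
};

// Vector of population members guarded by a mutex.
class TvectorPM {
public:
    pthread_mutex_t lock;   // must be initialised (pthread_mutex_init) before add() is called
    std::vector<PopulationMember> v;
    void add(PopulationMember p);
};

void TvectorPM::add(PopulationMember p) {
    pthread_mutex_lock(&lock);
    v.push_back(p);
    pthread_mutex_unlock(&lock);
}

class PopulationManager {
public:
    TvectorPM populationlist;
    void sortByScore();
};

void PopulationManager::sortByScore() {
    // Have overloaded operator< to make this fast
    std::sort(populationlist.v.begin(), populationlist.v.end());
}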
Mike G
joeButler
  • I'd guess the compiler is able to improve for this situation, so if I could work out why, it would be nice to bring this back into an older version of g++ – joeButler Feb 27 '14 at 01:29
  • Have you tried with --std=c++11 instead of gnu++11? It might help resolve what's causing the difference. – cshelton Aug 04 '14 at 22:37
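A self-contained timing harness makes it easier to compare the flag sets on the sort alone. The following is a rough sketch, not the author's profiler: `Member`, `make_population` and `time_sort` are made-up names, `Member` reproduces only the two data members shown above, and the code deliberately avoids C++11 features so that it builds both with and without -std=gnu++11 (or -std=c++11).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <ctime>
#include <vector>

// Stand-in for the question's PopulationMember; only the two members
// shown there are reproduced.
struct Member {
    bool hasChanged;
    double score;
    bool operator<(const Member& rhs) const { return score < rhs.score; }
};

// Fills a vector with n pseudo-random scores from a fixed seed so that
// every build sorts identical data.
std::vector<Member> make_population(std::size_t n, unsigned seed) {
    std::srand(seed);
    std::vector<Member> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i].hasChanged = false;
        v[i].score = static_cast<double>(std::rand()) / RAND_MAX;
    }
    return v;
}

// Returns the average CPU seconds spent inside std::sort over `repeats`
// sorts of fresh copies of the same input.
double time_sort(const std::vector<Member>& input, int repeats) {
    volatile double sink = 0.0;  // keeps the sorted result observably used
    std::clock_t total = 0;
    for (int r = 0; r < repeats; ++r) {
        std::vector<Member> work(input);  // copy made outside the timed region
        std::clock_t start = std::clock();
        std::sort(work.begin(), work.end());
        total += std::clock() - start;
        if (!work.empty()) sink = work.front().score;
    }
    (void)sink;
    return static_cast<double>(total) / CLOCKS_PER_SEC / repeats;
}
```

Building the same file once per flag set from the question and printing `time_sort(make_population(1000000, 42u), 20)` from `main` should show whether the 2x difference reproduces outside the full program. `std::clock` reports CPU time, which is adequate here because the sort itself is single-threaded.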
0

I believe this boils down to the features GNU adds (documentation on GNU Extensions).

Those extensions might optimize some functionality in a rather reasonable manner and add overhead to other parts, as the performance depends on the shape of the code.

Unfortunately I'm unable to provide specifics.
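A concrete, checkable difference between the modes is the set of predefined macros. As a small sketch (the function name `describe_dialect` is made up for illustration): GCC defines `__STRICT_ANSI__` for the strict `-std=c++NN` modes and leaves it undefined for `-std=gnu++NN`, and `__cplusplus` reports the language version (reliably so from GCC 4.7 onwards). Library headers may consult these macros, which is one route by which the dialect flag could change the generated code, though this alone does not identify the cause of the timings in the question.

```cpp
#include <string>

// Reports which -std= mode this translation unit was compiled in,
// using macros that GCC predefines.
std::string describe_dialect() {
#if __cplusplus >= 201103L
    std::string s = "C++11 or later";
#else
    std::string s = "pre-C++11";
#endif
#ifdef __STRICT_ANSI__
    s += ", strict mode (-std=c++NN style)";
#else
    s += ", GNU extensions enabled (-std=gnu++NN style)";
#endif
    return s;
}
```

Printing the result from each of the four builds confirms which dialect each one actually used.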

-5

C++11 differs from the older versions in a number of aspects. Many enhancements have been made to the language's original core too.
Also, some additional features are added. You can visit this webpage and look at the items with the C++11 tag.
Some of the minor, yet heavily used features -

1. initializer list for `vectors`
2. range based `for` loop
3. the `auto` keyword, for declaring data types
4. the 'uniform initialization syntax', in its full glory

and also the -std=c++11 flag that must be used to be able to enjoy any of the above features.
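As a brief illustration of the four features listed above, here is a minimal sketch (the function `sum_scores` and its data are invented for the example). It compiles only in C++11 mode or later.

```cpp
#include <vector>

// Sums a small vector using the four features listed above.
int sum_scores() {
    std::vector<int> scores = {3, 1, 4, 1, 5};  // 1. initializer list
    int total{0};                               // 4. uniform initialization
    for (auto s : scores) {                     // 2. range-based for, 3. auto
        total += s;
    }
    return total;                               // 3 + 1 + 4 + 1 + 5 = 14
}
```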

As for the performance issues, it may have been just a coincidence. But to be sure, compile and run the tests multiple times.

zhirzh