
I have some code which runs under Visual Studio (MSVC compiler) at an average of 127 CPU cycles per message. I ported it over to Linux (Mint 16), Netbeans 7.4 and GCC 4.8. The code is standard-compliant C++.

The only change I had to make was to replace __rdtsc() with an inline GCC version. To run the code I switched Netbeans to Release mode, went into the project properties and changed the configuration to "Performance Release". I then clicked the green arrow and the program now takes on average 229 CPU cycles per message, nearly double the MSVC time.

Am I running Netbeans release mode correctly? I know that on Visual Studio you have to press Ctrl+F5 (start without debugging) for a proper performance run. I wasn't sure if there was an equivalent mistake to make using Netbeans? I was expecting the code to be faster on Linux!

The code doesn't use any containers except for raw arrays.

Timing:

On Windows I used:

unsigned long long start = __rdtsc();
//Code
unsigned long long finish = __rdtsc();

On Linux, same as above, except I used:

#if defined(__i386__)

// 32-bit: "=A" ties the 64-bit result to the edx:eax register pair;
// .byte 0x0f, 0x31 emits the RDTSC opcode directly.
static __inline__ unsigned long long rdtsc(void)
{
    unsigned long long int x;
    __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
    return x;
}

#elif defined(__x86_64__)

// 64-bit: RDTSC leaves the low half of the counter in eax and the high half in edx.
static __inline__ unsigned long long rdtsc(void)
{
    unsigned hi, lo;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)lo) | (((unsigned long long)hi) << 32);
}

#endif

Which I got from this SO answer: https://stackoverflow.com/a/9887899/997112

  • You may have better luck invoking GCC directly, with the highest level of optimizations enabled. See [`man g++`](http://linux.die.net/man/1/g++) and the `-O3` (highest optimization level) option. – Linuxios Apr 01 '14 at 01:11
  • The compile line contained: `g++ -m64 -c -O3 -std=c++11` so I already had that optimization level? – user997112 Apr 01 '14 at 01:15
  • I assume you are running on the same machine? – Linuxios Apr 01 '14 at 01:15
  • Does g++ have a similar compiler option to CLang++, where the `-stdlib=libc++` can be used? Perhaps CLang++ would create an executable that would run faster than 4.8 GCC. – CPlusPlus OOA and D Apr 01 '14 at 01:21
  • Can you please share how you have calculated the actual time taken by your program under both platform(linux and windows). There could be something different in the way time has been measured in both cases. – Mantosh Kumar Apr 01 '14 at 01:23
  • @tmp please see modified Q :) – user997112 Apr 01 '14 at 01:29
  • According to this post (which means it must be true forever...): [Performance measurements with RDTSC](http://www.strchr.com/performance_measurements_with_rdtsc), the VS2005 compiler needs special assembly to flush the instruction pipeline first. Do the more recent Visual Studio versions accommodate for this in the `__rdtsc()` function? The same may be true for running in Linux. – CPlusPlus OOA and D Apr 01 '14 at 01:38
  • @CPlusPlusOOAandD wouldn't that suggest the VS timing is slower? – user997112 Apr 01 '14 at 01:39
  • @user997112 I am not so sure about that, it may mean that other processes' instructions are in the pipeline, it may not. If possible, I would look at the `__rdtsc()` implementation. When last using high performance counters in Windows, I used `QueryPerformanceCounter` and `QueryPerformanceFrequency`. The latest msdn page: [QueryPerformanceCounter function](http://msdn.microsoft.com/en-us/library/windows/desktop/ms644904%28v=vs.85%29.aspx) says you have to be at least Windows 2000 and they live in kernel32.lib and kernel32.dll. – CPlusPlus OOA and D Apr 01 '14 at 01:48
  • On a Linux platform, I have to do some research to find an equivalent. – CPlusPlus OOA and D Apr 01 '14 at 01:49
  • Also, what's in the `//Code` section? – Cramer Apr 01 '14 at 01:57
  • @user997112 I am not very confident this topic can be quickly answered. I have found this SO post's answer to add more details on Linux timing measurements. [Is gettimeofday() guaranteed to be of microsecond resolution?](http://stackoverflow.com/questions/88/is-gettimeofday-guaranteed-to-be-of-microsecond-resolution), and has been last edited on October 14, 2012. – CPlusPlus OOA and D Apr 01 '14 at 01:58
  • Is program locked to one CPU core? As I recall, rdtsc isn't required to be equal on different CPUs or cores. I would've started with measuring whole execution time and dividing it with iterations count somewhen much later. – keltar Apr 01 '14 at 05:45
  • @keltar It's not locked to one core (yet), but then again, it isn't on Windows either. – user997112 Apr 01 '14 at 14:53

0 Answers