GCC -march=native slows down threaded code

Question

I made an odd observation. When I activate the CXX flag -march=native, the timing of my code snippets becomes twice as long.

// -> sequential
int n = (int) 1e7;
Vector<double, 32> a;
a.init(n);
for (int i = 0; i < n; i++)
    a(i) = 1.0;

double r1;
Timer::start();
psum(a.data, n, r1);
Timer::stop();

std::cout << "timing (ms): " << Timer::get_timing() << std::endl;
std::cout << r1 << std::endl;
// <-

// -> threading simple
int n_threads = 2;
Vector<double, 32> b;
b.init(n);
for (int i = 0; i < n; i++)
    b(i) = 2.0;

double r2;
Timer::start();
std::thread t1(psum, b.data + n/2, n/2, std::ref(r1));
psum(b.data, n/2, r2);
t1.join();
Timer::stop();

std::cout << "timing (ms): " << Timer::get_timing() << std::endl;
std::cout << r1 + r2 << std::endl;
// <-

Specifically, the threaded example jumps from 8 ms to 16 ms, and 16 ms is also the timing of the sequential code, so the second thread no longer gives any speedup.
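Since `Vector`, `Timer` and `psum` are not shown above, here is a minimal self-contained sketch of the threaded case for anyone who wants to try to reproduce the timings. It rests on assumptions: `psum` is taken to be a plain sequential sum, `std::vector` stands in for the 32-byte aligned `Vector<double, 32>` (so alignment may differ), `std::chrono::steady_clock` stands in for `Timer`, and `threaded_sum` is a hypothetical helper name.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Assumption: psum is a plain sequential sum of n doubles stored in `result`.
void psum(const double* data, std::size_t n, double& result) {
    assert(data != nullptr || n == 0);
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += data[i];
    result = s;
}

// Hypothetical helper: sums `v` with one helper thread plus the calling thread.
// Returns the total and writes the elapsed wall time in milliseconds to `ms`.
double threaded_sum(const std::vector<double>& v, double& ms) {
    const std::size_t half = v.size() / 2;
    double r_lo = 0.0;  // written by the calling thread
    double r_hi = 0.0;  // written by the helper thread
    const auto begin = std::chrono::steady_clock::now();
    std::thread helper(psum, v.data() + half, v.size() - half, std::ref(r_hi));
    psum(v.data(), half, r_lo);
    helper.join();
    const auto end = std::chrono::steady_clock::now();
    ms = std::chrono::duration<double, std::milli>(end - begin).count();
    return r_lo + r_hi;
}
```

Calling `threaded_sum` from `main` on a vector of 1e7 elements, built once with `-std=c++11 -O3 -pthread` and once with `-march=native` added, should show whether the slowdown also appears outside the original program.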

Extra info:

g++ 6.2 compiler
ubuntu 16.10
intel i5-6200u (skylake)
vectors are 32-byte aligned
compile with c++ -std=c++11 -O3 -pthread ...

Any idea where this comes from?

UPDATE 1

When I activate only -mtune=skylake, the timing jumps to 32 ms.

for starters, you could print your "native" flags and compare them to the other flags, check here: http://stackoverflow.com/questions/5470257/how-to-see-which-flags-march-native-will-activate — Jean-François Fabre, Nov 03 '16 at 21:20
the exposed flags are ```-march=skylake -mmmx -msse -msse2 -msse3 -mssse3 -mcx16 -msahf -mmovbe -maes -mpclmul -mpopcnt -mabm -mfma -mbmi -mbmi2 -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mclflushopt -mxsavec -mxsaves --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=3072 -mtune=skylake -fstack-protector-strong -Wformat -Wformat-security``` — Armen Avetisyan, Nov 03 '16 at 21:27
and can you print the flags where you don't activate the march=native flag? — Jean-François Fabre, Nov 03 '16 at 21:30
yup. no use of the ```-march``` flag leads to: ```-mtune=generic -march=x86-64 -fstack-protector-strong -Wformat -Wformat-security``` — Armen Avetisyan, Nov 03 '16 at 21:32
sorry! you'll have to turn off switches to see if it has some effect. I would create a script to remove the switches one by one until a significant speed change occurs... tricky. — Jean-François Fabre, Nov 03 '16 at 21:41
when I activate only ```-mtune=skylake``` then the timing jumps to 32 ms. thank you for your help. I guess it is some bug — Armen Avetisyan, Nov 03 '16 at 21:47
I read that skylake is a relatively new architecture... well too new for the tools anyway :) glad you sorted it out. — Jean-François Fabre, Nov 03 '16 at 21:53
Might be due to the glibc bug mentioned in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78611 ... — Anon, May 09 '18 at 05:25
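When switching flags off one by one as suggested in the comments, it can help to confirm from inside the program which instruction-set extensions a given build actually enabled. The sketch below relies on the macros GCC predefines for the corresponding `-m` options (`__SSE2__`, `__SSE4_2__`, `__AVX__`, `__AVX2__`, `__FMA__`). The function name `enabled_isa_macros` is hypothetical, and the list covers only a few of the flags shown above.

```cpp
#include <string>

// Returns a space-separated list of the instruction-set macros that the
// compiler predefined for this translation unit (a partial list only).
std::string enabled_isa_macros() {
    std::string out;
#ifdef __SSE2__
    out += "SSE2 ";
#endif
#ifdef __SSE4_2__
    out += "SSE4.2 ";
#endif
#ifdef __AVX__
    out += "AVX ";
#endif
#ifdef __AVX2__
    out += "AVX2 ";
#endif
#ifdef __FMA__
    out += "FMA ";
#endif
    return out;
}
```

Printing the result next to the timing makes it easier to match each measurement to the flag set that produced it.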
