I'm trying to make a piece of code run faster. It is floating-point-intensive code that takes as input:
- parameters (constant, double, int)
- array of input values (constant, double)
The output is:
- array of values (double)
- Jacobian matrix
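For illustration only, the interface described above can be sketched roughly as follows. Everything here is hypothetical: the names `Params` and `evaluate`, the formula y[i] = scale * x[i]^power, and the row-major Jacobian layout are assumptions made up for the sketch, and the real code is far larger.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parameter block: constant double and int parameters.
struct Params {
    double scale;
    int    power;
};

// Hypothetical kernel: y[i] = scale * x[i]^power.
// jac is an n x n matrix stored row-major, jac[i * n + j] = dy[i]/dx[j].
void evaluate(const Params& p, const std::vector<double>& x,
              std::vector<double>& y, std::vector<double>& jac)
{
    const std::size_t n = x.size();
    assert(p.power >= 1);            // the sketch only handles positive powers
    y.assign(n, 0.0);
    jac.assign(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        double xp = 1.0;             // becomes x[i]^(power - 1)
        for (int k = 1; k < p.power; ++k) xp *= x[i];
        y[i] = p.scale * xp * x[i];
        jac[i * n + i] = p.scale * p.power * xp;  // only the diagonal is non-zero here
    }
}
```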
Currently I'm using
g++-7 (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0
and the following command line:
g++-7 -S -fPIC -O3 -DNDEBUG -funroll-loops -march=native -ffast-math \
-I $BOOST_DIR tmp.cpp -std=c++17 \
-D__forceinline='__attribute__((always_inline))' \
-frecord-gcc-switches -Wno-attributes
As far as I remember, g++ used to produce better code for this, and it also spent much longer compiling it. I have experimented with various options, but only
--param max-gcse-memory=1
seems to have any effect, and only the presence of the parameter matters: changing its value makes no difference.
My criterion for better code is the number of [v]mov instructions relative to the number of vmul[sp]d instructions: better code should contain fewer [v]mov instructions.
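One way to obtain such counts from the generated assembly is sketched below. This is an assumption about how the metric could be measured, not necessarily how the numbers in this post were produced. The names `MnemonicCounts` and `count_mnemonics` are hypothetical, and the sketch assumes GCC-style output with one instruction per line, where directives start with `.` and labels end with `:`.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type: tallies of the mnemonics of interest.
struct MnemonicCounts {
    std::size_t moves  = 0;  // mnemonics starting with "mov" or "vmov"
    std::size_t mul_pd = 0;  // vmulpd
    std::size_t mul_sd = 0;  // vmulsd
};

// Reads assembly text line by line and classifies the first token of each line.
MnemonicCounts count_mnemonics(std::istream& asm_text)
{
    MnemonicCounts c;
    std::string line;
    while (std::getline(asm_text, line)) {
        std::istringstream fields(line);
        std::string op;
        if (!(fields >> op)) continue;  // blank line
        // Skip directives (.text), comments (#...), and labels (foo:).
        if (op[0] == '.' || op[0] == '#' || op.back() == ':') continue;
        if (op.rfind("vmov", 0) == 0 || op.rfind("mov", 0) == 0) ++c.moves;
        else if (op == "vmulpd") ++c.mul_pd;
        else if (op == "vmulsd") ++c.mul_sd;
    }
    return c;
}
```

Passing a `std::ifstream` opened on the `.s` file written by the `-S` command line above (by default `tmp.s` for `tmp.cpp`) would give the three tallies for one build.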
When using
--param max-gcse-memory=1
I get 10766 [v]mov instructions, compared to 11325 without this parameter. For comparison, there are 1000 vmulpd and 1900 vmulsd instructions, and those counts stay more or less constant between the two builds.
Again, I don't mind the compile time. I would like to get better code, and from what I remember, back in 2010 I got better code at the cost of a much longer compile time.