I use Eigen::Array in my program. Depending on the array size, compiling with -march=native on GCC either has no effect or actually makes the program slower. I'm not really experienced with Eigen or C++, but I thought -march=native should enable SIMD vectorization, which in turn should lead to faster execution. Am I wrong? And where should I look for the problem?

If I strip down my code, its core looks something like this:

using Matrix = Eigen::Array<Float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;

Matrix swarm_matrix;
Matrix pbest_matrix;
Matrix velocities_matrix;

// Initialize matrices and do some stuff

for(size_t iteration = 0; iteration < iterations_number; ++iteration)
{
    for(size_t particle_index = 0; particle_index < particles_number; ++particle_index)
    {
        velocities_matrix.row(particle_index) =
            w * velocities_matrix.row(particle_index) +
            c1 * (pbest_matrix.row(particle_index) - swarm_matrix.row(particle_index)) +
            c2 * (pbest_matrix.row(gbest_index)    - swarm_matrix.row(particle_index));

        swarm_matrix.row(particle_index) =
            swarm_matrix.row(particle_index) + 
            velocities_matrix.row(particle_index);
 
        // Do some other stuff
    }
}
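
To compare builds on equal terms, it may help to time the update in isolation. The following is only a sketch under stated assumptions: it replaces Eigen with plain loops over flat row-major buffers, so its timings will not match Eigen's, and `Swarm`, `update` and `time_updates` are hypothetical names that are not part of the original program.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the three row-major matrices: flat buffers
// of size particles * dims, filled with arbitrary constants.
struct Swarm {
    std::size_t particles, dims;
    std::vector<float> position, pbest, velocity;
    Swarm(std::size_t p, std::size_t d)
        : particles(p), dims(d),
          position(p * d, 0.5f), pbest(p * d, 1.0f), velocity(p * d, 0.0f) {}
};

// One pass of the velocity and position update over every particle.
void update(Swarm& s, std::size_t gbest, float w, float c1, float c2) {
    assert(gbest < s.particles);
    const float* g = &s.pbest[gbest * s.dims];  // row of the global best
    for (std::size_t i = 0; i < s.particles; ++i) {
        float* x = &s.position[i * s.dims];
        float* v = &s.velocity[i * s.dims];
        const float* p = &s.pbest[i * s.dims];
        for (std::size_t j = 0; j < s.dims; ++j) {
            v[j] = w * v[j] + c1 * (p[j] - x[j]) + c2 * (g[j] - x[j]);
            x[j] += v[j];
        }
    }
}

// Runs `iterations` passes and returns the elapsed wall time in seconds.
// The coefficients are arbitrary placeholders.
double time_updates(Swarm& s, std::size_t iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t it = 0; it < iterations; ++it)
        update(s, 0, 0.7f, 1.5f, 1.5f);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_updates` from a small `main` for several array sizes, once built with `-O2` and once with `-O3 -march=native`, would show whether the difference depends on the size. Printing a value from `position` afterwards keeps the compiler from discarding the work.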
Aisec Nory
  • You don't necessarily need `-march=native` for vectorization to take place. Often times just using `-O3` will do that. – NathanOliver Sep 30 '21 at 21:17
  • What other compilation options are you using? What version of gcc? What target architecture, and what actual CPU? – Nate Eldredge Sep 30 '21 at 21:25
  • @NateEldredge Compilation options are -O2, -D NDEBUG, gcc version is 8.1.0, architecture is x86_64-w64-mingw32, CPU i3-4130 – Aisec Nory Sep 30 '21 at 21:30
  • @Zoidberg use `-O3` instead because `-O2` doesn't enable autovectorization, or you'll need to specify autovectorize options manually – phuclv Oct 01 '21 at 00:16
  • How big is the array? Because to multiply big matrices you need to use special cache friendly algorithms or the result will be terrible especially when the size is a power of 2: [Why is there huge performance hit in 2048x2048 versus 2047x2047 array multiplication?](https://stackoverflow.com/q/6060985/995714), [Why is my program slow when looping over exactly 8192 elements?](https://stackoverflow.com/q/12264970/995714), [Why is transposing a matrix of 512x512 much slower than transposing a matrix of 513x513?](https://stackoverflow.com/q/11413855/995714) – phuclv Oct 01 '21 at 00:16
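
Since the comments turn on which flags enable vectorization, one way to see what a given set of flags actually turns on is to inspect the predefined macros that GCC sets for each instruction set. The sketch below assumes GCC or Clang on x86, and `enabled_simd` is a hypothetical helper name. As far as I know, Eigen chooses its explicit SIMD code paths from these same macros at compile time, independently of the compiler's autovectorizer, but that is worth confirming against the Eigen documentation.

```cpp
#include <string>

// Returns a space-separated list of the SIMD instruction sets that the
// current compiler flags enabled, or "none" if no macro is defined.
std::string enabled_simd() {
    std::string s;
#if defined(__SSE2__)
    s += "SSE2 ";
#endif
#if defined(__AVX__)
    s += "AVX ";
#endif
#if defined(__AVX2__)
    s += "AVX2 ";
#endif
#if defined(__FMA__)
    s += "FMA ";
#endif
#if defined(__AVX512F__)
    s += "AVX512F ";
#endif
    return s.empty() ? std::string("none") : s;
}
```

Printing the result from builds with and without `-march=native` shows whether the flag changed the instruction sets available to the code.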
