
I'm looking for the fastest way to do simple operations using Eigen. There are so many data structures available that it's hard to tell which is the fastest.

I've tried predefining my data structures, but even then my code is being outperformed by similar Fortran code. I guessed Eigen::Vector3d is the fastest for my needs (since it's predefined), but I could easily be wrong. Compiling with -O3 gave me a big boost, but I'm still running 4x slower than a Fortran implementation of the same code.

I make use of an 'Atom' structure, which is then stored in an 'atoms' vector defined by the following:

struct Atom {
    std::string element;
    //double x, y, z;
    Eigen::Vector3d coordinate;
};
std::vector<Atom> atoms;

The slowest part of my code is the following:

distance = atoms[i].coordinate - atoms[j].coordinate;
distance_norm = distance.norm();

Is there a faster data structure I could use? Or is there a faster way to perform these basic operations?

Daniel Marchand
  • You could try compiling with `-ffast-math` (if you are using gcc or clang). And you can try `Eigen::AlignedVector3` from Eigen's unsupported modules. Also make sure to compile with `-DNDEBUG` (once you have verified that your code works correctly). But answering what "the fastest" is requires more context. – chtz Jun 11 '19 at 16:12
  • -ffast-math gives me 40x speedup over -O3! – Daniel Marchand Jun 11 '19 at 18:15
  • 1
    Interestingly I get the 40x speedup with the -fno-math-errno option (just one of several enabled by --fast-math) alone. – Daniel Marchand Jun 11 '19 at 18:56
  • For other C++ newbies like me: I got an order-of-magnitude speedup by removing calls to 'pow', e.g. pow(x,2) --> x_2 = x*x, pow(x,6) --> x_2*x_2*x_2 – Daniel Marchand Jun 12 '19 at 14:19

3 Answers


As you pointed out in your comment, adding the -fno-math-errno compiler flag gives you a huge increase in speed. As to why that happens: your code snippet shows that you're computing a sqrt via distance_norm = distance.norm();.

This flag stops the compiler from setting errno after each sqrt (saving a write to a thread-local variable), which is faster in itself and also enables vectorization of any loop that calls sqrt repeatedly. The only disadvantage is that strict IEEE/ISO adherence for error reporting is lost. See the gcc man page.

Another thing you might want to try is adding -march=native, plus -mfma if -march=native doesn't turn it on for you (I seem to remember that in some cases FMA wasn't turned on by native and had to be enabled by hand - check here for details). And as always with Eigen, you can disable its runtime assertions with -DNDEBUG.
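Putting the flags from this answer and the comments together, a typical gcc invocation might look like the following (file names and the Eigen include path are placeholders):

```shell
# -O3            : general optimization
# -march=native  : use all instruction sets available on the build machine
# -mfma          : fused multiply-add, in case -march=native does not enable it
# -fno-math-errno: skip errno bookkeeping for sqrt and friends
# -DNDEBUG       : disable Eigen's runtime assertions
g++ -O3 -march=native -mfma -fno-math-errno -DNDEBUG \
    -I /path/to/eigen main.cpp -o main
```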

SoA instead of AoS!!! If performance is actually a real problem, consider using a single 4xN matrix to store the positions (and have Atom keep the column index instead of the Eigen::Vector3d). It shouldn't matter too much in the small code snippet you showed, but depending on the rest of your code, switching from this array-of-structs layout to a struct-of-arrays layout may give you another huge increase in performance.

Avi Ginsburg
  • Related (regarding `std::sqrt`): https://stackoverflow.com/questions/43303090/why-does-gcc-call-libcs-sqrt-without-using-its-result/ – chtz Jun 12 '19 at 14:57

Given you are ~4x off, it might be worth checking that you have enabled vectorization such as AVX or AVX2 at compile time. There are of course also SSE2 (~2x) and AVX512 (~8x) when dealing with doubles.

keith

Either try another compiler, such as the Intel C++ compiler (free for academic and non-profit usage), or use other libraries: Intel MKL (far faster than your own code) or other BLAS/LAPACK implementations for dense matrices, or PARDISO or SuperLU (not sure if it still exists) for sparse matrices.
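If you go the MKL route, Eigen can delegate to it directly. A sketch of the setup (this is just a configuration fragment; it requires an MKL installation plus the corresponding link flags to actually build):

```cpp
// Define before including any Eigen header so that supported dense
// operations are forwarded to Intel MKL's BLAS/LAPACK routines.
#define EIGEN_USE_MKL_ALL
#include <Eigen/Dense>
```

Note that for tiny fixed-size objects like Vector3d this mostly pays off in larger matrix operations, not in the 3-element arithmetic shown in the question.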

Dan