
In using the Eigen library, I have a templated C++ class comprising a raw buffer and an Eigen::Map instance as a member. In the constructor of the class, I initialize the map as follows:

template<int size>
class TestEigenMapClass
{
public:
    TestEigenMapClass(): 
        
        vec_(vec_raw_,size) 
    {
        vec_.setZero();
    }
    Eigen::Map<Eigen::VectorXf> vec_;
    
private:
    int size_ = size;
    float vec_raw_[size];
};

The raw buffer is allocated by the system. Do I have to worry about alignment and performance when declaring or initializing the map?

It works as it is, but I am wondering about alignment-related performance differences when compiling this code on different platforms. The documentation for the Eigen::Map class only says "MapOptions specifies whether the pointer is Aligned, or Unaligned. The default is Unaligned." and nothing else.

Adrian Mole
Ernest

1 Answer


Since the default is unaligned, Eigen will assume no particular alignment. So the code will work fine. The cost of doing so will vary depending on your platform:

  • On old SSE2 hardware, unaligned memory accesses were very slow
  • On AVX or AVX2 hardware, they are generally fast
  • On AVX hardware when compiling only with SSE2-4 instructions for compatibility, you cannot fold the memory operation into the computation, which may have a slight effect on performance, especially front-end (more instructions for the same amount of micro-ops)
  • On AVX-512 hardware using the full 64 byte vector size, aligned accesses become more important again since it can fetch a single cache-line in one instruction, if properly aligned

See for example this answer for a more complete discussion: "Alignment and SSE strange behaviour", and here for AVX-512: "Why is transforming an array using AVX-512 instructions significantly slower when transforming it in batches of 8 compared to 7 or 9?"
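To get a feel for why alignment matters more as vectors get wider, it can help to count how often an access straddles two cache lines. The following is only a rough sketch, assuming 64-byte cache lines and 4-byte floats; `kCacheLineBytes`, `crosses_cache_line` and `count_crossing_offsets` are made-up names for illustration, not part of Eigen.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as on current mainstream x86 CPUs.
constexpr std::size_t kCacheLineBytes = 64;

// True if an access of `width` bytes starting at `address` touches two cache lines.
constexpr bool crosses_cache_line(std::uintptr_t address, std::size_t width)
{
    return (address % kCacheLineBytes) + width > kCacheLineBytes;
}

// Counts how many of the float-aligned start offsets within one cache line
// make an access of `width` bytes straddle into the next line.
constexpr int count_crossing_offsets(std::size_t width)
{
    int count = 0;
    for (std::uintptr_t offset = 0; offset < kCacheLineBytes; offset += sizeof(float))
    {
        if (crosses_cache_line(offset, width))
            ++count;
    }
    return count;
}

// 16-byte (SSE) accesses straddle for 3 of the 16 possible float offsets,
// 32-byte (AVX) accesses for 7, and 64-byte (AVX-512) accesses for 15.
static_assert(count_crossing_offsets(16) == 3, "SSE-sized access");
static_assert(count_crossing_offsets(32) == 7, "AVX-sized access");
static_assert(count_crossing_offsets(64) == 15, "AVX-512-sized access");
```

So with full 64-byte vectors, every start address that is not a multiple of 64 makes the access touch two cache lines, which fits the observation that alignment becomes important again there.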

If you want to provide proper alignment, you can follow Eigen's guide on that. However, you have to make some adjustments, since your array needs the alignment and is the last member while the alignment of the Map object itself does not matter.

Here is a version that should work in C++17 and up:

template<int size>
class TestEigenMapClass
{
public:
    TestEigenMapClass(): 
        
        vec_(vec_raw_,size) 
    {
        vec_.setZero();
    }
    Eigen::VectorXf::AlignedMapType vec_;
    
private:
    int size_ = size;
    struct alignas(EIGEN_DEFAULT_ALIGN_BYTES) {
        float vec_raw_[size];
    };
};

Side-note: Not sure why you save the size as an integer member when both the Map and the template know that size. Also, I would create the map on-demand to save memory like this:

template<int size_>
class TestEigenMapClass
{
public:
    using map_type = Eigen::VectorXf::AlignedMapType;
    using const_map_type = Eigen::VectorXf::ConstAlignedMapType;

    TestEigenMapClass() = default;
    map_type vec() noexcept
    { return map_type(vec_raw_, size_); }

    const_map_type vec() const noexcept
    { return const_map_type(vec_raw_, size_); }

    int size() const noexcept
    { return size_; }
private:
    struct alignas(EIGEN_DEFAULT_ALIGN_BYTES) {
        float vec_raw_[size_];
    };
};

Also note that you can simply put the alignas on the whole object if the array is the first member. That would also save space on padding bytes within the object.
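A minimal sketch of that variant, without Eigen so that it stands on its own. Here `kAlignBytes` is a stand-in I made up for `EIGEN_DEFAULT_ALIGN_BYTES` (32 is its value on an AVX build), and `AlignedBuffer` and `is_aligned` are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: stand-in for EIGEN_DEFAULT_ALIGN_BYTES on an AVX build.
constexpr std::size_t kAlignBytes = 32;

// The alignas sits on the whole object; the array is the first (and only)
// member, so it starts at the aligned address of the object itself.
template<int size>
struct alignas(kAlignBytes) AlignedBuffer
{
    float data[size];
};

// Runtime check of the kind an aligned Map would assert on in a debug build.
inline bool is_aligned(const void* pointer, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(pointer) % alignment == 0;
}

static_assert(alignof(AlignedBuffer<16>) == kAlignBytes, "type carries the alignment");
static_assert(sizeof(AlignedBuffer<16>) == 64, "16 floats need no padding");
static_assert(sizeof(AlignedBuffer<3>) == 32, "12 bytes of data rounded up to 32");
```

For a local `AlignedBuffer<16> buffer;`, `is_aligned(buffer.data, kAlignBytes)` should then hold. Note that the object size is always rounded up to a multiple of the alignment, as the last assertion shows.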

I also assume you have a good reason not to simply use a fixed-size Eigen type: Eigen::Matrix<float, size, 1>

Q&A

The issue with that is (I think) that creating the map will invoke memory allocation (and some extra instructions) each time the array is accessed via vec()

No. The Map is just a struct with a pointer and a size. Its construction is inlined. This will have zero overhead. Consider this sample code:

void foo(TestEigenMapClass<16>& out,
         const TestEigenMapClass<16>& a,
         const TestEigenMapClass<16>& b)
{
    out.vec() = a.vec() + b.vec();
    out.vec() += b.vec() * 2.f;
}

Compiled with GCC-11.3, -std=c++20 -O2 -DNDEBUG it results in this assembly:

foo(TestEigenMapClass<16>&, TestEigenMapClass<16> const&, TestEigenMapClass<16> const&):
        xor     eax, eax
.L2:
        movaps  xmm0, XMMWORD PTR [rdx+rax*4]
        addps   xmm0, XMMWORD PTR [rsi+rax*4]
        movaps  XMMWORD PTR [rdi+rax*4], xmm0
        add     rax, 4
        cmp     rax, 16
        jne     .L2
        xor     eax, eax
.L3:
        movaps  xmm0, XMMWORD PTR [rdx+rax*4]
        addps   xmm0, xmm0
        addps   xmm0, XMMWORD PTR [rdi+rax*4]
        movaps  XMMWORD PTR [rdi+rax*4], xmm0
        add     rax, 4
        cmp     rax, 16
        jne     .L3
        ret

As you see, zero overhead. Just loading, computing, and storing of float vectors in two loops. Note that for this to work, you have to compile with -DNDEBUG. Otherwise Eigen will insert an assertion that checks the alignment at runtime when you use aligned maps. That is the only time an aligned Map may have overhead compared to an unaligned Map. But even then it should not matter for performance. Compilers and CPUs are good at jumping over a few simple checks.

If anything, storing the map has higher overhead since you have to materialize the object in memory between function calls and have to read one more pointer indirection (first read the Map through its reference, then the floats through the Map). It will also make aliasing analysis harder for the compiler.

Will EIGEN_DEFAULT_ALIGN_BYTES lead the compiler to automatically choose the best alignment?

That is a macro set by Eigen. It is chosen depending on the architecture. If you compile for SSE2-4, it is 16, for AVX it is 32. Not sure if AVX-512 will bump this to 64. It's the alignment for that particular architecture.

Be careful with your struct layout. Something like struct { int size; struct alignas(32) { float arr[]; }; }; would waste 28 bytes for padding between the int and the floats. As usual, put the element with the largest alignment first or otherwise take care to not waste space on padding.
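To make the padding visible, here is a small sketch that compares two layouts with the same members. It assumes 4-byte `int` and `float`, and uses a made-up `kAlignBytes` of 32 in place of `EIGEN_DEFAULT_ALIGN_BYTES`; the struct and member names are hypothetical.

```cpp
#include <cstddef>

// Assumption: stand-in for EIGEN_DEFAULT_ALIGN_BYTES on an AVX build.
constexpr std::size_t kAlignBytes = 32;

// Small members interleaved with aligned arrays: each int is followed by
// 28 padding bytes so that the next array starts on a 32-byte boundary.
struct Interleaved
{
    int count_a;
    alignas(kAlignBytes) float a[8];
    int count_b;
    alignas(kAlignBytes) float b[8];
};

// Members with the largest alignment first, small members packed at the end.
struct Grouped
{
    alignas(kAlignBytes) float a[8];
    alignas(kAlignBytes) float b[8];
    int count_a;
    int count_b;
};

static_assert(offsetof(Interleaved, a) == 32, "28 padding bytes after count_a");
static_assert(sizeof(Interleaved) == 128, "two padded ints plus two arrays");
static_assert(sizeof(Grouped) == 96, "72 bytes of members rounded up to 96");
```

Reordering does not remove padding entirely, because the object size is still rounded up to a multiple of the alignment, but it lets the small members share a single padded region.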

I could perhaps just have a fixed size type and use head() each time. I wonder about the differences in performance (this is for a real-time, high-performance prototype).

head, tail, segment etc. are all basically implemented the same as a Map so they have the same non-existent overhead. head() also still carries the compile time information that the vector is properly aligned. If you compile without -DNDEBUG, there will be a range check. But again, even if you keep this activated, it is normally nothing to worry about.

Make sure to use the fixed template parameter instead of the runtime size parameter for these functions, if you can. vector.head<3>() is more efficient than vector.head(3).

If you plan to resize the vector, you can also use the MaxRows template parameter to create a vector that never allocates memory but can change its size within a specific range:

template<int size>
using VariableVector = Eigen::Matrix<
      float, Eigen::Dynamic /*rows*/, 1 /*cols*/,
      Eigen::ColMajor | Eigen::AutoAlign,
      size /*max rows*/, 1 /*max cols*/>;
Homer512
  • Thanks for such a complete answer, @Homer512. I like your suggestion to create the map on the fly. The issue with that is (I think) that creating the map will invoke memory allocation (and some extra instructions) each time the array is accessed via `vec()`. Will `EIGEN_DEFAULT_ALIGN_BYTES` lead the compiler to automatically choose the best alignment? – Ernest Jan 03 '23 at 08:11
  • The reason I am not using a fixed size type is that the number of elements to be accessed may be time varying (say the size is 16; and sometimes only 8 elements are accessed, sometimes it will be 12, etc.). So I thought I could have an `Eigen::Map` as a member and change the map on the fly via placement new. But now that I think about what you said at the end of your answer, I could perhaps just have a fixed size type and use `head()` each time. I wonder about the differences in performance (this is for a real-time, high-performance prototype). – Ernest Jan 03 '23 at 08:11
  • @Ernest I expanded the answer to cover your comments – Homer512 Jan 03 '23 at 16:20
  • that is so useful! I wish the documentation included these types of explanations and examples. – Ernest Jan 03 '23 at 20:10
  • @Homer512 In expanding my test classes to other cases, I am facing an issue: In another class, I have two of those raw buffers, say `vec1_raw_` and `vec2_raw_`, each declared inside a `struct alignas(EIGEN_DEFAULT_ALIGN_BYTES) { };` (one for each). When I try to assign one to the other via `vec1() = vec2()`, the rest of the variables of the class get all messed up. Am I doing something illegal? – Ernest Jan 13 '23 at 05:02
  • @Ernest No, I don't think so. Maybe there is something wrong with the size. Can you please post it in a new question so that it's not just me looking at the code? – Homer512 Jan 13 '23 at 08:08
  • I just found the problem, indeed a size issue. My bad. And thanks again, you've been super helpful! – Ernest Jan 13 '23 at 21:31