Since a Map is unaligned by default, Eigen will assume no particular alignment, so the code will work fine. The cost of unaligned access varies depending on your platform:
- On old SSE2 hardware, unaligned memory accesses were very slow
- On AVX or AVX2 hardware, they are generally fast
- On AVX hardware when compiling only with SSE2-4 instructions for compatibility, the compiler cannot fold an unaligned load into the computation, because legacy SSE memory operands must be aligned. This may have a slight effect on performance, mostly in the front-end (more instructions for the same number of micro-ops)
- On AVX-512 hardware using the full 64-byte vector size, aligned accesses become more important again: a properly aligned access touches exactly one cache line, while a misaligned one always straddles two
For a more complete discussion, see for example the answer to "Alignment and SSE strange behaviour", and for AVX-512, "Why is transforming an array using AVX-512 instructions significantly slower when transforming it in batches of 8 compared to 7 or 9?"
If you want to provide proper alignment, you can follow Eigen's guide on that. However, you have to make some adjustments: it is your array that needs the alignment, and it is the last member, while the alignment of the Map object itself does not matter.
Here is a version that should work in C++17 and up:
template<int size>
class TestEigenMapClass
{
public:
    TestEigenMapClass():
        vec_(vec_raw_, size)
    {
        vec_.setZero();
    }
    Eigen::VectorXf::AlignedMapType vec_;
private:
    int size_ = size;
    struct alignas(EIGEN_DEFAULT_ALIGN_BYTES) {
        float vec_raw_[size];
    };
};
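To confirm that the array really lands on the boundary you asked for, a small check like the following can help. This is only a sketch using the standard library: `is_aligned` and `AlignedBlock` are hypothetical names, and the 32-byte value is an assumption standing in for `EIGEN_DEFAULT_ALIGN_BYTES` on an AVX build.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p sits on a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) noexcept
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Stand-in for the aligned float array; 32 is an assumed value, not Eigen's macro.
struct alignas(32) AlignedBlock
{
    float data[16];
};
```

With Eigen available, the same helper could be applied to the pointer returned by the map's `data()` member.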
Side-note: Not sure why you save the size as an integer member when both the Map and the template know that size. Also, I would create the map on-demand to save memory like this:
template<int size_>
class TestEigenMapClass
{
public:
    using map_type = Eigen::VectorXf::AlignedMapType;
    using const_map_type = Eigen::VectorXf::ConstAlignedMapType;
    TestEigenMapClass() = default;
    map_type vec() noexcept
    { return map_type(vec_raw_, size_); }
    const_map_type vec() const noexcept
    { return const_map_type(vec_raw_, size_); }
    int size() const noexcept
    { return size_; }
private:
    struct alignas(EIGEN_DEFAULT_ALIGN_BYTES) {
        float vec_raw_[size_];
    };
};
Also note that you can simply put the alignas on the whole object if the array is the first member. That would also save space on padding bytes within the object.
I also assume you have a good reason not to simply use a fixed-size Eigen type: Eigen::Matrix<float, size, 1>
Q&A
The issue with that is (I think) that creating the map will invoke memory allocation (and some extra instructions) each time the array is accessed via vec()
No. The Map is just a struct with a pointer and a size. Its construction is inlined. This will have zero overhead. Consider this sample code:
void foo(TestEigenMapClass<16>& out,
         const TestEigenMapClass<16>& a,
         const TestEigenMapClass<16>& b)
{
    out.vec() = a.vec() + b.vec();
    out.vec() += b.vec() * 2.f;
}
Compiled with GCC 11.3 and -std=c++20 -O2 -DNDEBUG, it results in this assembly:
foo(TestEigenMapClass<16>&, TestEigenMapClass<16> const&, TestEigenMapClass<16>&):
xor eax, eax
.L2:
movaps xmm0, XMMWORD PTR [rdx+rax*4]
addps xmm0, XMMWORD PTR [rsi+rax*4]
movaps XMMWORD PTR [rdi+rax*4], xmm0
add rax, 4
cmp rax, 16
jne .L2
xor eax, eax
.L3:
movaps xmm0, XMMWORD PTR [rdx+rax*4]
addps xmm0, xmm0
addps xmm0, XMMWORD PTR [rdi+rax*4]
movaps XMMWORD PTR [rdi+rax*4], xmm0
add rax, 4
cmp rax, 16
jne .L3
ret
As you see, zero overhead. Just loading, computing, and storing of float vectors in two loops. Note that for this to work, you have to compile with -DNDEBUG. Otherwise Eigen will add an assertion that checks the alignment at runtime when you use aligned maps. That is the only time an aligned Map may have overhead compared to an unaligned Map. But even then it should not matter for performance. Compilers and CPUs are good at jumping over a few simple checks.
If anything, storing the map has higher overhead, since you have to materialize the object in memory between function calls and go through one more pointer indirection (first read the Map through its reference, then the floats through the Map). It will also make aliasing analysis harder for the compiler.
Will EIGEN_DEFAULT_ALIGN_BYTES lead the compiler to automatically choose the best alignment?
That is a macro set by Eigen. It is chosen depending on the architecture. If you compile for SSE2-4, it is 16, for AVX it is 32. Not sure if AVX-512 will bump this to 64. It's the alignment for that particular architecture.
Be careful with your struct layout. Something like struct { int size; struct alignas(32) { float arr[]; }; }; would waste 28 bytes for padding between the int and the floats. As usual, put the element with the largest alignment first or otherwise take care to not waste space on padding.
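The effect of member order can be checked at compile time. The sketch below uses only the standard library and assumes 4-byte int and float; the struct names and the extra `flags` member are hypothetical, added to show how small members can share one slot once the aligned array comes first.

```cpp
#include <cstddef>

// Stand-in for the aligned float array (32-byte alignment as in the example above).
struct alignas(32) Floats
{
    float arr[8];
};

// A small member on each side of the aligned array: each int occupies its own 32-byte slot.
struct Wasteful
{
    int size;
    Floats data;
    int flags;
};

// Largest alignment first: both ints share the slot after the array.
struct Compact
{
    Floats data;
    int size;
    int flags;
};

static_assert(offsetof(Wasteful, data) == 32, "28 padding bytes follow the leading int");
static_assert(sizeof(Wasteful) == 96, "three 32-byte slots");
static_assert(sizeof(Compact) == 64, "two 32-byte slots");
```

Note that with only one small member, reordering just moves the padding to the end of the object; the saving shows up once several small members can be packed together.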
I could perhaps just have a fixed size type and use head() each time. I wonder about the differences in performance (this is for a real-time, high-performance prototype).
head, tail, segment etc. are all basically implemented the same as a Map, so they have the same non-existent overhead. head() also still carries the compile-time information that the vector is properly aligned. If you compile without -DNDEBUG, there will be a range check. But again, even if you keep this activated, it is normally nothing to worry about.
Make sure to use the fixed template parameter instead of the runtime size parameter for these functions, if you can. vector.head<3>() is more efficient than vector.head(3).
If you plan to resize the vector, you can also use the MaxRows template parameter to create a vector that never allocates memory but can change its size within a specific range:
template<int size>
using VariableVector = Eigen::Matrix<
float, Eigen::Dynamic /*rows*/, 1 /*cols*/,
Eigen::ColMajor | Eigen::AutoAlign,
size /*max rows*/, 1 /*max cols*/>;