11

Is there an obvious reason why the following code segfaults ?

#include <vector>
#include <emmintrin.h>

struct point {
  __m128i v;

  point() {
    v = _mm_setr_epi32(0, 0, 0, 0);
  }
};

int main(int argc, char *argv[])
{
  std::vector<point> a(3);
}

Thanks

Edit: I'm using g++ 4.5.0 on linux/i686. I might not know what I'm doing here, but even the following segfaults:

int main(int argc, char *argv[])
{
  point *p = new point();
}

I really think it must be an alignment issue.

andreabedini
  • 3
    The obvious thing that could have gone wrong would be if `v` wasn't aligned properly. But it's allocated dynamically by `vector`, so it isn't subject to stack misalignment issues. Still, I would print out the address of `v` before trying to assign it, just to make sure. – Ben Voigt Mar 07 '11 at 05:21
  • 1
    BTW, it would be good to mention what compiler, version, and platform you're using when you see the segfault. I would try to reproduce your error, but without knowing specifics I can't. – Ben Voigt Mar 07 '11 at 05:23
  • Please see question: [operator new overloading and alignment](http://stackoverflow.com/questions/2366879/operator-new-overloading-and-alignment) – rwong Mar 07 '11 at 05:36
  • Also related: [Visual C++: Forcing memory alignment of variables/data-structures](http://stackoverflow.com/questions/2689239/visual-cforcing-memory-alignment-of-variables-data-structures) – Ben Voigt Mar 07 '11 at 05:46
  • 1
    32-bit heap allocators don't align better than 8. You need to use the Allocator type argument of vector<> and use an allocator that can align at 16. Check your CRT for one or cook your own by over-allocating. – Hans Passant Mar 07 '11 at 06:02
  • @Hans: doesn't help. An object of type `point` is passed by value to `std::vector::vector`, as pointed out by @phooji. And that's stack aligned, unaffected by the vector's `Allocator`. – Ben Voigt Mar 07 '11 at 06:03
  • @Ben: not a problem, only SSE instructions care about alignment. The vector<> class doesn't have any. – Hans Passant Mar 07 '11 at 06:12
  • Can anyone test [MacSTL](http://www.pixelglow.com/macstl/) with your compiler? – rwong Mar 07 '11 at 06:13
  • Related: http://stackoverflow.com/questions/4445509/c-memory-alignment-question – rwong Mar 07 '11 at 06:19
  • @Hans: `vector` class calls `point::point(const point&)` repeatedly, which will call `__m128i`'s copy constructor. That's potentially a problem depending on what code the compiler generates for copying `__m128i` instances. – Ben Voigt Mar 07 '11 at 07:34

4 Answers

11

The obvious thing that could have gone wrong would be if v wasn't aligned properly.

But it's allocated dynamically by vector, so it isn't subject to stack misalignment issues.

However, as phooji correctly points out, a "template" or "prototype" value is passed to the std::vector constructor which will be copied to all the elements of the vector. It's this parameter of std::vector::vector that will be placed on the stack and may be misaligned.

Some compilers have a pragma for controlling stack alignment within a function (basically, the compiler wastes some extra space as needed to get all locals properly aligned).

According to the Microsoft documentation, Visual C++ 2010 should set up 16-byte stack alignment automatically for SSE types, and has done so since Visual C++ 2003.

For gcc I don't know.
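To tell the stack and heap hypotheses apart on a given toolchain, it may help to print where each kind of storage lands. This is only a rough sketch: `is_aligned` and `report_alignment` are hypothetical helper names, and `point` is declared without a constructor on purpose so that nothing is stored through a possibly misaligned address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <new>
#include <emmintrin.h>

// Same layout as the question's struct, but with no constructor, so that
// creating one never executes an aligned SSE store.
struct point {
    __m128i v;
};

// True when p is a multiple of alignment (alignment must be a power of two).
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Prints the address of a stack object and of a raw heap block of the same size.
void report_alignment() {
    point on_stack;  // left uninitialised; only its address is inspected
    void* raw = ::operator new(sizeof(point));  // raw storage, no object constructed
    std::printf("stack: %p 16-byte aligned: %d\n",
                static_cast<void*>(&on_stack), is_aligned(&on_stack, 16));
    std::printf("heap : %p 16-byte aligned: %d\n", raw, is_aligned(raw, 16));
    ::operator delete(raw);
}
```

Calling `report_alignment()` from `main` on the affected system should show which of the two addresses, if either, is not a multiple of 16.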


Under C++0x, for new point() to return unaligned storage is a serious non-compliance. [basic.stc.dynamic.allocation] says (wording from draft n3225):

The allocation function attempts to allocate the requested amount of storage. If it is successful, it shall return the address of the start of a block of storage whose length in bytes shall be at least as large as the requested size. There are no constraints on the contents of the allocated storage on return from the allocation function. The order, contiguity, and initial value of storage allocated by successive calls to an allocation function are unspecified. The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type with a fundamental alignment requirement (3.11) and then used to access the object or array in the storage allocated (until the storage is explicitly deallocated by a call to a corresponding deallocation function).

And [basic.align] says:

Additionally, a request for runtime allocation of dynamic storage for which the requested alignment cannot be honored shall be treated as an allocation failure.

Can you try a newer version of gcc where this might be fixed?
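For readers on a newer toolchain: C++17 added aligned allocation, so a `new`-expression for a type whose alignment exceeds `__STDCPP_DEFAULT_NEW_ALIGNMENT__` is routed to an alignment-aware `operator new`. The sketch below assumes a C++17 compiler for that guarantee (it still compiles as C++11); `is_over_aligned`, `new_point_is_aligned` and `check_new_point` are hypothetical names, and `point` again has no constructor so the check runs before anything is stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>

struct point {
    __m128i v;
};

// True when T asks for more alignment than plain operator new promises.
// Falls back to max_align_t when the C++17 macro is not available.
template <typename T>
constexpr bool is_over_aligned() {
#if defined(__STDCPP_DEFAULT_NEW_ALIGNMENT__)
    return alignof(T) > __STDCPP_DEFAULT_NEW_ALIGNMENT__;
#else
    return alignof(T) > alignof(std::max_align_t);
#endif
}

// Allocates one point with default-initialisation (no store is executed)
// and reports whether the returned address satisfies alignof(point).
bool new_point_is_aligned() {
    point* p = new point;
    const bool ok = reinterpret_cast<std::uintptr_t>(p) % alignof(point) == 0;
    if (ok) {
        p->v = _mm_setzero_si128();  // safe only once alignment is confirmed
    }
    delete p;
    return ok;
}

// Usage: aborts in a debug build if the toolchain returns misaligned storage.
void check_new_point() {
    assert(new_point_is_aligned());
}
```

If `is_over_aligned<point>()` is true and the compiler is not in C++17 mode, the plain allocation function is under no obligation to return 16-byte aligned memory.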

Ben Voigt
  • Thanks for your comment, indeed gcc 4.6 C++0x support [status](http://gcc.gnu.org/gcc-4.6/cxx0x_status.html) reports alignment support (N2341) as not yet implemented. – andreabedini Mar 07 '11 at 06:42
  • Odd that this is the accepted answer, because it's wrong... The problem is not stack-alignment; it's heap alignment. He isn't calling `new point` (which of course would be fine); the `std::vector` class is calling `::operator new` with some size, and the returned block of memory is only 8-byte aligned on 32-bit GCC (even with GCC 7.3, the latest I have tried). – Nemo Jan 23 '19 at 21:39
  • @Nemo: You haven't ruled out stack alignment issues. On any particular system, you may need to check both, and either one could be going wrong. You say you tested `::operator new()` on i686 gcc, and got 8 byte alignment, but what parameters did you pass? – Ben Voigt Jan 23 '19 at 22:39
  • I did not test `::operator new`. I was trying to use `push_back` on a vector of `__m128i`. (More precisely: A vector of structs containing a single member of this type.) I disassembled at the segfault to find an aligned store to an 8-byte aligned heap address. Misalignment of these types on the stack is incredibly unlikely; it would be a serious compiler bug, never mind that these types are passed in SSE registers. This problem is simply malloc/new having different alignment guarantees on 32-bit and 64-bit Linux platforms, and GCC using aligned loads/stores for moving these types around. – Nemo Jan 24 '19 at 01:26
  • @Nemo: Misalignment on the heap is a serious toolchain bug -- I quoted the Standard rule that applies. So you can't conclude that one or the other is more likely based on that. – Ben Voigt Jan 24 '19 at 05:57
  • As far as I know, the latest GCC/glibc on 32-bit platforms still only guarantees 8-byte alignment for memory blocks returned by malloc/new. These 16-byte SSE types are not standard, which is presumably what makes this compliant behavior. (Otherwise this is a 15+ year old bug.) – Nemo Jan 24 '19 at 19:07
3

The vector constructor you are using is actually defined like this:

explicit vector ( size_type n, const T& value= T(), const Allocator& = Allocator() );

(see e.g., http://www.cplusplus.com/reference/stl/vector/vector/).

In other words, one element is default constructed (i.e., the default parameter value as you call the constructor), and the remaining elements are then created by copying the first one. My guess is that you need a copy constructor for point that properly handles the (non-)copying of __m128i values.
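The "one prototype, n copies" behaviour can be observed with a small counting type. This is a sketch only: `counted` and `build_with_prototype` are hypothetical names. The prototype is passed explicitly because, since C++11, `vector(n)` on its own value-initialises each element in place instead of copying a default argument.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often each constructor runs.
struct counted {
    static int default_ctors;
    static int copy_ctors;
    counted() { ++default_ctors; }
    counted(const counted&) { ++copy_ctors; }
};
int counted::default_ctors = 0;
int counted::copy_ctors = 0;

// Builds a vector the way the quoted signature does:
// one default-constructed prototype, then n copies of it.
std::size_t build_with_prototype(std::size_t n) {
    counted::default_ctors = 0;
    counted::copy_ctors = 0;
    std::vector<counted> v(n, counted());
    assert(v.size() == n);
    return v.size();
}
```

After `build_with_prototype(3)`, the counters should show one default construction (the temporary prototype) and three copy constructions (the elements).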

Update: When I try to build your code with Visual Studio 2010 (v. 10.0.30319.1), I get the following build error:

error C2719: '_Val': formal parameter with __declspec(align('16')) won't be aligned c:\program files\microsoft visual studio 10.0\vc\include\vector 870 1   meh

This suggests Ben is right on the money regarding this being an alignment problem.

phooji
  • 2
    `__m128i` should be blittable, I don't think this is the problem. And even if it wasn't, the automatically-generated copy constructor for `point` would call the correct copy constructor for `__m128i`. – Ben Voigt Mar 07 '11 at 05:24
  • @Ben: Looks like you might be right on the money re: alignment. Updating. – phooji Mar 07 '11 at 05:30
  • @Ben Voigt: Possibly -- that error points to one of vector's `resize` overloads. – phooji Mar 07 '11 at 05:42
  • @Ben: it is because the `new` operator, as defined in the standard library, does not take the alignment directives into consideration, as illustrated in @phooji's VS2010 error message. – rwong Mar 07 '11 at 05:45
  • @rwong: The standard demands that allocation functions return aligned memory. – Ben Voigt Mar 07 '11 at 05:47
  • @Ben: yes, you're right. Seems like it is a bug in VS2010, and a silent failure in earlier versions. (3.7.3.1 Allocation functions) The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type and then used to access the object or array in the storage allocated. – rwong Mar 07 '11 at 05:56
  • @rwong: On Visual C++ 2010, the dynamically allocated memory is properly aligned (I believe, haven't tested). The error message @phooji is getting is related to parameters, which don't use dynamic allocation, but could be affected by stack misalignment. – Ben Voigt Mar 07 '11 at 06:00
  • "The standard demands that allocation functions return aligned memory" - but how much aligned? In Linux I get 16-byte alignment, in Visual something smaller. It won't suffice if somebody "invents" a type with 1024-byte alignment. In VS there is [_aligned_malloc](http://msdn.microsoft.com/en-us/library/8z34s9c6%28v=VS.100%29.aspx) – CygnusX1 Mar 07 '11 at 07:04
  • @Cygnus: I quoted the paragraph in my answer (it doesn't fit well in a comment). – Ben Voigt Mar 07 '11 at 07:29
1

There is a possibility that the memory that is allocated by the default allocator in your compiler's STL implementation is not aligned. This will be dependent on the specific platform and compiler vendor.

Usually the default allocator uses operator new, which often does not guarantee alignment beyond the word size (32-bit or 64-bit). To solve the problem, it may be necessary to implement a custom allocator that uses an aligned allocation function such as _aligned_malloc on Visual C++.
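A minimal sketch of such an allocator, assuming a C++11 compiler, might look like the following. It uses `_mm_malloc`/`_mm_free` rather than `_aligned_malloc` so that the same code builds outside Visual C++; to my knowledge GCC, Clang and MSVC all make these available through the SSE headers, but treat that as an assumption. The names `aligned_allocator`, `point_vector` and `make_points` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>
#include <emmintrin.h>  // SSE2; assumed to also provide _mm_malloc/_mm_free

struct point {
    __m128i v;
    point() { v = _mm_setr_epi32(0, 0, 0, 0); }
};

// Hypothetical minimal allocator that hands out Alignment-byte aligned blocks.
template <typename T, std::size_t Alignment = 16>
struct aligned_allocator {
    typedef T value_type;

    aligned_allocator() noexcept {}
    template <typename U>
    aligned_allocator(const aligned_allocator<U, Alignment>&) noexcept {}

    // Explicit rebind is required because of the non-type template parameter.
    template <typename U>
    struct rebind { typedef aligned_allocator<U, Alignment> other; };

    T* allocate(std::size_t n) {
        // Guard against overflow in n * sizeof(T).
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        void* p = _mm_malloc(n * sizeof(T), Alignment);
        if (p == nullptr) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { _mm_free(p); }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U, std::size_t A>
bool operator==(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) noexcept { return false; }

typedef std::vector<point, aligned_allocator<point> > point_vector;

// Usage: build n points in 16-byte aligned heap storage.
point_vector make_points(std::size_t n) {
    point_vector pts(n);
    assert(reinterpret_cast<std::uintptr_t>(pts.data()) % 16 == 0);
    return pts;
}
```

Note that this only addresses the heap side; it does nothing about a misaligned stack temporary or by-value parameter.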

Also, a simple fix (although not a satisfactory one) would be to assign the value to a local __m128i variable, then copy that variable into the struct using an unaligned store instruction. Example:

struct point {
    __m128i v;
    point() {
        __m128i temp = _mm_setr_epi32(0, 0, 0, 0);
        _mm_storeu_si128(&v, temp);
    }
};
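The constructor above covers only the initial store; the implicitly generated copy operations may still use aligned moves. A possible extension of the same workaround, offered as a sketch rather than a guaranteed fix, routes copies through `_mm_loadu_si128` and `_mm_storeu_si128` as well. The four-argument constructor, `lane0` and `copy_demo` are hypothetical additions for illustration, and the compiler is still entitled to assume `v` is 16-byte aligned.

```cpp
#include <cassert>
#include <emmintrin.h>

struct point {
    __m128i v;

    point() { _mm_storeu_si128(&v, _mm_setr_epi32(0, 0, 0, 0)); }
    point(int a, int b, int c, int d) { _mm_storeu_si128(&v, _mm_setr_epi32(a, b, c, d)); }

    // Copy through unaligned load/store instead of the implicit aligned move.
    point(const point& other) { _mm_storeu_si128(&v, _mm_loadu_si128(&other.v)); }
    point& operator=(const point& other) {
        _mm_storeu_si128(&v, _mm_loadu_si128(&other.v));
        return *this;
    }

    // Returns the lowest 32-bit lane (the first argument of _mm_setr_epi32).
    int lane0() const { return _mm_cvtsi128_si32(_mm_loadu_si128(&v)); }
};

// Usage: copy a point and confirm the lowest lane survives the round trip.
inline void copy_demo() {
    point p(1, 2, 3, 4);
    point q(p);
    assert(q.lane0() == 1);
}
```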
rwong
1

SSE vector types such as __m128 are required to be 16-byte aligned in memory. When you allocate an __m128 on the stack, there's normally no problem because the compiler aligns it correctly. The default allocator for std::vector<>, which handles dynamic memory allocation, does not necessarily produce 16-byte aligned allocations.

Inverse