4

Possible Duplicate:
SSE, intrinsics, and alignment

I'm new to SIMD programming, so please excuse me if I'm asking an obvious question.

I was experimenting a bit and got to a point where I want to store a SIMD value in a dynamically allocated structure.

Here's the code:

struct SimdTest
{
    __m128      m_simdVal;

    void setZero()
    {
        __m128 tmp = _mm_setzero_ps(); 
        m_simdVal = tmp; // <<--- CRASH ---
    }
};

TEST( Plane, dynamicallyAllocatedPlane )
{
    SimdTest* test = new SimdTest();

    test->setZero();

    delete test;
}

When the line marked with the CRASH comment is executed, the code crashes with the following exception:

Unhandled exception at 0x775315de in test-core.exe: 0xC0000005: Access violation reading location 0x00000000

Could someone please explain why the assignment breaks, and how SIMD-containing objects should be allocated dynamically so that they work?

I need to add that if I statically instantiate a SimdTest object and call the setZero method, everything works fine.

Thanks, Paksas

Piotr Trochim
  • 7
    Looks like you're the newest victim of misalignment. `new` doesn't sufficiently align to 16 bytes. – Mysticial Oct 03 '12 at 16:44
  • 1
    There has to be a good duplicate of this somewhere out there. – Christian Rau Oct 03 '12 at 17:06
  • Yes - that was definitely it. When I added a custom allocator to the class that aligned the allocated memory addresses to a 16-byte boundary, everything started working fine. – Piotr Trochim Oct 03 '12 at 17:21
  • Try: __declspec(align(16)) SimdTest * test = ... – stark Oct 03 '12 at 17:32
  • 1
    @stark No, that would align the pointer variable itself, which is not necessary. It's the allocated memory, whose address is assigned to the pointer, that needs alignment. – Christian Rau Oct 03 '12 at 17:53

2 Answers

5

It dies because the structure is misaligned. The CRT allocator only promises 8-byte alignment here, but __m128 requires 16-byte alignment. You'll need to use _aligned_malloc() on MSVC to get properly aligned heap-allocated memory.

There are two ways to go about it. Since this is a POD struct, you could just cast:

#include <malloc.h>
...
    SimdTest* test = (SimdTest*)_aligned_malloc(sizeof(SimdTest), 16);
    test->setZero();
    _aligned_free(test);

Or you could override the new/delete operators for the struct:

struct SimdTest
{
    void* operator new(size_t size) { return _aligned_malloc(size, 16); }
    void operator delete(void* mem) { return _aligned_free(mem); }
    // etc..
};
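A slightly fuller sketch of the operator overloads, under some assumptions: the helper names alignedAlloc16, alignedFree16 and isAligned16 are hypothetical (they are not part of the CRT or of the answer above), and the non-MSVC branch assumes a POSIX system where posix_memalign is available. It reports allocation failure with std::bad_alloc and covers the array forms as well:

```cpp
#include <cassert>     // assert
#include <cstddef>     // std::size_t
#include <cstdint>     // std::uintptr_t
#include <new>         // std::bad_alloc
#include <xmmintrin.h> // __m128, _mm_setzero_ps

#if defined(_MSC_VER)
#include <malloc.h>    // _aligned_malloc, _aligned_free
#else
#include <stdlib.h>    // posix_memalign, free (assumed POSIX platform)
#endif

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool isAligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: 16-byte aligned allocation that throws on failure,
// as a replacement operator new is expected to do.
inline void* alignedAlloc16(std::size_t size)
{
#if defined(_MSC_VER)
    void* mem = _aligned_malloc(size, 16);
#else
    void* mem = nullptr;
    if (posix_memalign(&mem, 16, size) != 0)
        mem = nullptr;
#endif
    if (!mem)
        throw std::bad_alloc();
    return mem;
}

// Hypothetical helper: releases memory obtained from alignedAlloc16.
inline void alignedFree16(void* mem) noexcept
{
#if defined(_MSC_VER)
    _aligned_free(mem);
#else
    free(mem);
#endif
}

struct SimdTest
{
    __m128 m_simdVal;

    void setZero()
    {
        assert(isAligned16(this)); // debug-time guard against misaligned instances
        m_simdVal = _mm_setzero_ps();
    }

    // Scalar and array forms, so `new SimdTest` and `new SimdTest[n]` are both aligned.
    static void* operator new(std::size_t size)        { return alignedAlloc16(size); }
    static void* operator new[](std::size_t size)      { return alignedAlloc16(size); }
    static void  operator delete(void* mem) noexcept   { alignedFree16(mem); }
    static void  operator delete[](void* mem) noexcept { alignedFree16(mem); }
};
```

With this in place, the original `new SimdTest()` / `delete test` pair from the question should work unchanged. The nothrow and placement forms of new are still not covered here.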
Hans Passant
  • As a side note, there is usually a bit more to the implementation of those overloads and there should also be overloads for the other `new/delete` versions (and likewise specializations for `std::allocator`). I know you know this, just to remark it for any other readers, so +1 anyway. – Christian Rau Oct 03 '12 at 20:00
-1

MSDN states that _m128 values are automatically aligned to 16 bytes; not __m128, but _m128. But anyway, I guess the others are right. As I recall, there are two kinds of move instructions: one for aligned data (movaps) and one for unaligned data (movups). The first requires 16-byte alignment and the other doesn't. I don't know whether the compiler is capable of using both, but I'd try this _m128 type.

Actually, there is a special type for that: _M128A.
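To illustrate the aligned/unaligned distinction at the intrinsics level, here is a minimal sketch. The function names are made up for the example. `_mm_load_ps` and `_mm_store_ps` require a 16-byte aligned address, while `_mm_loadu_ps` accepts any address; which instructions the compiler finally emits is up to the compiler.

```cpp
#include <cassert>     // assert
#include <cstdint>     // std::uintptr_t
#include <xmmintrin.h> // __m128, _mm_load_ps, _mm_loadu_ps, _mm_store_ps

// Hypothetical helper: adds up the four lanes of v.
inline float horizontalSum(__m128 v)
{
    alignas(16) float out[4];   // 16-byte aligned, so the aligned store is safe
    _mm_store_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
}

// Aligned load: p must be 16-byte aligned, otherwise this may fault.
inline float sumAligned(const float* p)
{
    assert(reinterpret_cast<std::uintptr_t>(p) % 16 == 0);
    return horizontalSum(_mm_load_ps(p));
}

// Unaligned load: p may have any alignment.
inline float sumUnaligned(const float* p)
{
    return horizontalSum(_mm_loadu_ps(p));
}
```

For example, with an `alignas(16) float buf[8]`, `sumAligned(buf)` is fine, whereas `buf + 1` is only 4-byte aligned and has to go through `sumUnaligned`.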

Ivan0x32
  • I don't think that there exist types `_m128` or `_M128A`. And `__m128` *is* automatically aligned to 16 bytes, but that doesn't always work (e.g. when doing dynamic allocation). – interjay Oct 03 '12 at 17:54
  • 2
    The `__m128` types (yes, it's `__m128` not `_m128`, which I've never heard about) are properly aligned if allocated on the stack (e.g. as automatic variables). But this doesn't help when allocating them dynamically (e.g. using `new`). – Christian Rau Oct 03 '12 at 17:55
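A small sketch of the point made in these comments, assuming a compiler that provides `<xmmintrin.h>` (the function names are hypothetical): the type itself carries a 16-byte alignment requirement, and the compiler honors it for automatic variables, which is why the non-heap instance in the question worked.

```cpp
#include <cassert>     // assert
#include <cstdint>     // std::uintptr_t
#include <xmmintrin.h> // __m128, _mm_setzero_ps

struct SimdTest
{
    __m128 m_simdVal;
};

// The struct inherits the 16-byte alignment requirement of __m128.
static_assert(alignof(SimdTest) == 16, "SimdTest should require 16-byte alignment");

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool isAligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// An automatic (stack) instance is placed by the compiler, which respects alignof.
inline bool stackInstanceIsAligned()
{
    SimdTest onStack;
    onStack.m_simdVal = _mm_setzero_ps();
    bool aligned = isAligned16(&onStack);
    assert(aligned);
    return aligned;
}
```

The same `isAligned16` check applied to a pointer returned by a plain `new` is a quick way to see whether a given allocator happens to provide the required alignment.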