
I've been trying to re-implement some existing vector and matrix classes to use SSE3 instructions, and I keep running into "memory access violation" errors whenever I perform a series of operations on an array of vectors. I'm relatively new to SSE, so I've started off simple. Here's the entirety of my vector class:

class SSEVector3D
{
public:

   SSEVector3D();
   SSEVector3D(float x, float y, float z);

   SSEVector3D& operator+=(const SSEVector3D& rhs); //< Elementwise Addition

   float x() const;
   float y() const;
   float z() const;

private:

   float m_coords[3] __attribute__ ((aligned (16))); //< The x, y and z coordinates

};

So, not a whole lot going on yet, just some constructors, accessors, and one operation. Using my (admittedly limited) knowledge of SSE, I implemented the addition operation as follows:

SSEVector3D& SSEVector3D::operator+=(const SSEVector3D& rhs) 
{
   __m128 * pLhs = (__m128 *) m_coords;
   __m128 * pRhs = (__m128 *) rhs.m_coords;

   *pLhs = _mm_add_ps(*pLhs, *pRhs);

   return (*this);
}

To speed-test my new vector class against the old one (to see if it's worth re-implementing the whole thing), I created a simple program that generates a random array of SSEVector3D objects and adds them together. Nothing too complicated:

SSEVector3D sseSum(0, 0, 0);

for (size_t i = 0; i < sseVectors.size(); i++)
{
   sseSum += sseVectors[i];
}

printf("Total: %f %f %f\n", sseSum.x(), sseSum.y(), sseSum.z());

The sseVectors variable is an std::vector containing elements of type SSEVector3D, whose components are all initialized to random numbers between -1 and 1.

Here's the issue I'm having. If the size of sseVectors is 8,191 or less (a number I arrived at through a lot of trial and error), this runs fine. If the size is 8,192 or more, I get this error when I try to run it:

signal: SIGSEGV, si_code: 0 (memory access violation at address: 0x00000080)

However, if I comment out that print statement at the end, I get no error even if sseVectors has a size of 8,192 or more.

Is there something wrong with the way I've written this vector class? I'm running Ubuntu 12.04.1 with GCC 4.6.

Question edited by Mysticial; asked by Eric Foote.
  • See: [How is a vector's data aligned?](http://stackoverflow.com/questions/8456236/how-is-a-vectors-data-aligned) (Also, +1 for a nicely written first question.) – Mysticial Sep 12 '12 at 18:57
  • You're getting segfaults because STL containers don't align for SSE. The weirdness happening with `8192` is just an artifact in the memory allocator that affects the alignment of the returned pointer. – Mysticial Sep 12 '12 at 18:59
  • I think an important question to consider is how much data is being loaded by the `_mm_add_ps()` routine and, perhaps more critically, how much is written back. How does that mesh with the actual size of your array of floats? I think the answer to that will point out at least three issues: wrong computed results, alignment issues, and array overrun... – twalberg Sep 12 '12 at 19:05
  • Thanks, I didn't realize that. I guess this brings up another couple of questions: – Eric Foote Sep 12 '12 at 19:37
  • 1) Does this mean that every data structure and class that uses SSEVector3D objects also needs to be aligned properly? As well as every std::vector of SSEVector3D objects? [EDIT: okay, I guess it's not a "couple of" questions, it's just the one!] – Eric Foote Sep 12 '12 at 20:00
  • Also, thanks for pointing out the rather obvious issue where I was reading from and writing to memory outside the boundaries of my float array! I can't believe I missed that! – Eric Foote Sep 12 '12 at 20:04
  • I'm surprised that this worked at all. `std::vector` will most likely not align the data correctly. By the way, what is `sizeof(SSEVector3D)`? – Walter Sep 12 '12 at 20:34
  • Accessing address 0x80 looks like the original pointer is wrong. Maybe you are using an offset from NULL? – David Rodríguez - dribeas Sep 12 '12 at 20:48
  • The error disappears if the `printf` is commented out because the whole loop is optimized away. – Gunther Piez Sep 12 '12 at 22:06
  • Well, it looks like `sizeof(SSEVector3D)` is 16, now that I've changed m_coords to length 4 instead of length 3. I'm still getting the same error, though. Is the only solution to write my own allocator and use it wherever I have an STL container of SSEVector3D objects? Since I'm not using Windows/Visual Studio, it doesn't look like I have the option of using aligned_storage or __declspec(align(16)). – Eric Foote Sep 18 '12 at 21:19
  • Unless you're doing this as a learning exercise, don't. Use an established library instead; I recommend Eigen. With regard to alignment, the Eigen manual has a few sections devoted to it at http://eigen.tuxfamily.org/dox/ – janneb Oct 01 '12 at 11:55
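On the allocator question raised in the comments: a minimal sketch of a custom allocator that hands `std::vector` 16-byte-aligned storage might look like the following. `AlignedAllocator` and `isAligned16` are hypothetical names introduced for illustration. The sketch assumes a C++11-conforming standard library (it relies on `std::allocator_traits` filling in the missing members) and GCC or Clang on x86, where `<xmmintrin.h>` also provides `_mm_malloc`/`_mm_free`. An older library such as the one shipped with GCC 4.6 may require the full C++03 allocator interface instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>
#include <xmmintrin.h>  // on GCC/Clang this also pulls in _mm_malloc/_mm_free

// Hypothetical minimal C++11 allocator returning Alignment-aligned storage,
// so std::vector elements are safe to access with aligned SSE loads/stores.
template <typename T, std::size_t Alignment = 16>
struct AlignedAllocator
{
    typedef T value_type;

    AlignedAllocator() {}
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) {}

    // Explicit rebind: the default one cannot handle the non-type parameter.
    template <typename U>
    struct rebind { typedef AlignedAllocator<U, Alignment> other; };

    T* allocate(std::size_t n)
    {
        if (n == 0)
            return 0;
        void* p = _mm_malloc(n * sizeof(T), Alignment);
        if (!p)
            throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) { _mm_free(p); }
};

// Stateless allocators: any two instances can free each other's memory.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) { return false; }

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool isAligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

Usage would be along the lines of `std::vector<SSEVector3D, AlignedAllocator<SSEVector3D> > sseVectors;`. Since C++17, `std::allocator` uses aligned `operator new` for over-aligned element types, so a helper like this mainly matters on older compilers.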

2 Answers


First and foremost, don't do this:

__m128 * pLhs = (__m128 *) m_coords;
__m128 * pRhs = (__m128 *) rhs.m_coords;
*pLhs = _mm_add_ps(*pLhs, *pRhs);

With SSE, always do your loads and stores explicitly via the appropriate intrinsics, never by just dereferencing a cast pointer. Instead of storing an array of 3 floats in your class, store a value of type __m128. That should make the compiler align instances of your class correctly, without any need for align attributes.

Note, however, that this won't work very well with MSVC. MSVC generally seems unable to cope with alignment requirements stronger than 8 bytes for by-value arguments :-(. The last time I needed to port SSE code to Windows, my solution was to use Intel's C++ compiler for the SSE parts instead of MSVC...
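A minimal sketch of the class from the question rewritten along these lines: it holds an `__m128` member and moves data only through intrinsics. It assumes a C++11 compiler for `alignas` (with GCC 4.6 one would write `__attribute__ ((aligned (16)))` instead), and it treats the fourth lane as unused padding kept at zero. The intrinsics used are plain SSE from `<xmmintrin.h>`; nothing here needs SSE3.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_setzero_ps, _mm_set_ps, _mm_add_ps, _mm_store_ps

// The __m128 member gives SSEVector3D 16-byte alignment, and all data goes
// in and out of the register via intrinsics. Lane 3 is padding, kept at zero.
class SSEVector3D
{
public:
    SSEVector3D() : m_v(_mm_setzero_ps()) {}

    // _mm_set_ps lists lanes from highest to lowest, so x lands in lane 0.
    SSEVector3D(float x, float y, float z) : m_v(_mm_set_ps(0.0f, z, y, x)) {}

    // Elementwise addition on whole registers; no pointer casts involved.
    SSEVector3D& operator+=(const SSEVector3D& rhs)
    {
        m_v = _mm_add_ps(m_v, rhs.m_v);
        return *this;
    }

    float x() const { return lane(0); }
    float y() const { return lane(1); }
    float z() const { return lane(2); }

private:
    // Copies the register into an aligned scratch array with an explicit store.
    float lane(int i) const
    {
        alignas(16) float tmp[4];
        _mm_store_ps(tmp, m_v);
        return tmp[i];
    }

    __m128 m_v;
};
```

Objects on the stack or in static storage get the right alignment from the compiler. Objects inside a `std::vector` still depend on the allocator returning 16-byte-aligned memory (before C++17), which is the issue discussed in the comments.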

Answered by fgp.

The trick is to notice that __m128 is 16-byte aligned. Use _mm_malloc() (or _aligned_malloc() on MSVC) to ensure that your float array is correctly aligned; then you can go ahead and cast your float pointer to a pointer to __m128. Make sure also that the number of floats you allocate is divisible by four.
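A sketch of this approach, assuming GCC or Clang on x86 (where `_mm_malloc`/`_mm_free` are available through `<xmmintrin.h>`). The helpers `roundUpTo4`, `allocAlignedFloats` and `addArrays` are hypothetical names. Following the other answer's advice, the sketch accesses the data with explicit `_mm_load_ps`/`_mm_store_ps` calls rather than dereferencing a cast `__m128*`; once the buffer is aligned, both approaches read the same memory.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // _mm_load_ps, _mm_add_ps, _mm_store_ps; GCC/Clang also expose _mm_malloc/_mm_free here

// Rounds n up to the next multiple of 4, so whole __m128 chunks cover the array.
inline std::size_t roundUpTo4(std::size_t n)
{
    return (n + 3) & ~static_cast<std::size_t>(3);
}

// Hypothetical helper: allocates room for n floats, padded to a multiple of 4
// and 16-byte aligned. All elements, including the padding, start at zero.
// Release the result with _mm_free().
inline float* allocAlignedFloats(std::size_t n, std::size_t& padded)
{
    padded = roundUpTo4(n);
    float* p = static_cast<float*>(_mm_malloc(padded * sizeof(float), 16));
    if (p)
        for (std::size_t i = 0; i < padded; ++i)
            p[i] = 0.0f;
    return p;
}

// Hypothetical helper: a[i] += b[i] for all i, four floats per iteration.
// Both arrays must come from allocAlignedFloats() with the same padded length.
inline void addArrays(float* a, const float* b, std::size_t padded)
{
    for (std::size_t i = 0; i < padded; i += 4)
    {
        __m128 va = _mm_load_ps(a + i);  // aligned load: requires 16-byte alignment
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(a + i, _mm_add_ps(va, vb));
    }
}
```

Padding to a multiple of four is what keeps the last `_mm_load_ps`/`_mm_store_ps` inside the allocation, which is the array overrun pointed out in the comments on the question.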

Answered by cdiggins.