Alignment, total size and SSE

Question

I'm trying to define a custom point type for the PCL library. In that tutorial, they're talking about memory alignment, so I started off by trying to understand how it works.

In this page, they present a rather simple way of calculating the total alignment of a structure. For example, this structure

// Alignment requirements
// (typical 32 bit machine)

// char         1 byte
// short int    2 bytes
// int          4 bytes
// double       8 bytes

// structure C
typedef struct structc_tag
{
  char        c;
  double      d;
  int         s;
} structc_t;

will have a size of 24:

1 byte for the char + 7 bytes of padding + 8 bytes for the double + 4 bytes for the int + 4 bytes of padding

and for g++ 4.8.1, sizeof returns 24. So far, so good.
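The padding arithmetic above can be checked directly instead of worked out by hand. The following is only a sketch: `StructC`, `padding_before_d`, `tail_padding` and `check_structc_layout` are names made up for illustration, and the asserted numbers assume an ABI where `double` is 8-byte aligned inside a struct (x86-64, for instance).

```cpp
#include <cassert>  // assert
#include <cstddef>  // offsetof, std::size_t

// Same members, in the same order, as structc_t in the question.
struct StructC {
    char   c;
    double d;
    int    s;
};

// Bytes of padding the compiler inserts between c and d.
inline std::size_t padding_before_d() {
    return offsetof(StructC, d) - sizeof(char);
}

// Bytes of padding after s, so that sizeof is a multiple of the alignment.
inline std::size_t tail_padding() {
    return sizeof(StructC) - (offsetof(StructC, s) + sizeof(int));
}

// Assumption: double is 8-byte aligned inside structs (e.g. x86-64).
inline void check_structc_layout() {
    assert(padding_before_d() == 7);
    assert(tail_padding() == 4);
    assert(sizeof(StructC) == 24);
}
```

Some 32-bit ABIs (for example i386 Linux with GCC defaults) align `double` to 4 bytes inside structs, so the figures can differ there.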

Now, PCL defines its point types with this scheme for SSE alignment (here's the simplest point, which holds the position along each axis).

union
{
  float data[4];
  struct
  {
    float x;
    float y;
    float z;
  };
};

sizeof returns 16. As I understand it, the union makes sure that the point type is SSE aligned (I read here that this means 16-byte alignment), and the struct makes the axis values accessible by name.

Quoting from the PCL docs:

The user can either access points[i].data[0] or points[i].x for accessing say, the x coordinate.

Is my reasoning valid until here?

In my case, I want to change the floats for doubles in order to have more precision in the X and Y axis.

So, is it enough to declare the point type as:

union {
  float data[4];
  struct {
    double x;
    double y;
    float z;
  };
};

With this declaration, sizeof returns 24, which is not a multiple of 16 (so I understand it's not SSE aligned), but it is "double aligned".

My question is, how can I define my point type to be able to store the X and Y coordinates as double and still be SSE aligned?

PS: Also, if any of you know of a good resource for this, please tell me. I want to understand this topic better.

PS 2: I forgot to mention that the platform I'm trying all of this on is a 64-bit one.

PS 3: If possible, I'm interested in pre-C++11 solutions. A compiler as old as g++ 4.4 (and its MinGW counterpart) must be able to build the new point type.

Have you considered the [`alignas`](http://en.cppreference.com/w/cpp/language/alignas) specifier? – Brett Hale, Dec 17 '13 at 11:57
@BrettHale, I didn't, and I forgot (again) to mention that I can't use `C++11` solutions. I need to be able to compile the point type with old compilers (the oldest one being g++ 4.4). I'll update the info in the post right away. – Adri C.S., Dec 17 '13 at 12:00
@elSnape Do you mean [this](http://gcc.gnu.org/onlinedocs/gcc/Structure-Packing-Pragmas.html)? If so, yes. At least `MinGW` didn't protest when I put `#pragma pack(32)`, and I think `g++` won't complain either. – Adri C.S., Dec 17 '13 at 12:06
Yes, I think that would work! Make sure to `#pragma pack(pop)` after you `#pragma pack(push)`, otherwise nasty things can happen. – elSnape, Dec 17 '13 at 12:22
@elSnape, it seems that in the end I won't be using the `pragma`. On the `PCL` mailing list I was advised to use only `EIGEN_ALIGN16` to avoid conflicts. – Adri C.S., Dec 17 '13 at 16:42

Z boson · Accepted Answer · 2013-12-17T14:16:36.433

1

The size of an object and its alignment are not the same thing. If the size of the struct is 16 bytes, or some multiple of 16, that does not mean it will necessarily be 16-byte aligned.
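To make the distinction concrete, here is a small sketch that compares the PCL-style union with and without an explicit alignment request. The names `PlainPoint`, `AlignedPoint`, `SSE_ALIGN16` and `ALIGN_OF` are invented for this example; `__alignof__` and `__alignof` are compiler extensions used because C++11 `alignof` is off the table, and anonymous structs inside a union are likewise an extension that g++ and MSVC accept.

```cpp
#include <cassert>  // assert
#include <cstddef>  // std::size_t

// Pre-C++11 spellings; both branches rely on compiler extensions.
#ifdef _MSC_VER
#define SSE_ALIGN16 __declspec(align(16))
#define ALIGN_OF(T) __alignof(T)
#else
#define SSE_ALIGN16 __attribute__((aligned(16)))
#define ALIGN_OF(T) __alignof__(T)
#endif

// PCL-style layout with no alignment request at all.
union PlainPoint {
    float data[4];
    struct { float x; float y; float z; };
};

// Same layout, with 16-byte alignment requested on the type itself.
union SSE_ALIGN16 AlignedPoint {
    float data[4];
    struct { float x; float y; float z; };
};

inline void check_size_vs_alignment() {
    // Both unions occupy 16 bytes...
    assert(sizeof(PlainPoint) == 16u);
    assert(sizeof(AlignedPoint) == 16u);
    // ...but only the annotated one is required to sit on a 16-byte boundary.
    assert(ALIGN_OF(PlainPoint) == ALIGN_OF(float));
    assert(ALIGN_OF(AlignedPoint) == 16u);
}
```

The union alone only guarantees the size; the alignment comes from the attribute, which is presumably what a macro such as `EIGEN_ALIGN16` expands to.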

In your case since your code is compiled in 64-bit mode you just need to pad the struct to 32 bytes. In 64-bit mode the stack is 16 byte aligned in Windows and Linux/Unix.

In 32-bit mode it does not have to be 16 byte aligned. You can test this. If you run the code below in MSVC in 32-bit mode you will likely see that the address for each element of the array is not 16 byte aligned (you might have to run it a few times). So even though the size of the struct is a multiple of 16 bytes it is not necessarily 16 byte aligned.

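#include <stdio.h>
#include <stddef.h>  // size_t

int main() {
    union a {
        float data[4];
        struct {
            double x;
            double y;
            float z;
            float pad[3];  // pads the struct to 32 bytes
        };                 // closes the anonymous struct
    };
    a b[10];
    for(int i=0; i<10; i++) {
        // Cast through size_t: a pointer does not fit in an int on 64-bit targets.
        printf("%d\n", (int)((size_t)&b[i] % 16));
    }
}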

If you want your code to work in 32-bit mode as well then you should align the memory. If you run the code below in 32-bit mode on Windows or Linux you will see that it's always 16 byte aligned as well.

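#include <stdio.h>
#include <stddef.h>  // size_t
#ifdef _MSC_VER // If Microsoft compiler
#define Alignd(X) __declspec(align(16)) X
#else // Gnu compiler, etc.
#define Alignd(X) X __attribute__((aligned(16)))
#endif

int main() {
    union a {
        float data[4];
        struct {
            double x;
            double y;
            float z;
            float pad[3];  // pads the struct to 32 bytes
        };                 // closes the anonymous struct
    };
    a Alignd(b[10]);       // request 16-byte alignment for the array
    for(int i=0; i<10; i++) {
        // Cast through size_t: a pointer does not fit in an int on 64-bit targets.
        printf("%d\n", (int)((size_t)&b[i] % 16));
    }
}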
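An alternative to printing addresses is to attach the alignment to the type and let the compiler verify the layout. This is a hedged sketch, not PCL's own definition: `PaddedPoint`, `PT_ALIGN16`, `PT_ALIGN_OF`, `PT_STATIC_ASSERT`, `is_aligned16` and `check_stack_array` are hypothetical names, and the negative-array-size typedef is a common pre-C++11 stand-in for `static_assert`.

```cpp
#include <cassert>  // assert
#include <cstddef>  // std::size_t

#ifdef _MSC_VER
#define PT_ALIGN16 __declspec(align(16))
#define PT_ALIGN_OF(T) __alignof(T)
#else
#define PT_ALIGN16 __attribute__((aligned(16)))
#define PT_ALIGN_OF(T) __alignof__(T)
#endif

// Padded point from the answer, with the alignment attached to the type
// so that every stack, static or array instance inherits it.
union PT_ALIGN16 PaddedPoint {
    float data[4];
    struct { double x; double y; float z; float pad[3]; };
};

// Pre-C++11 static assertion: the array size becomes -1 (a compile error)
// when the condition is false.
#define PT_STATIC_ASSERT(cond, name) typedef char name[(cond) ? 1 : -1]

PT_STATIC_ASSERT(sizeof(PaddedPoint) == 32, padded_point_size_is_32);
PT_STATIC_ASSERT(PT_ALIGN_OF(PaddedPoint) == 16, padded_point_align_is_16);

// True when p lies on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::size_t>(p) % 16 == 0;
}

// Runtime counterpart of the printf loop above, for a stack array.
inline void check_stack_array() {
    PaddedPoint pts[4];
    for (int i = 0; i < 4; ++i) {
        assert(is_aligned16(&pts[i]));
    }
}
```

Because the size (32) is a multiple of the alignment (16), every element of an array is aligned whenever the first one is. Memory obtained from plain `new` or `malloc` is a separate matter and is not covered by this check.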

edited Dec 17 '13 at 14:16

answered Dec 17 '13 at 13:10


Hi! This doesn't use the `union` as PCL does. Would having the `struct` inside the `union` work as well with this solution? I want to keep the structure as close as possible to the original. – Adri C.S. Dec 17 '13 at 13:40
@AdriC.S., it does not matter but in any case I updated my answer with the union. – Z boson Dec 17 '13 at 13:55
OK. In the end I was told on the `PCL` mailing list to use `EIGEN_ALIGN16` to specify the alignment and avoid conflicts. However, I'm still accepting this answer, as it is the best explained and the most similar to the `PCL` point type. Now, on the mailing list someone wrote: `there is an incoherency in the union : “float data[4]” does not cover all your XYZ double data – double data[4] may be better considering 16 bytes alignment`. What do you think? – Adri C.S. Dec 18 '13 at 09:40

EIGEN_ALIGN16 is probably using __attribute__((aligned(16))) anyway. As for the incoherency, it really depends on what you are trying to achieve. If you want to load the struct into an SSE register or AVX register then I would use all float for SSE and all double with AVX. I would not mix float and double like you have done in the union. But with AVX, alignment is not so important anyway. – Z boson Dec 18 '13 at 10:04
`I would not mix float and double like you have done in the union` Mm, I'll use all doubles. The idea is to allow SSE alignment, but I can't use floats. I need the double precision for geospatial reference. – Adri C.S. Dec 18 '13 at 10:51
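For reference, the all-double variant discussed in these comments could look roughly like the sketch below. It is an illustration only: `PointXYZd`, `DPOINT_ALIGN16` and `check_double_point` are hypothetical names, not PCL types, and whether PCL's float-based algorithms accept such a point is a separate question.

```cpp
#include <cassert>  // assert
#include <cstddef>  // std::size_t

#ifdef _MSC_VER
#define DPOINT_ALIGN16 __declspec(align(16))
#else
#define DPOINT_ALIGN16 __attribute__((aligned(16)))
#endif

// Hypothetical all-double point: data[4] now spans the same storage as
// the named fields, which resolves the float/double mismatch in the union.
union DPOINT_ALIGN16 PointXYZd {
    double data[4];
    struct { double x; double y; double z; };
};

// Checks that each named field shares its address with the matching
// array element, and that the total size is four doubles.
inline void check_double_point() {
    PointXYZd p;
    assert(static_cast<void*>(&p.x) == static_cast<void*>(&p.data[0]));
    assert(static_cast<void*>(&p.y) == static_cast<void*>(&p.data[1]));
    assert(static_cast<void*>(&p.z) == static_cast<void*>(&p.data[2]));
    assert(sizeof(PointXYZd) == 4 * sizeof(double));
}
```

With 8-byte doubles the union is 32 bytes, so an SSE register holds it as two 16-byte halves (x and y, then z and the unused fourth slot).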

eranb · Answer 2 · 2013-12-17T13:14:09.833

0

In order to have a struct which has 2 doubles and a float, and be SSE aligned (16 bytes), use :

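#pragma pack(push, 1)  // save the current packing, then pack to 1 byte
struct T
{
 double x,y;   // 16 bytes
 float z;      // 4 bytes
 char gap[12]; // 12 bytes
};
#pragma pack(pop)      // restore the previous packing for later declarations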

sizeof(T) will be 32, so if the first point is 16-bytes aligned, the whole vector will be aligned.
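Note that packing controls the size and member offsets, not where an object is placed. As a rough illustration (the names `PackedPoint`, `NaturalPoint`, `PACK_ALIGN_OF` and `check_packed_layout` are made up for this sketch, and `__alignof__`/`__alignof` are compiler extensions), the packed struct ends up with an alignment requirement of 1 even though its size is 32:

```cpp
#include <cassert>  // assert
#include <cstddef>  // std::size_t

#ifdef _MSC_VER
#define PACK_ALIGN_OF(T) __alignof(T)
#else
#define PACK_ALIGN_OF(T) __alignof__(T)
#endif

#pragma pack(push, 1)
struct PackedPoint {
    double x, y;   // 16 bytes
    float z;       // 4 bytes
    char gap[12];  // 12 bytes
};
#pragma pack(pop)

// Identical members without any packing pragma.
struct NaturalPoint {
    double x, y;
    float z;
    char gap[12];
};

inline void check_packed_layout() {
    // Both layouts happen to be 32 bytes, since the members already
    // fit together without internal padding.
    assert(sizeof(PackedPoint) == 32u);
    assert(sizeof(NaturalPoint) == 32u);
    // Packing lowers the required alignment to a single byte.
    assert(PACK_ALIGN_OF(PackedPoint) == 1u);
    assert(PACK_ALIGN_OF(NaturalPoint) > 1u);
}
```

So the pragma by itself cannot make the first point 16-byte aligned; that still has to come from an alignment attribute or an aligned allocation.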

In order to make the first point aligned you should use __attribute__((aligned(16))) for stack variables, or aligned_alloc for heap memory.
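Since `aligned_alloc` comes from C11, toolchains as old as the ones in the question may not provide it. A possible substitute, sketched under the assumption that `posix_memalign` is available on POSIX systems and `_aligned_malloc`/`_aligned_free` on MSVC and MinGW runtimes, is shown below; `alloc_aligned16`, `free_aligned16` and `check_heap_alignment` are hypothetical helper names.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <stdlib.h>  // posix_memalign, free
#ifdef _WIN32
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Returns a 16-byte-aligned block of at least `bytes` bytes, or 0 on failure.
inline void* alloc_aligned16(std::size_t bytes) {
#ifdef _WIN32
    return _aligned_malloc(bytes, 16);
#else
    void* p = 0;
    if (posix_memalign(&p, 16, bytes) != 0) {
        return 0;
    }
    return p;
#endif
}

// Releases a block obtained from alloc_aligned16.
inline void free_aligned16(void* p) {
#ifdef _WIN32
    _aligned_free(p);
#else
    free(p);
#endif
}

// Allocates room for `count` 32-byte points and checks the start address.
inline void check_heap_alignment(std::size_t count) {
    void* block = alloc_aligned16(count * 32);
    assert(block != 0);
    assert(reinterpret_cast<std::size_t>(block) % 16 == 0);
    free_aligned16(block);
}
```

Memory from `_aligned_malloc` must be released with `_aligned_free`, not `free`, which is why the two helpers are paired.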

But, most of the algorithms of PCL are written and hard-coded for floats and not doubles, so they won't work...

Refer : pcl-users link

edited Dec 17 '13 at 13:14

answered Dec 17 '13 at 12:38


This does not guarantee that the struct is 16-byte aligned. I tested it with MSVC in 32-bit mode. You need to use __declspec(align(16)) or __attribute__((aligned(16))) to make sure it's 16-byte aligned in 32-bit mode. In 64-bit mode there is nothing to worry about. – Z boson Dec 17 '13 at 13:22
Because of the `gap` array, we're not in [this situation](http://stackoverflow.com/questions/3318410/pragma-pack-effect), right? – Adri C.S. Dec 17 '13 at 14:31

I think it just needs to use `#pragma pack(16)` or `#pragma pack(32)` – Z boson Dec 17 '13 at 17:01
