In my program I have a function that does a simple vector addition c[0:15] = a[0:15] + b[0:15]. The function prototype is:

void vecadd(float * restrict a, float * restrict b, float * restrict c);

Our 32-bit embedded architecture can load and store double words with a single instruction, like:

r16 = 0x4000  ;
strd r0,[r16] ; stores r0 in [0x4000] and r1 in [0x4004]

The GCC optimizer recognizes the vector nature of the loop and generates two branches of the code - one for the case where the 3 arrays are double word aligned (so it uses the double load/store instructions) and the other for the case that the arrays are word-aligned (where it uses the single load/store option).

The problem is that the address alignment check is costly relative to the addition itself, and I want to eliminate it by hinting to the compiler that a, b and c are always 8-byte aligned. Is there a modifier I can add to the pointer declarations to tell the compiler this?

The arrays that are passed to this function have the aligned(8) attribute, but this is not reflected in the function code itself. Is it possible to add this attribute to the function parameters?
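For reference, a minimal sketch of how the caller-side assumption could be checked at run time. The array `a` and the helpers `is_aligned`, `element_stride` and `check_vecadd_operand` are made-up names for illustration; only the aligned(8) attribute comes from the setup described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operand, declared with the attribute the question mentions. */
float a[16] __attribute__((aligned(8)));

/* Returns 1 if p is a multiple of `align` bytes, 0 otherwise. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Byte distance between neighbouring elements of the array. */
size_t element_stride(void)
{
    return (size_t)((const char *)&a[1] - (const char *)&a[0]);
}

/* Debug-time check: the origin is 8-aligned and the floats stay contiguous. */
void check_vecadd_operand(void)
{
    assert(is_aligned(a, 8));
    assert(element_stride() == sizeof(float));
}
```

This only verifies what the caller provides; it does not by itself tell the compiler anything about the pointers inside vecadd.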

ysap
  • Even if my code below can't help you (because of it being C++), you might want to printf("%p") &array[0] and &array[1] in your code just to make sure that the align is being obeyed, and per element - not just on the array start address. – Joe Mar 07 '12 at 20:50
  • @Joe - it is actually required that it DOES NOT align per array element. It really has to be a contiguous array of floats, whose origin is 8-aligned. – ysap Mar 07 '12 at 21:48

6 Answers

13

If the attributes don't work, or aren't an option ....

I'm not sure, but try this:

void vecadd (float * restrict a, float * restrict b, float * restrict c)
{
   a = __builtin_assume_aligned (a, 8);
   b = __builtin_assume_aligned (b, 8);
   c = __builtin_assume_aligned (c, 8);

   for ....

That should tell GCC that the pointers are aligned. Whether it then does what you want depends on whether the compiler can use that information effectively; it might not be smart enough, since these optimizations aren't easy.
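A complete version of the idea might look like the sketch below. The loop bound `VEC_LEN` and the debug assertion are my additions, not part of the original function; the built-in itself is used as documented by GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_LEN 16  /* c[0:15] in the question; the macro name is made up */

void vecadd(float * restrict a, float * restrict b, float * restrict c)
{
    /* Debug-only sanity check; compiled out when NDEBUG is defined. */
    assert(((uintptr_t)a | (uintptr_t)b | (uintptr_t)c) % 8 == 0);

    /* Promise that each pointer is 8-byte aligned. If the promise is
       broken the result is undefined, so the caller must guarantee it. */
    float *pa = __builtin_assume_aligned(a, 8);
    float *pb = __builtin_assume_aligned(b, 8);
    float *pc = __builtin_assume_aligned(c, 8);

    for (size_t i = 0; i < VEC_LEN; i++)
        pc[i] = pa[i] + pb[i];
}
```

Note that the built-in returns the pointer, and it is the returned pointer that carries the alignment assumption, so the loop should use pa, pb and pc rather than the original parameters.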

Another option might be to wrap the float inside a union that must be 8-byte aligned:

typedef union {
  float f;
  long long dummy;
} aligned_float;

void vecadd (aligned_float * a, ......

I think that should enforce 8-byte alignment, but again, I don't know if the compiler is smart enough to use it.
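To see what this union does to the memory layout, here is a small sketch. The second type, `aligned_vec16`, and the helper function names are hypothetical; they only illustrate the difference between wrapping each float and wrapping the whole array.

```c
#include <assert.h>
#include <stddef.h>

/* The union from above: each element occupies a full long long slot. */
typedef union {
    float f;
    long long dummy;
} aligned_float;

/* Hypothetical variant: the whole array shares one union. */
typedef union {
    float f[16];
    long long dummy;
} aligned_vec16;

/* Byte distance between consecutive floats in an aligned_float array. */
size_t stride_per_element_union(void)
{
    aligned_float arr[2];
    return (size_t)((char *)&arr[1].f - (char *)&arr[0].f);
}

/* Byte distance between consecutive floats inside aligned_vec16. */
size_t stride_whole_array_union(void)
{
    aligned_vec16 v;
    return (size_t)((char *)&v.f[1] - (char *)&v.f[0]);
}

/* Per-element union pads every float; whole-array union keeps them packed. */
void check_layout(void)
{
    assert(stride_per_element_union() == sizeof(long long));
    assert(stride_whole_array_union() == sizeof(float));
}
```

So an array of aligned_float is not a contiguous array of floats, while a union around the whole array keeps the floats packed and only raises the alignment of the array's origin.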

ams
  • Doh! I've just noticed on the next page of the GCC manual `__builtin_assume_aligned`. I'll edit the answer. – ams Mar 08 '12 at 14:02
  • Thanks, @ams. This is probably the perfect solution. Unfortunately, on our compiler, although compiled well it does not affect the output and the compiler still makes the check whether the pointers are aligned or not and chooses the required code path. – ysap Mar 08 '12 at 17:09
  • If someone can confirm it works on other architectures, I'll accept this answer. – ysap Mar 08 '12 at 17:10
  • Your union proposal, BTW, is not what I'm looking for as it will make every element of the array 8-aligned, while I am dealing with array of floats. I can, however, pack two floats in a struct to work in the same manner. – ysap Mar 08 '12 at 17:13
  • ... OR, I can just cast an array of floats to align_float when calling the function. This might work, I'll give it a try. – ysap Mar 08 '12 at 17:15
  • Casting stuff around is probably a Bad Plan as you can get aliasing bugs. There's no reason you can't have `union {float f[100]; long long dummy}` though :) – ams Mar 08 '12 at 19:40
  • This doesn't appear to work on my Core i7-4765T for [this simple code](https://pastebin.mozilla.org/9019706): compiling with `gcc -std=c99 test.c -S -masm=intel -O3 -march=native`, I get `testGood` to use AVX vectorization, while `testBad` with `__builtin_assume_aligned` just uses x87 instructions. – Ruslan Apr 22 '17 at 18:50
  • Beware that this `aligned_float` union has size=8, and an array of it would have padding. You have to cast the pointer to `float*` before you can use it normally. `typedef __attribute__((aligned(8))) float aligned_float;` works with gcc (e.g. as a function arg), but clang doesn't infer alignment from that. https://godbolt.org/z/tCLkfp (Still auto-vectorizes with `movups`, not `movaps`, on x86 for example.) – Peter Cordes Jan 15 '19 at 03:22
  • No, if you're casting it then you're doing it wrong. You should always use the union dot notation to access it. An array is possible, and if you want every element 8-byte aligned the padding is natural. This type does not work transparently. – ams Jan 16 '19 at 08:02
10

Following a piece of example code I found on my system, I tried the following solution, which incorporates ideas from a few of the answers given earlier: basically, create a union of a small array of floats with a 64-bit type (in this case a SIMD vector of floats) and call the function with a cast of the operand float arrays:

typedef float f2 __attribute__((vector_size(8)));
typedef union { f2 v; float f[2]; } simdfu;

void vecadd(f2 * restrict a, f2 * restrict b, f2 * restrict c);

float a[16] __attribute__((aligned(8)));
float b[16] __attribute__((aligned(8)));
float c[16] __attribute__((aligned(8)));

int main()
{
    vecadd((f2 *) a, (f2 *) b, (f2 *) c);
    return 0;
}

Now the compiler does not generate the 4-aligned branch.
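The function body is not shown above. A sketch of what it could look like with the GCC vector extension follows; to sidestep the cast, this variation keeps both views of the data in one union. The type `vec16u`, the macro `PAIRS` and the function `vecadd_example` are hypothetical names, and I have not verified the generated code on the target architecture.

```c
#include <assert.h>
#include <stddef.h>

typedef float f2 __attribute__((vector_size(8)));   /* two floats, 8 bytes */

#define PAIRS 8   /* 16 floats viewed as 8 two-float vectors */

/* Hypothetical union: the same 64 bytes as vectors or as plain floats. */
typedef union {
    f2    v[PAIRS];
    float f[2 * PAIRS];
} vec16u;

void vecadd(f2 * restrict a, f2 * restrict b, f2 * restrict c)
{
    for (size_t i = 0; i < PAIRS; i++)
        c[i] = a[i] + b[i];   /* element-wise add on the vector type */
}

/* Fills two buffers, adds them, and returns element k of the result. */
float vecadd_example(size_t k)
{
    static vec16u x, y, z;

    assert(k < 2 * PAIRS);
    for (size_t i = 0; i < 2 * PAIRS; i++) {
        x.f[i] = (float)i;
        y.f[i] = 10.0f * (float)i;
    }
    vecadd(x.v, y.v, z.v);
    return z.f[k];
}
```

Reading the result through the union member relies on GCC's documented support for type punning through unions.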

However, __builtin_assume_aligned() would be the preferable solution, avoiding the cast and its possible side effects, if only it worked...

EDIT: I noticed that the builtin function is actually buggy on our implementation (i.e., not only does it not work, but it causes calculation errors later in the code).

ysap
6

How to tell GCC that a pointer argument is always double-word-aligned?

It looks like newer versions of GCC have __builtin_assume_aligned:

Built-in Function: void * __builtin_assume_aligned (const void *exp, size_t align, ...)

This function returns its first argument, and allows the compiler to assume that the returned pointer is at least align bytes aligned. This built-in can have either two or three arguments, if it has three, the third argument should have integer type, and if it is nonzero means misalignment offset. For example:

void *x = __builtin_assume_aligned (arg, 16);

means that the compiler can assume x, set to arg, is at least 16-byte aligned, while:

void *x = __builtin_assume_aligned (arg, 32, 8);

means that the compiler can assume for x, set to arg, that (char *) x - 8 is 32-byte aligned.
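As an illustration of the three-argument form, here is a sketch under the assumption that the data starts 8 bytes past a 32-byte boundary. The names `storage`, `sum_with_offset` and `offset_example` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backing store whose origin is 32-byte aligned. */
static float storage[32] __attribute__((aligned(32)));

/* Sums n floats that start 8 bytes past a 32-byte boundary. */
float sum_with_offset(const float *p, size_t n)
{
    /* Debug check of the promise: (char *)p - 8 must be 32-byte aligned. */
    assert(((uintptr_t)p - 8) % 32 == 0);

    const float *q = __builtin_assume_aligned(p, 32, 8);
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += q[i];
    return s;
}

/* storage + 2 is two floats (8 bytes) past the aligned origin. */
float offset_example(void)
{
    for (size_t i = 0; i < 32; i++)
        storage[i] = 1.0f;
    return sum_with_offset(storage + 2, 8);
}
```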

Based on some other questions and answers on Stack Overflow circa 2010, it appears the built-in was not available in GCC 3 and early GCC 4. But I do not know where the cut-off point is.

jww
  • Thanks. It was mentioned in a couple of the answers here, and was available in GCC at the time of asking the question. – ysap Mar 27 '17 at 10:00
1

Alignment specifications usually only work for alignments that are smaller than the base type of a pointer, not larger.

I think easiest is to declare your whole array with an alignment specification, something like

typedef float myvector[16];
typedef myvector alignedVector __attribute__((aligned (8)));

(The syntax might not be correct; I always have difficulty knowing where to put these __attribute__s.)

And use that type throughout your code. For your function definition I'd try

void vecadd(alignedVector * restrict a, alignedVector * restrict b, alignedVector * restrict c);

This gives you an additional indirection but this is only syntax. Something like *a is just a noop and only reinterprets the pointer as a pointer to the first element.

Jens Gustedt
  • Thanks. Why not put the attribute on the 1st typedef? – ysap Mar 07 '12 at 21:40
  • @ysap, I simply don't know where to put the attribute for an array type. The syntax is crude. – Jens Gustedt Mar 07 '12 at 21:47
  • Following Joe's comment I think the answer to my question above is that attributing the 1st typedef will make the array **elements** 8-aligned, and not the array itself (but obviously, it *will* be aligned due to the alignment of the 1st element). Does it make sense? – ysap Mar 07 '12 at 21:51
1

GCC versions have been dodgy about the aligned attribute on simple typedefs and arrays. Typically, to do what you want, you would have to wrap the float in a struct and put the alignment restriction on the contained float.

With operator overloading you can almost make this painless, but it does assume you can use C++ syntax.

#include <stdio.h>
#include <string.h>

#define restrict __restrict__

typedef float oldfloat8 __attribute__ ((aligned(8)));

struct float8
{
    float f __attribute__ ((aligned(8)));

    float8 &operator=(float _f) { f = _f; return *this; }
    float8 &operator=(double _f) { f = _f; return *this; }
    float8 &operator=(int _f) { f = _f; return *this; }

    operator float() { return f; }
};

int MyFunc(float8 * restrict a, float8 * restrict b, float8 * restrict c);

int MyFunc(float8 * restrict a, float8 * restrict b, float8 * restrict c)
{
    return *c = *a* *b;
}

int main(int argc, char **argv)
{
    float8 a, b, c;

    float8 p[4];

    printf("sizeof(oldfloat8) == %d\n", (int)sizeof(oldfloat8));
    printf("sizeof(float8) == %d\n", (int)sizeof(float8));

    printf("addr p[0] == %p\n", (void *)&p[0] );
    printf("addr p[1] == %p\n", (void *)&p[1] );

    a = 2.0;
    b = 7.0;
    MyFunc( &a, &b, &c );
    return 0;
}
Joe
  • Thanks, @Joe. 1st, I am restricted to C. 2nd, the possible problem I see here (as in other suggestions) is that it seems like when declaring a vector of float8 elements, each element will be 8-aligned. This will create a non-contiguous array of float-space-float-space, etc. I assume the printf() of p[0] and p[1] will reveal that fact. – ysap Mar 08 '12 at 20:25
-2

I have never used it, but there is __attribute__((aligned (8)))

If I read the documentation right, then it is used this way:

void vecadd(float * restrict a __attribute__((aligned (8))), 
            float * restrict b __attribute__((aligned (8))), 
            float * restrict c __attribute__((aligned (8))));

see http://ohse.de/uwe/articles/gcc-attributes.html#type-aligned

Jörg Beyer
  • Unless I missed it, then GCC documentation and the page you linked to mentioned align attribute for variables and for functions but **not** for function prototype parameters. Can you point to the relevant section in the page you linked to? – ysap Mar 07 '12 at 20:26
  • I don't think that this will be working. This tells the compiler that the pointer variable itself is 8 byte aligned. – Jens Gustedt Mar 07 '12 at 20:29
  • I can confirm this doesn't compile. `error: alignment may not be specified for` – Andrew Wagner Dec 06 '16 at 15:00
  • I think a typedef of the arguments will clear the compile error. – jww Mar 27 '17 at 07:35