
Does GCC have a memory alignment pragma, akin to #pragma vector aligned in the Intel compiler? I would like to tell the compiler to optimize a particular loop using aligned load/store instructions. To avoid possible confusion, this is not about struct packing.

For example:

#if defined (__INTEL_COMPILER)
#pragma vector aligned
#endif
        for (int a = 0; a < int(N); ++a) {
            q10 += Ix(a,0,0)*Iy(a,1,1)*Iz(a,0,0);
            q11 += Ix(a,0,0)*Iy(a,0,1)*Iz(a,1,0);
            q12 += Ix(a,0,0)*Iy(a,0,0)*Iz(a,0,1);
            q13 += Ix(a,1,0)*Iy(a,0,0)*Iz(a,0,1);
            q14 += Ix(a,0,0)*Iy(a,1,0)*Iz(a,0,1);
            q15 += Ix(a,0,0)*Iy(a,0,0)*Iz(a,1,1);
        }

Thanks

Anycorn

3 Answers


You can tell GCC that a pointer points to aligned memory by using a typedef to create an over-aligned type that you can declare pointers to.

This helps GCC but not clang 7.0 or ICC 19; see the x86-64 non-AVX asm they emit on Godbolt. (Only GCC folds a load into a memory operand for mulps, instead of using a separate movups.) You have to use __builtin_assume_aligned if you want to portably convey an alignment promise to GNU C compilers other than GCC itself.


From http://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html

typedef double aligned_double __attribute__((aligned (16)));
// Note: sizeof(aligned_double) is 8, not 16
void some_function(aligned_double *x, aligned_double *y, int n)
{
    for (int i = 0; i < n; ++i) {
        // math!
    }
}

This won't make aligned_double 16 bytes wide. This will just make it aligned to a 16-byte boundary, or rather the first one in an array will be. Looking at the disassembly on my computer, as soon as I use the alignment directive, I start to see a LOT of vector ops. I am using a Power architecture computer at the moment so it's altivec code, but I think this does what you want.

(Note: I wasn't using double when I tested this, because AltiVec doesn't support double-precision floats.)
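To make the typedef approach concrete for heap memory, here is a minimal sketch. The helper names alloc_aligned_doubles and dot are made up for illustration, and the allocation assumes a C11 library that provides aligned_alloc. It only shows how the promise is expressed; whether a given GCC version actually emits aligned loads for it is something to check in the generated asm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Over-aligned element type: still 8 bytes wide, but a pointer to it
   promises that the pointed-to address is 16-byte aligned. */
typedef double aligned_double __attribute__((aligned(16)));

/* The attribute changes the alignment, not the size. */
_Static_assert(sizeof(aligned_double) == sizeof(double),
               "aligned_double must stay 8 bytes wide");

/* Hypothetical helper: allocate room for n doubles on a 16-byte boundary.
   Assumes C11 aligned_alloc, which wants a size that is a multiple of
   the alignment, so the byte count is rounded up. */
static aligned_double *alloc_aligned_doubles(size_t n)
{
    size_t bytes = (n * sizeof(double) + 15) / 16 * 16;
    void *p = aligned_alloc(16, bytes);
    assert(p == NULL || (uintptr_t)p % 16 == 0);
    return p;
}

/* Hypothetical kernel: the parameter types carry the alignment promise. */
static double dot(const aligned_double *x, const aligned_double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

Consecutive elements are still 8 bytes apart, so only the base pointer really sits on the 16-byte boundary. Note that a floating-point reduction like this one is normally only vectorized when reassociation is allowed (for example with -ffast-math).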

You can see some other examples of autovectorization using the type attributes here: http://gcc.gnu.org/projects/tree-ssa/vectorization.html

Peter Cordes
Dietrich Epp
  • Neither. I have an array whose alignment the compiler cannot determine, so I have to tell it specifically to use aligned loads and stores. It cannot be a compiler option; it must be a pragma, applied to each individual loop to be vectorized. – Anycorn Apr 21 '10 at 23:46
  • Why can't you use a variable attribute on the array? – Dietrich Epp Apr 21 '10 at 23:47
  • The array is malloced, and its structure is pretty complicated. Specifically, it is a four-dimensional tensor. – Anycorn Apr 21 '10 at 23:48
  • You can put the alignment on the type then, instead of the variable. – Dietrich Epp Apr 21 '10 at 23:52
  • The type is double*. If I put the alignment on that, all I get is an aligned pointer variable. The array is aligned manually; there is no way around that. The Intel pragma specifically tells the compiler to use loadpd instructions. I need the gcc equivalent. – Anycorn Apr 21 '10 at 23:55
  • Yes, put the alignment on the `double` not on the `double*`. Use a typedef to make an aligned_double or equivalent. – Dietrich Epp Apr 21 '10 at 23:56
  • That defeats the purpose. The performance benefit of aligned instructions comes from loading two double variables at a time, not from loading one double that is 128 bits wide. – Anycorn Apr 21 '10 at 23:59
  • http://www.intel.com/software/products/compilers/docs/clin/main_cls/cref_cls/common/cppref_pragma_vector.htm – Anycorn Apr 22 '10 at 00:01
  • It won't make it 16 bytes wide, just aligned to a 16 byte boundary. – Dietrich Epp Apr 22 '10 at 00:11
  • If you do that, with the alignment on the type, you must guarantee that each element of that type starts at a 16-byte boundary. If you create an array of such types, the compiler must assume that the distance between two consecutive elements is 16 bytes. – Anycorn Apr 22 '10 at 00:17
  • Yes, but that's demonstrably not how the `aligned` attribute works. – Dietrich Epp Apr 22 '10 at 00:18
  • I am not following; can you give me an example? I have a raw pointer that is aligned to 16 bytes. How do I inform gcc that it really is 16-byte aligned? – Anycorn Apr 22 '10 at 00:22
  • Thank you. That, together with fast-math, makes the compiler report vectorized loops. Unfortunately, performance is below the Intel compiler's. I will probably play with the parameters some more. – Anycorn Apr 22 '10 at 02:19

I tried your solution with g++ version 4.5.2 (both Ubuntu and Windows) and it did not vectorize the loop.

If the alignment attribute is removed then it vectorizes the loop, using unaligned loads.

If the function is inlined so that the array can be accessed directly with the pointer eliminated, then it is vectorized with aligned loads.

In both cases, the alignment attribute prevents vectorization. This is ironic: The "aligned_double *x" was supposed to enable vectorization but it does the opposite.
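For comparison, this is roughly the shape of the "array accessed directly" case, as a minimal sketch. The names N, a, b and dot_direct are made up for illustration, and the alignment is put on the arrays with a variable attribute, which is an assumption about the setup rather than the exact code I tested.

```c
#include <stddef.h>

#define N 1024

/* The alignment is a property of the objects themselves, so the compiler
   can see it without any annotation on a pointer type. */
static double a[N] __attribute__((aligned(16)));
static double b[N] __attribute__((aligned(16)));

/* Loop over the arrays directly, with no pointer parameters in between. */
static double dot_direct(void)
{
    double sum = 0.0;
    for (size_t i = 0; i < N; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Compiling with something like gcc -O3 -ffast-math -fopt-info-vec (GCC 4.9 or later; older releases used -ftree-vectorizer-verbose) should report whether the loop was vectorized. Which load instructions a particular version picks still has to be checked in the asm.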

Which compiler was it that reported vectorized loops for you? I suspect it was not a gcc compiler?

A Fog

Does gcc have memory alignment pragma, akin #pragma vector aligned

It looks like newer versions of GCC have __builtin_assume_aligned:

Built-in Function: void * __builtin_assume_aligned (const void *exp, size_t align, ...)

This function returns its first argument, and allows the compiler to assume that the returned pointer is at least align bytes aligned. This built-in can have either two or three arguments, if it has three, the third argument should have integer type, and if it is nonzero means misalignment offset. For example:

void *x = __builtin_assume_aligned (arg, 16);

means that the compiler can assume x, set to arg, is at least 16-byte aligned, while:

void *x = __builtin_assume_aligned (arg, 32, 8);

means that the compiler can assume for x, set to arg, that (char *) x - 8 is 32-byte aligned.
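Applied to a loop, the two forms might look like the following minimal sketch. The function names axpy_aligned16 and sum_offset8 are hypothetical, and the asserts are only there to check the caller's promise in debug builds; the built-in itself performs no check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: y[i] += a * x[i]. The caller promises that both
   pointers are 16-byte aligned. */
static void axpy_aligned16(double a, const double *x_in, double *y_in, size_t n)
{
    assert((uintptr_t)x_in % 16 == 0 && (uintptr_t)y_in % 16 == 0);
    /* Use the returned pointers: the promise is attached to the result. */
    const double *x = __builtin_assume_aligned(x_in, 16);
    double *y = __builtin_assume_aligned(y_in, 16);
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Three-argument form: p_in sits 8 bytes past a 32-byte boundary,
   i.e. (char *)p_in - 8 is 32-byte aligned. */
static double sum_offset8(const double *p_in, size_t n)
{
    assert(((uintptr_t)p_in - 8) % 32 == 0);
    const double *p = __builtin_assume_aligned(p_in, 32, 8);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += p[i];
    return sum;
}
```

A typical caller for the second function would pass the address of element 1 of a 32-byte aligned array of doubles. If the promise is false, the behavior is undefined, so this is only safe when the allocation really guarantees the alignment.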

Based on some other questions and answers on Stack Overflow circa 2010, it appears the built-in was not available in GCC 3 and early GCC 4. But I do not know where the cut-off point is.

jww