I'm a SIMD beginner and I've read this article about the topic (I'm using an AVX2-compatible machine).

I've also read in this question that you should check whether your pointer is aligned.

I'm testing it with this toy example main.cpp:

#include <cstdint>   // uintptr_t
#include <iostream>
#include <immintrin.h>

#define is_aligned(POINTER, BYTE_COUNT) \
    (((uintptr_t)(const void *)(POINTER)) % (BYTE_COUNT) == 0)


int main()
{
  float a[8];
  for(int i=0; i<8; i++){
    a[i]=i;
  }
  __m256 evens = _mm256_set_ps(2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0);
  std::cout<<is_aligned(a, 16)<<" "<<is_aligned(&evens, 16)<<std::endl;   
  std::cout<<is_aligned(a, 32)<<" "<<is_aligned(&evens, 32)<<std::endl;   

}

And compile it with icpc -std=c++11 -o main main.cpp.

The resulting printing is:

1 1
1 1

However, if I add these 3 lines before the 4 prints:

for(int i=0; i<8; i++)
  std::cout<<a[i]<<" ";
std::cout<<std::endl;

This is the result:

0 1 2 3 4 5 6 7 
1 1
0 1

In particular, I don't understand that last 0. Why is it different from the previous output? What am I missing?


1 Answer

Your is_aligned (which is a macro, not a function) determines whether the object happens to be aligned to a particular boundary. It does not determine the alignment requirement of the object's type.

For a float array, the compiler only guarantees alignment to at least the alignment requirement of float, which is typically 4. An address that is a multiple of 4 is not necessarily a multiple of 32, so there is no guarantee that the array is aligned to a 32-byte boundary. However, many addresses are divisible by both 4 and 32, so an array at a 4-byte boundary can happen to sit at a 32-byte boundary as well. This is what happened in your first test, but as explained, nothing guarantees it. In your second test the added code changed how the compiler laid out the stack, and the array ended up at a different address. That address happened not to be on a 32-byte boundary.

To request a stricter alignment that may be required by SIMD instructions, you can use the alignas specifier:

alignas(32) float a[8];
eerorika
  • Thanks for your answer. So, just to be sure that I understood correctly: let's imagine that we can represent memory as contiguous blocks of 4 bytes each (the space taken by a `float` variable). The compiler guarantees that the array is aligned to these 4-byte blocks, so the array starts at the beginning of a 4-byte block. But it's not guaranteed that the array starts at the beginning of a 32-byte chunk (8 of these 4-byte blocks), although it could happen by chance. Is that correct? – cplusplusuberalles Apr 26 '17 at 10:59
  • One additional question: let's suppose that I have a function which has a `float*` as input argument. We don't know if it's aligned or not. How do I make it aligned? PS: Let me know if it's more appropriate that I open a new question.. – cplusplusuberalles Apr 26 '17 at 11:01
  • @cplusplusuberalles 1. Correct. 2. You can of course make the pointer point to another memory address that is aligned, but what would you do with such a pointer? Consider an analogous question: "I have a function that has an int input argument. We don't know if it's divisible by 2. How do I make it divisible by 2?". You could add 1 if it's odd, but just like in the pointer case: how is that useful? – eerorika Apr 26 '17 at 11:11
  • Thanks for your comment. I'm going to open another question; I'm afraid I'm stuck in an XY problem – cplusplusuberalles Apr 26 '17 at 12:19
  • I've posted a question called "Do I have to align data to vectorize this function?" (I cannot link it since I'm on my mobile) – cplusplusuberalles Apr 26 '17 at 13:38
  • You can usefully "forcefully make a pointer aligned" though. Of course that might move it, but that just means you have to deal with a bunch of unaligned data before the aligned part begins – harold Apr 26 '17 at 14:20
  • @harold that should not be a problem: the data is first processed inside the function, so I can allocate it aligned directly. So the best approach would be to allocate the data inside the function in an aligned way (using `alignas` as suggested in the answer), process it, and then assign the pointer to the aligned data (as suggested in the comments). Is that correct? – cplusplusuberalles Apr 30 '17 at 08:10