3

When I run this code in visual studio 2015, the code works correctly.But the code generates the following error in codeblocks : Segmentation fault(core dumped). I also ran the code in ubuntu with same error.

#include <iostream>
#include <immintrin.h>

struct INFO
{
    unsigned int id = 0;
    __m256i temp[8];
};

int main()
{
    std::cout<<"Start AVX..."<<std::endl;
    int _size = 100;
    INFO  *info = new INFO[_size];
    for (int i = 0; i<_size; i++)
    {
        for (int k = 0; k < 8; k++)
        {
            info[i].temp[k] = _mm256_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
                20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31);

        }
    }
    std::cout<<"End AVX."<<std::endl;
    return 0;
}
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • 1
    Can you show the output of: `std::cout << sizeof(INFO) << "\n" << info << "\n" << (((char*) info[0].temp) - ((char*) info)) << "\n";` (for the succeeding and failing version)? – chtz Apr 08 '19 at 07:38
  • CodeBlocks: 288, 0x10261b0, 32 VS2015: 288, 0000014B0EDB4080, 32 – Mohsen Ghahremani Manesh Apr 08 '19 at 08:02
  • 1
    The address under CodeBlocks is not 32byte aligned. I don't think C++11 guarantees that `operator new` respects alignment. You can overload that operator or compile with a newer standard (not sure which one is required to ensure alignment). – chtz Apr 08 '19 at 08:41
  • 1
    Out of curiosity (as I am completely inexperienced with memory alignment), is there a reason to use an array allocated with `new` instead of a `vector`? – Michiel uit het Broek Apr 08 '19 at 09:59
  • 2
    @Michiel I agree, in 99.9% of all cases you should prefer `std::vector` over manual `new`/`delete` (i.e., unless you know why the later is better in your case). Still, prior to C++17 you'd have similar alignment problems here. – chtz Apr 08 '19 at 12:30
  • Btw, I'm not sure if with your example MSVC works correctly by design, or by chance (e.g., it might just conservatively align everything to 32bytes instead of 16). – chtz Apr 08 '19 at 12:32
  • 1
    @chtz: it probably works with MSVC because MSVC never uses alignment-required loads/stores. It always uses `[v]movdqu` even if there's a compile-time alignment guarantee. Only for non-AVX when folding a load into a memory source operand (like `paddb xmm0, [rdi]`) will you get a requirement of even 16-byte loads. This is a missed-optimization for very old CPUs (older than Nehalem, e.g. Core 2), where `movdqu` is more expensive than `movdqa` even if the address is aligned at run-time. But gcc/clang do make asm that requires as much alignment as they can. – Peter Cordes Apr 08 '19 at 18:44

2 Answers2

5

The problem is that prior to C++17 new and delete did not respect the alignment of the to-be-allocated type. If you look at the generated assembly from this simple function:

INFO* new_test() {
    int _size = 100;
    INFO  *info = new INFO[_size];
    return info;
}

You'll see that when compiled with anything prior to C++17 operator new[](unsigned long) is called, whereas for C++17 a call is made to operator new[](unsigned long, std::align_val_t) (and 32 is passed for the second parameter). Play around with it at godbolt.

If you can't use C++17, you can overwrite operator new[] (and operator delete[] -- and you should overwrite operator new and operator delete as well ...):

struct INFO {
    unsigned int id = 0;
    __m256i temp[8];
    void* operator new[](size_t size) {
        // part of C11:
        return aligned_alloc(alignof(INFO), size);
    }
    void operator delete[](void* addr) {
        free(addr); // aligned_alloc is compatible with free
    }
};

This is part of the previous godbolt example, if you compile with -DOVERWRITE_OPERATOR_NEW.

Note that this does not solve the alignment issue when using std::vector (or any other std-container), for that you need to pass an aligned allocator to the container (not part of the previous example).

chtz
  • 17,329
  • 4
  • 26
  • 56
  • In Windows aligned_alloc and free , has been replaced with _aligned_malloc and _aligned_free which can be found in malloc.h. for further information take a look at https://stackoverflow.com/questions/32133203/what-can-i-use-instead-of-stdaligned-alloc-in-ms-visual-studio-2013 – Mohsen Ghahremani Manesh Apr 09 '19 at 10:22
0

I found two ways to solve this problem

The first solution How to solve the 32-byte-alignment issue for AVX load/store operations?

struct INFO
{
    __m256i temp[8];
    unsigned int id = 0;
};
INFO  *info = static_cast<INFO*>(_mm_malloc(sizeof(INFO)*_size, 32));
_mm_free(info);

The second solution

INFO  *info = new INFO[_size];
for (int i = 0; i < _size; i++)
{
    INFO new_info;
    for (int k = 0; k < 8; k++)
    {
        new_info.temp[k] = _mm256_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
            20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31);

    }
    info[i] = new_info;
}
  • 1
    `_mm_malloc` is *not* a drop-in replacement for `new`. It's not compatible with `delete` (or `free`), only with `_mm_free`. – Peter Cordes Apr 09 '19 at 03:07
  • 1
    In your second version, that maybe happened to work but it's not safe! If `new INFO[_size]` produces an under-aligned pointer, the struct-assignment could fault from using 32-byte-alignment-required `vmovdqa` to copy the `__m256i` members. (If it's big enough that the compiler decides to just call memcpy for it, then that will avoid faults, but later use of `info[i].temp[k]` could still fault if you have an under-aligned `__m256i`. Unless you only ever use it with `_mm256_loadu_si256` / storeu) – Peter Cordes Apr 09 '19 at 03:10