
I have written and debugged some AVX code with g++ and now I'm trying to get it to work with MSVC, but I keep getting this linker error:

error LNK2019: unresolved external symbol __mm256_setr_epi64x referenced in function "private: union __m256i __thiscall avx_matrix::avx_bit_mask(unsigned int)const " (?avx_bit_mask@avx_matrix@@ABE?AT__m256i@@I@Z)

The referenced piece of code is

...

#include <immintrin.h>

...

    /* All zeros except for pos-th position (0..255) */
    __m256i avx_matrix::avx_bit_mask(const std::size_t pos) const
    {
        int64_t a = (pos >= 0 && pos < 64) ? 1LL << (pos - 0) : 0;
        int64_t b = (pos >= 64 && pos < 128) ? 1LL << (pos - 64) : 0;
        int64_t c = (pos >= 128 && pos < 192) ? 1LL << (pos - 128) : 0;
        int64_t d = (pos >= 192 && pos < 256) ? 1LL << (pos - 192) : 0;
        return _mm256_setr_epi64x(a, b, c, d);
    }
...
  • I have enabled /arch:AVX, but it doesn't make any difference.
  • My machine definitely supports AVX - it is the same one I used for the original Linux project.
  • Also, http://msdn.microsoft.com/en-us/library/hh977022.aspx lists _mm256_setr_epi64x among the available intrinsics.

Any help would be much appreciated.

jww
PJK

2 Answers

It looks like this might actually be a known bug: certain AVX intrinsics are apparently not available in 32-bit mode. Try building for 64-bit and/or upgrading to Visual Studio 2013 Update 2, where this has supposedly now been fixed.

Alternatively, if you just have the one instance above where you are using this intrinsic, then you could change your function to:

__m256i avx_matrix::avx_bit_mask(const std::size_t pos) const
{
    int64_t a[4] = { (pos >=   0 && pos <  64) ? 1LL << (pos -   0) : 0,
                     (pos >=  64 && pos < 128) ? 1LL << (pos -  64) : 0,
                     (pos >= 128 && pos < 192) ? 1LL << (pos - 128) : 0,
                     (pos >= 192 && pos < 256) ? 1LL << (pos - 192) : 0 };
    return _mm256_loadu_si256((__m256i *)a);
}

or perhaps even:

__m256i avx_matrix::avx_bit_mask(const std::size_t pos) const
{
    int64_t a[4] = { 0 };
    a[pos >> 6] = 1LL << (pos & 63ULL);
    return _mm256_loadu_si256((__m256i *)a);
}

which might be a little more efficient.
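
To check that the two formulations produce the same mask, it is enough to compare the four 64-bit lanes before they are loaded into the register, which needs no AVX support at all. The following is only a sketch under stated assumptions: `lanes_by_range` and `lanes_by_index` are hypothetical helper names that do not appear in the original code, `pos` is assumed to be in 0..255, and the last lane is assumed to use an offset of 192.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Lanes = std::array<std::int64_t, 4>;

// Range-check formulation: one comparison pair per 64-bit lane.
Lanes lanes_by_range(std::size_t pos)
{
    Lanes a = {{ 0, 0, 0, 0 }};
    for (std::size_t i = 0; i < 4; ++i) {
        const std::size_t lo = 64 * i;   // first bit position of lane i
        if (pos >= lo && pos < lo + 64)
            a[i] = static_cast<std::int64_t>(std::uint64_t{1} << (pos - lo));
    }
    return a;
}

// Index formulation: pos >> 6 selects the lane, pos & 63 the bit within it.
Lanes lanes_by_index(std::size_t pos)
{
    assert(pos < 256);                   // otherwise a[pos >> 6] is out of bounds
    Lanes a = {{ 0, 0, 0, 0 }};
    a[pos >> 6] = static_cast<std::int64_t>(std::uint64_t{1} << (pos & 63));
    return a;
}

// Returns true if both formulations agree for every valid position.
bool formulations_agree()
{
    for (std::size_t pos = 0; pos < 256; ++pos)
        if (lanes_by_range(pos) != lanes_by_index(pos))
            return false;
    return true;
}
```

Unlike the range-check version, which yields all zeros for an out-of-range `pos`, the index version writes outside the array, so it relies on the caller keeping `pos` below 256.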

Paul R
  • Your link says "This Connection is Untrusted". – Z boson Dec 03 '14 at 08:53
  • Heh - looks like Microsoft forgot to renew their certificate. – Paul R Dec 03 '14 at 09:03
  • Well +1 for finding that this has supposedly been fixed. However, it's 2014. Who is really using 32-bit mode anymore? OS X is only 64-bit now. Ubuntu is phasing out 32-bit in one year. MSFT should have fixed this 8 years ago. – Z boson Dec 03 '14 at 09:06
  • People are often stuck with 32 bit on Windows for various reasons: third party 32-bit only libraries, use of inline assembler, legacy code that is not 64-bit clean, etc. But this is just one of many reasons why I avoid the whole toxic Microsoft ecosystem as far as possible - I'll stick with OS X and Linux, thanks. – Paul R Dec 03 '14 at 09:12
  • @Zboson: 32-bit code was [alive and kicking in **Linux** as of 2013](http://lwn.net/Articles/548838/)... – user541686 Dec 03 '14 at 09:14
  • I gave up on MSVC several months ago. It's not so great for optimization and likes to do too many things differently e.g. it only defines `__AVX__` and `__AVX2__`. It only supports OpenMP from 2003. When I converted my fractal code to use FMA with MSVC it was much slower than without FMA. With GCC it was faster. – Z boson Dec 03 '14 at 09:17
  • Your solution using `_mm256_loadu_si256` is clearly better than mine. I did not think about this carefully. Of course I would never use my solution in a main loop. I concentrated on the intrinsic. This question would have been better if it asked how to set a single bit given an index efficiently. – Z boson Dec 03 '14 at 10:20
  • Well I wasn't sure whether the OP needed a general solution for multiple instances of `_mm256_setr_epi64x` or whether this was just a one-off, but yes, hopefully none of this stuff is going on inside performance-critical loops, so any solution that works should be fine. – Paul R Dec 03 '14 at 10:26

In 32-bit mode MSVC does not support

  • _mm_set_epi64x
  • _mm_setr_epi64x
  • _mm_set1_epi64x
  • _mm256_set_epi64x
  • _mm256_setr_epi64x
  • _mm256_set1_epi64x

In your case in 32-bit mode you can do this:

    union {
        int64_t q[4];
        int32_t r[8];
    } u;
    u.q[0] = a; u.q[1] = b; u.q[2] = c; u.q[3] = d;
    return _mm256_setr_epi32(u.r[0], u.r[1], u.r[2], u.r[3], u.r[4], u.r[5], u.r[6], u.r[7]);

Since Visual Studio 2015 (_MSC_VER 1900) these intrinsics are supported in 32-bit mode.
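
One way to sidestep the question of union type punning is to copy the bytes with `memcpy` and hand the eight 32-bit values to `_mm256_setr_epi32` as before. The sketch below is an illustration rather than a tested drop-in: `split_lanes` and `NEEDS_EPI64X_FALLBACK` are hypothetical names, the version guard simply encodes the `_MSC_VER` 1900 threshold mentioned above as an assumption, and the expected element order assumes a little-endian target such as x86.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption (based on the version noted above): only MSVC older than
// Visual Studio 2015, targeting 32-bit x86, needs the fallback.
#if defined(_MSC_VER) && (_MSC_VER < 1900) && defined(_M_IX86)
#define NEEDS_EPI64X_FALLBACK 1   // hypothetical macro name
#else
#define NEEDS_EPI64X_FALLBACK 0
#endif

// Hypothetical helper: view four 64-bit lanes as eight 32-bit lanes by
// copying bytes instead of reading an inactive union member.
std::array<std::int32_t, 8> split_lanes(const std::array<std::int64_t, 4>& q)
{
    std::array<std::int32_t, 8> r;
    std::memcpy(r.data(), q.data(), 8 * sizeof(std::int32_t));
    return r;
}

// Small self-check; the expected layout assumes a little-endian target.
void split_lanes_self_check()
{
    const std::array<std::int64_t, 4> q = {{ 1, std::int64_t(1) << 32, 0, -1 }};
    const std::array<std::int32_t, 8> r = split_lanes(q);
    assert(r[0] == 1 && r[1] == 0);    // low half comes first
    assert(r[2] == 0 && r[3] == 1);    // bit 32 lands in the high half
    assert(r[4] == 0 && r[5] == 0);
    assert(r[6] == -1 && r[7] == -1);  // all bits set in both halves
}
```

The resulting `r[0]` through `r[7]` can then be passed to `_mm256_setr_epi32` in that order, exactly as with the union version. Compilers typically turn a small fixed-size `memcpy` like this into plain loads and stores, so it need not be slower than the union.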

Z boson
  • I believe the way you are using the union is undefined behavior in C++. – jww Feb 12 '19 at 02:25
  • @jww, can you provide a link (e.g. from SO) explaining why you think it's UB? – Z boson Feb 12 '19 at 08:01
  • [Accessing inactive union member and undefined behavior?](https://stackoverflow.com/q/11373203/608639) – jww Feb 12 '19 at 14:09
  • @jww, I read through some of the answers in that link and I'm confused. Feel free to augment my answer, if you like, with a solution that you think is not UB (please don't delete anything; only add new information after what I have now). – Z boson Feb 13 '19 at 08:10
  • @jww, I guess using `memcpy` is a safer (but even more inefficient) solution? – Z boson Feb 13 '19 at 08:21
  • @jww: Microsoft defines the behaviour of code like this in MSVC; they even define `__m256i` in terms of a similar union. There's also a GNU extension (gcc/clang/icc) that defines the behaviour of union type punning in GNU C++ and C89. (It's already well-defined in ISO C99/C11) – Peter Cordes Feb 13 '19 at 08:42
  • I'm pretty sure all the major compilers explicitly support C-style union type-punning for C++ even if it's technically undefined in C++. – Mysticial Feb 14 '19 at 19:51