
I have managed to convert most of my SIMD code to use the vector extensions of GCC. However, I have not found a good solution for doing a broadcast like this:

__m256 areg0 = _mm256_broadcast_ss(&a[i]);

I would like to write:

__m256 areg0 = a[i];

As shown in my answer to "Multiplying vector by constant using SSE", I managed to get broadcasts working when the other operand is a SIMD register. The following works:

__m256 x,y;
y = x + 3.14159f; // 3.14159 is broadcast, then added to each element of x
y = 3.14159f*x;   // 3.14159 is broadcast, then multiplied with each element of x

but this won't work:

__m256 x;
x = 3.14159f;  // should broadcast 3.14159, but does not work

How can I do this with GCC?

Z boson
  • This appears to work fine in Clang using the OpenCL vector extensions `typedef float float4 __attribute__((ext_vector_type(8)));`. However, Clang does not allow broadcasts against a register when using the GCC vector extensions, so I'm not sure it is entirely compatible with GCC. – Z boson Feb 12 '14 at 12:53
  • `__m256 zero={}; __m256 x=zero+3.14159f;` – Marc Glisse Feb 12 '14 at 12:56
  • http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55726 – Marc Glisse Feb 12 '14 at 12:59
  • @MarcGlisse, I tested your solution in GCC Explorer and I can confirm that it gets converted to vbroadcastss. If you want to write up an answer, I'll accept it. I understand the ambiguity of double vs. float. I guess it's even worse for integers, since they can be 8, 16, 32, or 64 bits. – Z boson Feb 12 '14 at 14:25

2 Answers


I think there is currently no direct way, and you have to work around it using the syntax you already noticed:

__m256 zero={};
__m256 x=zero+3.14159f;

This may change in the future if we can agree on a good syntax; see GCC PR 55726.

Note that if you want to create a vector { s, s, ... s } with a non-constant float s, the technique above only yields exactly that vector (and therefore a plain broadcast) for integers, or for floats with -fno-signed-zeros: for s = -0.0f, zero+s gives +0.0f. You can tweak it to __m256 x=s-zero; which works unless you use -frounding-math. A last version, suggested by Z boson, is __m256 x=(zero+1.f)*s; which should work in most cases (except possibly with a compiler that is paranoid about sNaN).

Marc Glisse
  • This needs to be fixed. GCC cannot do this efficiently [due to signed zero](http://stackoverflow.com/a/43801280/2542702). How hard is it to get involved with GCC development to fix something like this? I would like to help but I don't know if it's practical for me to try. – Z boson May 08 '17 at 12:24
  • Actually, I just found an efficient solution. `v4sf one = {1,1,1,1}; x= one*3.14159f;` This is because `x*1` can be simplified to just `x` for floating point. See the end of [my answer](http://stackoverflow.com/a/43801280/2542702). – Z boson May 08 '17 at 13:39
  • So I could just have done `x = 3.14159f - (__m256){}`. Apparently signed zero does not matter for `x-0`. – Z boson May 10 '17 at 09:24
  • @Zboson as I wrote above, the x-0 version doesn't optimize with -frounding-math, so there are still cases where x*1 is preferable. – Marc Glisse May 10 '17 at 11:24

It turns out that with a precise floating-point model (e.g. plain -O3, without -ffast-math), GCC cannot simplify x+0 to x because of signed zero: if x is -0.0, then x+0 is +0.0. So x = zero+3.14159f produces inefficient code. However, GCC can simplify 1.0*x to just x, so the efficient solution in this case is:

__m256 x = ((__m256){} + 1)*3.14159f;

https://godbolt.org/g/5QAQkC

See this answer for more details.


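A simpler solution is just x = 3.14159f - (__m256){}, because x - 0 = x irrespective of signed zero (although, as Marc Glisse points out, this does not optimize with -frounding-math, where x*1 is still preferable).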

Z boson