
This is a simplified version of my problem: I want to convert a float value into the defined vector type v4si (I want to use SIMD operations for optimization). How can I convert a float/double value to this type?

#include<stdio.h>

typedef double v4si __attribute__ ((vector_size (16)));

int main()
{
    double stoptime=36000;
    float x =0.5*stoptime;
    float * temp = &x;
    v4si a = ((v4si)x);   // Error: Incompatible data types
    v4si b;
    v4si *c;
    c = ((v4si*)&temp);   // Copies address of temp,           
    b = *(c);                   
    printf("%f\n" , b);      //    but printing (*c) crashes program
}
Z boson
Sarmad
  • C or C++? Which one? – Sourav Ghosh May 02 '17 at 10:15
  • Is there some reason why you can't just use intrinsics for this in the normal way? Also, what CPU/architecture are we talking about here? x86? ARM? POWER/PowerPC? – Paul R May 02 '17 at 10:25
  • It's C (mentioned in the title), on x86. Actually, I am very new to SIMD and am trying to optimize C code by replacing for loops with SIMD vector multiplications. – Sarmad May 02 '17 at 10:35
  • OK - I've updated your tags and am assuming you want to use SSE. If you use the search facility here on StackOverflow you'll find lots of questions tagged `[sse]`; take a look at some of these to become familiar with the basics of using SIMD intrinsics. – Paul R May 02 '17 at 10:38
  • As a beginner, you'll have an easier time using Intel's `_mm_loadu_ps` intrinsic to load 4 floats, and `_mm_mul_ps` to multiply them. Intel's intrinsics are better documented and have more tutorials than the GNU C vector extensions you're using. The main downside is that they're not portable outside of x86, but you're only targeting x86. See the [SSE](http://stackoverflow.com/tags/sse/info) tag wiki for links to docs and tutorials. – Peter Cordes May 09 '17 at 22:12
  • Also, `v4si` is a really confusing name for a vector of 2 `double`s. `v4si` should be a vector of 4 signed (32-bit) integers. Also, your code is full of bugs. e.g. `float *temp` is a pointer-to-float, but you're setting `c = &temp`. So `c` holds the address of another pointer, *not* a pointer to an array of float. You didn't even declare any arrays. – Peter Cordes May 09 '17 at 22:16

2 Answers


You don't need to define a custom SIMD vector type (v4si) or mess around with casts and type punning; just use the provided intrinsics from the appropriate *intrin.h header, e.g.:

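#include <xmmintrin.h> // use SSE intrinsics

int main(void)
{
    __m128 v;          // __m128 is the standard SSE vector type for 4 x float
    float x = 1.0f, y = 2.0f, z = 3.0f, w = 4.0f;   // initialize before use

    v = _mm_set_ps(x, y, z, w);
                       // use intrinsic to set vector contents to x, y, z, w
                       // (arguments run from the highest element down, so w lands in element 0)

    // ...
    (void)v;           // silence "set but not used" warnings in this stub

    return 0;
}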
Paul R
  • Can the same header file be used for intrinsics on the AMD64 architecture as well? I have to run my code on multiple systems. – Sarmad May 02 '17 at 12:31
  • Yes, no problem - you need to be aware of which SIMD instruction sets are supported on your target architecture if you are going beyond SSE3, but you should be fine with SSE2/SSE3. – Paul R May 02 '17 at 13:49
  • @Sarmad: If you are targeting 64-bit code (AMD64/x86-64), the processor supports SSE and SSE2 by default, so yes, you can use `xmmintrin.h`. When AMD created the 64-bit specification for their chips, the processor supported SSE/SSE2. Intel then adopted AMD's specification, so all x86-64 processors from Intel also support SSE/SSE2 by default. If you need anything beyond SSE2, not every 64-bit processor supports it. – Michael Petch May 02 '17 at 15:07

You appear to be using GCC vector extensions. The following code shows how to do broadcasts, vector + scalar, vector * scalar, loads and stores using vector extensions.

#include <stdio.h>

#if defined(__clang__)
typedef float v4sf __attribute__((ext_vector_type(4)));
#else
typedef float v4sf __attribute__ ((vector_size (16)));
#endif

void print_v4sf(v4sf a) { for(int i=0; i<4; i++) printf("%f ", a[i]); puts(""); }

int main(void) {
  v4sf a;
  //broadcast a scalar
  a = ((v4sf){} + 1)*3.14159f;  
  print_v4sf(a);

  // vector + scalar
  a += 3.14159f;
  print_v4sf(a);

  // vector*scalar
  a *= 3.14159f;
  print_v4sf(a);

  //load from array
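  float data[4] __attribute__((aligned(16))) = {1, 2, 3, 4};  // dereferencing a v4sf* assumes 16-byte alignment
  a = *(v4sf*)data;
  //a = __builtin_ia32_loadups(data);  // GCC builtin for an unaligned load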

  //store to array
  float store[4] __attribute__((aligned(16)));  // same alignment requirement for the store
  *(v4sf*)store = a;
  for(int i=0; i<4; i++) printf("%f ", store[i]); puts("");
}

Clang 4.0 and ICC 17 support a subset of the GCC vector extensions. However, neither of them supports the vector + scalar or vector * scalar operations that GCC supports. A workaround for Clang is to use Clang's OpenCL vector extensions (the `ext_vector_type` typedef above). I don't know of a workaround for ICC. MSVC does not support any kind of vector extension that I am aware of.

With GCC, even though it supports vector + scalar and vector * scalar, you cannot do vector = scalar (but you can with Clang's OpenCL vector extensions). Instead, you can use this trick:

a = ((v4sf){} + 1)*3.14159f;

I would do as Paul R suggests and use intrinsics, which are mostly compatible across the four major C/C++ compilers: GCC, Clang, ICC, and MSVC.

Here is a table of what each compiler supports using GCC's vector extensions, with Clang's OpenCL vector extensions in the last column.

                                gcc  g++  clang  icc   OpenCL
unary operations
[]                              yes  yes  yes    yes   yes
+, -                            yes  yes  yes    yes   yes
++, --                          yes  yes  no     no    no
~                               yes  yes  yes    yes   yes
!                               no   yes  no     no    yes

binary vector op vector
+,-,*,/,%                       yes  yes  yes    yes   yes
&,|,^                           yes  yes  yes    yes   yes
>>,<<                           yes  yes  yes    yes   yes
==, !=, >, <, >=, <=            yes  yes  yes    yes   yes
&&, ||                          no   yes  no     no    yes

binary vector op scalar
+,-,*,/,%                       yes  yes  no     no    yes
&,|,^                           yes  yes  no     no    yes
>>,<<                           yes  yes  no     no    yes
==, !=, >, <, >=, <=            yes  yes  no     no    yes
&&, ||                          no   yes  no     no    yes

assignment
vector = vector                 yes  yes  yes    yes   yes
vector = scalar                 no   no   no     no    yes

ternary operator
?:                              no   yes  no     no    ?

We see that Clang and ICC do not support GCC's vector-op-scalar operations. GCC in C++ mode supports everything except vector = scalar. Clang's OpenCL vector extensions support everything except maybe the ternary operator: Clang's documentation claims they do, but I can't get it to work. GCC in C mode additionally does not support the unary `!`, the binary logical operators, or the ternary operator.

Z boson
  • Note that g++ supports more vector extensions than gcc (I should know, I implemented them ;-). I thought even gcc supported some of those "no" entries, but I guess I was misremembering. – Marc Glisse May 09 '17 at 19:29
  • @MarcGlisse, I updated my table. I had based it on [Clang's table](https://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors), which is obviously wrong for GCC. I have now tested many of the operations in the table. It's great to know that GCC supports everything Clang's OpenCL vector extensions support except for `vector = scalar` and the `vector.xyzw` notation. Maybe it's time to fix `vector = scalar` in GCC. Why are the logical operators and the ternary operator only supported in g++? – Z boson May 10 '17 at 09:05
  • Things only supported in C++: same reason why several vector extensions used to be only supported in C; the volunteer who wrote the code was more interested in one language than the other (the two front ends are largely disjoint in gcc). – Marc Glisse May 10 '17 at 11:33
  • @MarcGlisse, would it be hard to implement the OpenCL swizzle notation v.xyzw as part of GCC? Do you think it's interesting? I asked a question about how to implement this with C++ https://stackoverflow.com/questions/19923882/custom-extended-vector-type-e-g-float4-b-v-xxyz – Z boson May 10 '17 at 12:16
  • @MarcGlisse, Clang does not support `?:` as they claim: https://stackoverflow.com/questions/25345585/ternary-operator-for-clangs-extended-vectors and https://godbolt.org/g/rt67UM. – Z boson May 10 '17 at 13:28
  • I guess it wouldn't be too hard to handle .xyzw: hack something when the left hand side of . is a vector, then map it to __builtin_shuffle. I am not convinced it should get in, this is redundant with __builtin_shuffle; nicer, but for a costly operation that's not necessarily so good; limited to vectors of size <= 4 (or do you keep using the rest of the alphabet, then capital letters, then the Greek alphabet, etc?). But it would probably still be accepted. Again, your main issue is finding someone motivated to implement it. – Marc Glisse May 10 '17 at 13:45
  • @MarcGlisse, I think OpenCL handles up to 16 lanes with the notation. That's what I recall. Something like `v.s0123456789abcdef`. But I get your point. The main missing feature is the assignment `vector = scalar`. How do I get involved in implementing this? I'm not sure I have the right skills. – Z boson May 10 '17 at 13:57
  • If you want to start, I'd guess: build a debug compiler (I set CFLAGS, BOOT_CFLAGS, CFLAGS_FOR_TARGET and the CXXFLAGS variants to "-O0 -g" and configure with --disable-bootstrap), compile a failing program with `-wrapper gdb,--args` and in gdb use `source /path/to/mybuild/gcc/.gdbinit`, set a breakpoint on `error` and `error_at`, run, look at the backtrace to see where the error refusing to convert double to __m128d is coming from, look around in the code, for instance in convert_for_assignment you may notice special code for vectors. Grep for other places handling vectors to see what you can – Marc Glisse May 10 '17 at 15:02
  • ... do with them. Once you make progress, there is https://gcc.gnu.org/contribute.html explaining how to get it in the official sources. There is an IRC channel where you can ask questions if you are blocked (https://gcc.gnu.org/wiki/GCConIRC). – Marc Glisse May 10 '17 at 15:05
  • https://gcc.gnu.org/onlinedocs/gccint/ can also contain useful information on the internals of gcc, although I don't think it is sufficient. – Marc Glisse May 10 '17 at 15:11
  • I wish all compiler vendors would sit down together and create one standardized library. There are new winds at Microsoft, so maybe we will see it in MSVC as well (though I wish they had started with up-to-date OpenMP support). What about ICC 18.0? Does it add some support? – Royi Feb 21 '18 at 11:53