I'm learning the basics of SIMD so I was given a simple code snippet to see the principle at work with SSE and SSE2.

I recently installed MinGW to compile C code on Windows with gcc instead of using the Visual Studio compiler.

The objective of the example is to add two floats and then multiply by a third one.

The headers included are the following (which I guess are used to be able to use the SSE intrinsics):

#include <time.h>
#include <stdio.h>
#include <xmmintrin.h>
#include <pmmintrin.h>
#include <time.h>
#include <sys/time.h> // for timing
#include <stdlib.h>   // for rand() and RAND_MAX used in init()

Then I have a function to check what time it is, to compare time between calculations:

double now(){
   struct timeval t; double f_t;
   gettimeofday(&t, NULL);
   f_t = t.tv_usec; f_t = f_t/1000000.0; f_t +=t.tv_sec;
   return f_t;
}

The function to do the calculation in the "scalar" sense is the following:

void run_scalar(){
  unsigned int i;
  for( i = 0; i < N; i++ ){
     rs[i] = (a[i]+b[i])*c[i];
  }   
}

Here is the code for the sse2 function:

void run_sse2(){
  unsigned int i;
  __m128 *mm_a = (__m128 *)a; 
  __m128 *mm_b = (__m128 *)b;
  __m128 *mm_c = (__m128 *)c;
  __m128 *mm_r = (__m128 *)rv;
  for( i = 0; i < N/4; i++ )
    mm_r[i] = _mm_mul_ps(_mm_add_ps(mm_a[i],mm_b[i]),mm_c[i]);
}
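
The pointer casts above only work if the arrays are 16-byte aligned and N is a multiple of 4. As a minimal sketch of the same calculation without those assumptions, the version below uses explicit unaligned loads and stores plus a scalar tail loop. The function name mul_add_sse and its parameters are hypothetical and not part of the original listing.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_add_ps, ... */

/* Hypothetical helper (not from the original listing):
   computes r[i] = (a[i] + b[i]) * c[i] for n floats. */
void mul_add_sse(const float *a, const float *b, const float *c,
                 float *r, size_t n)
{
    size_t i = 0;

    /* Process four floats per iteration. The unaligned load/store
       intrinsics accept any pointer, aligned or not. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(r + i, _mm_mul_ps(_mm_add_ps(va, vb), vc));
    }

    /* Scalar tail for the remaining n % 4 elements. */
    for (; i < n; i++)
        r[i] = (a[i] + b[i]) * c[i];
}
```

With the arrays from this question it would be called as mul_add_sse(a, b, c, rv, N). When the data is known to be aligned, _mm_load_ps and _mm_store_ps could be used instead.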

The vectors are defined the following way (N is the size of the vectors and it is defined elsewhere) and a function init() is called to initialize them:

float a[N] __attribute__((aligned(16)));
float b[N] __attribute__((aligned(16)));
float c[N] __attribute__((aligned(16)));
float rs[N] __attribute__((aligned(16)));
float rv[N] __attribute__((aligned(16)));

void init(){
  unsigned int i;
  for( i = 0; i < N; i++ ){
      a[i] = (float)rand () / RAND_MAX / N; 
      b[i] = (float)rand () / RAND_MAX / N;  
      c[i] = (float)rand () / RAND_MAX / N; 
  }
}

Finally here is the main that calls the functions and prints the results and computing time.

int main(){
  double t;
  init();
  t = now();
  run_scalar();
  t = now()-t;
  printf("S = %10.9f Temps du code scalaire   : %f seconde(s)\n",1e5*sum(rs),t);
  t = now();
  run_sse2();
  t = now()-t;
  printf("S = %10.9f Temps du code vectoriel 2: %f seconde(s)\n",1e5*sum(rv),t);
}
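
The listing does not show N or sum(). Purely as an assumption, so that the code above can be compiled, they might look like the sketch below. The value 1024 and the body of sum() are guesses, not the original definitions.

```c
/* Assumed value, not from the original post. It should be a multiple
   of 4, because run_sse2() only processes N/4 groups of four floats. */
#define N 1024

/* Hypothetical sum(): adds up the N floats of one result array. */
float sum(const float *v)
{
    float s = 0.0f;
    unsigned int i;
    for (i = 0; i < N; i++)
        s += v[i];
    return s;
}
```

Both definitions would have to appear before the array declarations and before main() so that N and sum() are known at the point of use.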

For some reason, if I compile this code with the command line "gcc -o vec vectorial.c -msse -msse2 -msse3" or "mingw32-gcc -o vec vectorial.c -msse -msse2 -msse3", it compiles without any problems, but I can't run it on my Windows machine. In the command prompt I get an "access denied" error, and a big message appears on the screen saying "This app can't run on your PC, to find a version for your PC, check with the software publisher".

I don't really understand what is going on, neither do I have much experience with MinGW or C (just an introductory course to C++ done on Linux machines). I've tried playing around with different headers because I thought maybe I was targeting a different processor than the one on my PC but couldn't solve the issue. Most of the info I found was confusing.

Can someone help me understand what is going on? Is it a problem in the MinGW configuration, which might be targeting a Linux platform? Is there something in the code that has no equivalent on Windows?

I'm trying to run it on a 64-bit Windows 8.1 PC.

Edit: Tried the configuration suggested in the site linked below. The output remains the same.

If I try to run it through MSYS I get "Bad File number". If I try to run it through the command prompt I get "Access is Denied".

I'm guessing there's some sort of bug arising from permissions. I tried turning off the antivirus and User Account Control but still no luck.

Any ideas?

  • Possibly your cross-compiler targeted x64 but you're trying to run it on 32-bit Windows? – nobody Dec 11 '14 at 21:52
  • Can you run a simple "hello world" program compiled with your GCC? – Eugene Sh. Dec 11 '14 at 21:55
  • Yes, I did and it runs fine :/ – José Miguel Arroyo Dec 11 '14 at 22:20
  • main() will not compile cleanly. Because the function has a int return expected, but the code is missing the expected 'return(0);' just before the last closing brace. I also do not see the prototypes for the sub functions, which should be listed just before the main function. (sub functions should be listed after the main function) You need to add the -Wall parameter right after the 'gcc', so all the warnings show up. – user3629249 Dec 12 '14 at 06:38
  • I notice that <time.h> is listed twice. This will be 'ok' because the header files have proper wrappers so their contents cannot be included more than once in a single compilation unit. – user3629249 Dec 12 '14 at 06:42
  • for the compiles, did you add the necessary parameters for the library path(s) and the library names? – user3629249 Dec 12 '14 at 06:44
  • doesn't windows like executable program names to end in .exe or .com? – user3629249 Dec 12 '14 at 06:49
  • Where is the complete listing? What is N, a, b, c ? – vitalyster Dec 12 '14 at 07:11
  • N is defined elsewhere, it's just an int, and a, b and c are the vectors of floats. I think there's something going on with the SSE settings, but I don't really understand it – José Miguel Arroyo Dec 12 '14 at 16:01
  • @user3629249 The `main` function is special-cased in the standard; allowed to omit the `return`, in which case [it acts as if you did a `return 0`](http://stackoverflow.com/questions/13545291/can-i-omit-return-from-main-in-c). – Raymond Chen Dec 15 '14 at 22:31

1 Answer

There is nothing wrong with your code, except that you did not provide the definitions of sum() or N, which is not a problem. The switches -msse -msse2 do not appear to be required.

I was able to compile and run your code on Linux (Ubuntu x86_64, compiled with gcc 4.8.2 and 4.6.3, on Atom D2700 and AMD Athlon LE-1640) and Windows7/64 (compiled with gcc 4.5.3 (32bit) and 4.8.2 (64bit), on Core i3-4330 and Core i7-4960X). It ran without problems.

Are you sure your CPU supports the required instructions? What exactly was the error code you got? Which MinGW configuration did you use? Out of curiosity, I used the one available at http://win-builds.org/download.html which was very straight-forward.
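
One way to check the first question from within a program is sketched below. It assumes a GCC toolchain on x86 that provides the __builtin_cpu_init and __builtin_cpu_supports builtins (GCC 4.8 or later). The function name cpu_has_sse3 is made up for this example.

```c
/* Hypothetical helper: returns 1 if the running CPU reports SSE,
   SSE2 and SSE3 support, 0 otherwise.
   Assumes GCC-specific builtins on an x86 target. */
int cpu_has_sse3(void)
{
    __builtin_cpu_init();  /* populate the CPU feature data */
    return __builtin_cpu_supports("sse")
        && __builtin_cpu_supports("sse2")
        && __builtin_cpu_supports("sse3");
}
```

A small test program could call cpu_has_sse3() and print the result before running any SSE code. If it returns 1, missing instruction support is unlikely to be the cause of the error.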

However, using the optimization flag -O3 created the best result -- with the scalar loop! Also useful are -m64 -mtune=native -s.

Twonky
  • About the last paragraph: at `-O3` automatic vectorization is enabled, so most probably the "scalar" version is compiled to something that isn't scalar at all :) – Matteo Italia Dec 15 '14 at 01:50
  • My CPU is an i7-4860HQ. I'm guessing my MinGW is not configured to target my CPU or something like that. I'm not familiar at all with MinGW, so I guess that's where the problem comes from. I have gcc version 4.8.1; I just downloaded MinGW from their website and followed the instructions for a basic setup. I tried using the one you suggested, following the installation instructions on the website, but no luck. If I run the compiler through MSYS, compilation goes on with no errors or warnings, but when I try to run the exe I get "Bad file number" – José Miguel Arroyo Dec 15 '14 at 10:22
  • Do you have any recommendations about where to learn how to use MinGW and gcc? I really feel like I have no idea about what I'm doing and the MinGW website isn't really clear or newbie-friendly. BTW: Thanks for the answer – José Miguel Arroyo Dec 15 '14 at 10:24