
I'm trying to learn to work with SIMD instructions in C, and I decided to start with SSE. I am using Windows 8.1 and coding with CodeBlocks. My CodeBlocks settings are set to target an Intel i7 (which I have), so SSE is enabled. My program compiles just fine, but it crashes as soon as it runs, with the error "(0xC0000005)". Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <xmmintrin.h>

float ScalarSSE(float *m1, float *m2) {

    float prod;
    int i;
    __m128 X, Y, Z;

    for(i=0; i<5; i+=4) {
        X = _mm_load_ps(&*m1);
        Y = _mm_load_ps(&*m2);
        X = _mm_mul_ps(X, Y);
        Z = _mm_add_ps(X, Z);
    }

    for(i=0; i<4; i++) {
        prod += Z[i];
    }

    return prod;

}

int main() {

    int i;
    float *s1 = calloc(1,sizeof (float));
    float *s2 = calloc(1,sizeof (float));

    for(i=0; i<100; i++) {
        *s1 = 2;
        *s2 = 2;
        float scalar_product_sse = ScalarSSE(s1, s2);
    }

    printf("Done");

    free (s1);
    free (s2);

}

I cannot use debug mode because I am not in a project, and I don't know how to open a project in CodeBlocks (it gives me errors :( )

I am wondering how to make this work without any errors. Thanks!

elemein
  • I cannot use debug mode because I am not in a project, and I don't know how to open a project in CodeBlocks (it gives me errors) – elemein Oct 06 '14 at 16:31
  • `&*m1` is odd. Don't you mean `m1`? I see no evidence that you are aligning `s1` and `s2`. Don't you need to do so? – David Heffernan Oct 06 '14 at 16:46
  • Then perhaps you should learn how to use debug mode? Asking us to psychic debug your program when you haven't even learned the fundamentals of your toolkit is not a good way to proceed. – Max Oct 06 '14 at 16:47
  • And `_mm_load_ps` loads 4 single precision values, packed into a 128 bit value, 16 byte aligned. Do you know what `_mm_load_ps` does? Do you know what `_mm_mul_ps` does? Why are you not following the rules? – David Heffernan Oct 06 '14 at 16:51
  • @DavidHeffernan the code actually has so many issues that it's hard to understand what to begin with:) – Rudolfs Bundulis Oct 06 '14 at 16:52
  • To be honest, I'm not really learning this from a class or anything, just trying to forage what I can online :/ I did do some research on the commands, but not being the most savvy person in the world, I didn't know many of the implications of using them. One might suggest waiting until I have more experience before using instructions like these, but it's hard to get more experience when everything I try to do is for "the more experienced". Sort of a catch-22. – elemein Oct 06 '14 at 16:58
  • @user3423509, I added a working example that I hope you can build upon to achieve what you wanted, since the code provided has too many issues to fix directly – Rudolfs Bundulis Oct 06 '14 at 17:04
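The following sketch illustrates the alignment rule from the comments above. `_mm_load_ps` expects a 16-byte-aligned address and can fault otherwise, while `_mm_loadu_ps` accepts any address. The names `load4` and `sum4` are hypothetical helpers, not part of any library.

```c
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: load the four floats starting at p.
   _mm_load_ps requires p to be 16-byte aligned; _mm_loadu_ps does not. */
__m128 load4(const float *p)
{
    if (((uintptr_t)p & 15u) == 0)
        return _mm_load_ps(p);   /* address is a multiple of 16 */
    return _mm_loadu_ps(p);      /* any other address: unaligned load */
}

/* Hypothetical helper: add the four lanes of v via a plain float array. */
float sum4(__m128 v)
{
    float lanes[4];
    _mm_storeu_ps(lanes, v);     /* unaligned store, so lanes needs no alignment */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

For example, take `float data[5] = {1, 2, 3, 4, 5}`. Here `sum4(load4(data + 1))` should give 14, however `data` happens to be aligned.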

2 Answers


Here you are allocating space for one float:

float *s1 = calloc(1,sizeof (float));

You are then accessing more than one float (four per `_mm_load_ps` call) in ScalarSSE.

Maybe something like this will help, although determining the exact number to use would be better:

float *s1 = calloc(100,sizeof (float));
float *s2 = calloc(100,sizeof (float));
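Growing the allocation fixes the out-of-bounds read. However, `calloc` is not guaranteed to return 16-byte-aligned memory on every platform (32-bit Windows allocators typically guarantee only 8 bytes), and `_mm_load_ps` needs that alignment too. Below is a minimal sketch using `_mm_malloc`/`_mm_free`, which GCC exposes through `<xmmintrin.h>`. The helper `alloc_floats` is hypothetical. Rounding the size up to a multiple of 4 is an assumption about how the buffer will be read.

```c
#include <stddef.h>
#include <string.h>
#include <xmmintrin.h>  /* with GCC this also declares _mm_malloc/_mm_free */

/* Hypothetical helper: allocate at least n floats, rounded up to a multiple
   of 4 so every _mm_load_ps reads a complete group, 16-byte aligned and
   zero-filled like calloc. Free the result with _mm_free, not free. */
float *alloc_floats(size_t n, size_t *padded_out)
{
    size_t padded = (n + 3) & ~(size_t)3;   /* round up to a multiple of 4 */
    if (padded == 0)
        padded = 4;
    float *p = _mm_malloc(padded * sizeof(float), 16);
    if (p != NULL) {
        memset(p, 0, padded * sizeof(float)); /* all-zero bits == 0.0f */
        if (padded_out != NULL)
            *padded_out = padded;
    }
    return p;
}
```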

Additionally:

for(i=0; i<100; i++) {
    *s1 = 2;
    *s2 = 2;
    float scalar_product_sse = ScalarSSE(s1, s2);
}

This does the same thing 100 times - is that what you intend?

"(0xC0000005)" means 'Access violation': basically, you have tried to access memory that you are not allowed to touch (or that doesn't exist).

Rob
  • Thanks for helping Rob, I made the suggested changes and am still getting the same error unfortunately. And yes I know I am doing the same thing 100 times, this is what I intended to do as I want the calculation to be done more than once. – elemein Oct 06 '14 at 16:33
  • @user3423509, the issue is how you access the float pointers in the ScalarSSE() function. What is the first for loop intended to do? – Rudolfs Bundulis Oct 06 '14 at 16:34
  • If you mean the one located near the top of the program, it's simply intended to load the two numbers and perform calculations on them. I made a little change to it that another user suggested that was clearly correct but still get the same error. – elemein Oct 06 '14 at 16:39
  • Your change to `&*m1` is "correct" only in the sense that it isn't a compile-time error. Now you have a loop that never uses the loop variable - that doesn't make any sense. – nobody Oct 06 '14 at 16:56
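For the loop variable to mean anything, each iteration has to advance both pointers by four floats. The accumulator also has to start at zero. The sketch below shows that pattern. It assumes both arrays are 16-byte aligned and that `n` is a multiple of 4. The name `dot_sse` is hypothetical.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical dot product of two float arrays.
   Assumes a and b are 16-byte aligned and n is a multiple of 4. */
float dot_sse(const float *a, const float *b, size_t n)
{
    __m128 acc = _mm_setzero_ps();           /* one running sum per lane */
    for (size_t i = 0; i < n; i += 4) {
        __m128 x = _mm_load_ps(a + i);       /* a[i] .. a[i+3] */
        __m128 y = _mm_load_ps(b + i);       /* b[i] .. b[i+3] */
        acc = _mm_add_ps(acc, _mm_mul_ps(x, y));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);               /* copy the 4 partial sums out */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Try it with `a = {1, 2, 3, 4, 5, 6, 7, 8}` and `b = {1, 1, 1, 1, 2, 2, 2, 2}`, both declared `_Alignas(16)`. Then `dot_sse(a, b, 8)` should return 62.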

The code had so many issues that I'll just list the ones most likely causing the access violation and provide a working example. You can then modify it for your needs, since it is really hard to understand what you are trying to achieve.

1) Memory alignment: _mm_load_ps takes a 16-byte-aligned address. You must use a memory allocation function that lets you specify the alignment, like _aligned_malloc, or wrap the floats in some kind of aligned struct as in the example.

2) You are using the uninitialized values of Z and prod, so the results after the additions would be undefined.

3) As mentioned in the comments, _mm_load_ps needs a vector of 4 floats, so allocating 1 float as you are doing is invalid. If you actually wanted to load only one value, see _mm_load1_ps, which loads one float into all four elements of an __m128. It is not possible to tell what the initial idea was, though, because of the for loop.
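For point 3, here is a sketch of what a one-float version could look like. It assumes each pointer really refers to a single float. `_mm_load1_ps` reads one float and copies it into all four lanes, so it needs neither a 4-float allocation nor 16-byte alignment. The name `broadcast_product_sum` is hypothetical.

```c
#include <xmmintrin.h>

/* Hypothetical: multiply *m1 by *m2 in all four lanes, then add the lanes.
   _mm_load1_ps reads ONE float and broadcasts it, so m1 and m2 may point
   to single floats (e.g. from calloc(1, sizeof (float))). */
float broadcast_product_sum(const float *m1, const float *m2)
{
    __m128 x = _mm_load1_ps(m1);          /* {*m1, *m1, *m1, *m1} */
    __m128 y = _mm_load1_ps(m2);          /* {*m2, *m2, *m2, *m2} */
    __m128 z = _mm_mul_ps(x, y);          /* four copies of (*m1) * (*m2) */

    float lanes[4];
    _mm_storeu_ps(lanes, z);
    float prod = 0.0f;                    /* initialized, unlike the original */
    for (int i = 0; i < 4; i++)
        prod += lanes[i];
    return prod;
}
```

With both inputs equal to 2, this should return 16, the same value as the four-element example.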

Here is a working example, based on your code but without the strange for loops. I hope it is enough to build upon to achieve what you wanted:

#include <xmmintrin.h>
#include <stdio.h>
#include <malloc.h>

// This ensures the proper alignment
__declspec(align(16)) struct float_vector
{
    float a, b, c, d;
};

float ScalarSSE(float *m1, float *m2) {

    float prod = 0;
    int i;
    __m128 X, Y, Z;

    X = _mm_load_ps(m1); // This loads 4 values, so the pointer should point to an array of at least 4 floats
    Y = _mm_load_ps(m2); // This loads 4 values, so the pointer should point to an array of at least 4 floats
    Z = _mm_mul_ps(X, Y);

    for(i=0; i<4; i++) {
        prod += (reinterpret_cast<float*>(&Z))[i];
    }

    return prod;

}

int main() {

    float_vector s1 = { 2.0, 2.0, 2.0, 2.0 };
    float_vector s2 = { 2.0, 2.0, 2.0, 2.0 };

    float scalar_product_sse = ScalarSSE(&s1.a, &s2.a);
    printf("Done, result: %f\n", scalar_product_sse);

    return 0;
}

And it actually gives a valid result - (2*2) + (2*2) + (2*2) + (2*2) = 16.
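The example above cannot be compiled as C with GCC, which is what CodeBlocks usually uses. It relies on C++ features (`reinterpret_cast`, and naming `float_vector` without the `struct` keyword) and on the MSVC-specific `__declspec(align(16))`. Below is a sketch of the same computation in C11. It assumes a compiler with `_Alignas` support (older GCC versions may need `-std=c11`). The alignment comes from `_Alignas(16)`, and the lanes are copied out with `_mm_store_ps` instead of a cast.

```c
#include <xmmintrin.h>

/* C11 stand-in for the MSVC-only __declspec(align(16)) struct. */
typedef struct {
    _Alignas(16) float v[4];
} float_vector;

/* Same computation as the answer's ScalarSSE, written in C. */
float ScalarSSE(const float *m1, const float *m2)
{
    __m128 X = _mm_load_ps(m1);   /* requires 16-byte alignment */
    __m128 Y = _mm_load_ps(m2);
    __m128 Z = _mm_mul_ps(X, Y);

    _Alignas(16) float lanes[4];
    _mm_store_ps(lanes, Z);       /* aligned store into a local array */

    float prod = 0.0f;
    for (int i = 0; i < 4; i++)
        prod += lanes[i];
    return prod;
}
```

Declaring `float_vector s1 = {{2, 2, 2, 2}}, s2 = {{2, 2, 2, 2}};` and calling `ScalarSSE(s1.v, s2.v)` should again give 16.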

Rudolfs Bundulis
  • For the purposes of an example, you could eliminate the `for` loop in `main` and also print the result. – nobody Oct 06 '14 at 17:07
  • It would definitely work for my intended purposes (the main purpose was just to write working code that would show me what is needed to work with SIMD instructions; it's just an exercise), though my IDE won't build the program because of a few errors. Most of the errors pop up on lines 22 and 36, and I don't really know the most elegant way to post them, so I'll just try to sift through them myself, as you've all done so much already :) Thank you guys – elemein Oct 06 '14 at 17:20
  • @user3423509 what IDE and compiler are you using? This example worked fine on Visual Studio 2010. The initial code did not compile because you can't just access the elements of __m128; http://stackoverflow.com/questions/12624466/get-member-of-m128-by-index has other, nicer ways to do that instead of the ugly cast. – Rudolfs Bundulis Oct 06 '14 at 17:24
  • Sorry for the late reply. I am using the latest version of CodeBlocks with what I believe is the GCC compiler. – elemein Oct 06 '14 at 20:51
  • @user3423509 I asked because your initial code had the `prod += Z[i]` part which didn't compile under Visual Studio and it makes sense, weird that GCC compiles it. – Rudolfs Bundulis Oct 07 '14 at 09:19
  • Oh yeah, it has no problem compiling that. I'm just still trying to figure out how to make your code compile with my compiler. I can't seem to get it working, as almost all the type names are different. – elemein Oct 07 '14 at 13:22
  • What errors are you getting? Why don't you try Visual Studio Express editions? That should be easier to use than CodeBlocks. If you can upload the console log to anywhere to see the errors I could try to advise something. – Rudolfs Bundulis Oct 07 '14 at 14:59
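As an alternative to the cast, and to the methods in the linked question, the four lanes can be summed without leaving the register. The sketch below uses only SSE1 intrinsics from `<xmmintrin.h>`. The name `hsum_ps` is hypothetical. It adds (v0 + v2) + (v1 + v3), so for inexact values the last bit may differ from a left-to-right loop.

```c
#include <xmmintrin.h>

/* Hypothetical horizontal sum of the four lanes of v, SSE1 only. */
float hsum_ps(__m128 v)
{
    __m128 hi = _mm_movehl_ps(v, v);       /* {v2, v3, v2, v3} */
    __m128 sum2 = _mm_add_ps(v, hi);       /* lane0 = v0+v2, lane1 = v1+v3 */
    __m128 lane1 = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 sum1 = _mm_add_ss(sum2, lane1); /* lane0 = (v0+v2) + (v1+v3) */
    return _mm_cvtss_f32(sum1);            /* extract lane 0 as a float */
}
```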