0

I have the code of matrices multiplication with using openmp:

#include <stdio.h>
#include <omp.h>
#include <math.h>
#define N 1000
int main()
{
    long int i, j, k;
    //long int N = atoi(argv[1]);
    double t1, t2;
    double a[N][N],b[N][N],c[N][N]; 

    for (i=0; i<N; i++)
        for (j=0; j<N; j++)
            a[i][j]=b[i][j]=log(i*j/(i*j+1.)+1) +exp(-(i+j)*(i+j+1.));

    t1=omp_get_wtime();

    #pragma omp parallel for shared(a, b, c) private(i, j, k)
    for(i=0; i<N; i++){
        for(j=0; j<N; j++){
            c[i][j] = 0.0;
            for(k=0; k<N; k++) c[i][j]+=a[i][k]*b[k][j];
        }
    }

    t2=omp_get_wtime();
    printf("Time=%lf\n", t2-t1);
}

Now I want to set the number of threads which I want through command line. I do that by using

atoi(argv[])

Namely

#include <stdio.h>
#include <omp.h>
#include <math.h>
#define N 1000
int main(int argc, char** argv[])
{
    long int i, j, k;
    //long int N = atoi(argv[1]);
    double t1, t2;
    double a[N][N],b[N][N],c[N][N];

    for (i=0; i<N; i++)
        for (j=0; j<N; j++)
            a[i][j]=b[i][j]=log(i*j/(i*j+1.)+1) +exp(-(i+j)*(i+j+1.));

    int t = atoi(argv[1]);
    t1=omp_get_wtime();

    #pragma omp parallel for shared(a, b, c) private(i, j, k) num_threads(t)
    for(i=0; i<N; i++){
        for(j=0; j<N; j++){
            c[i][j] = 0.0;
            for(k=0; k<N; k++) c[i][j]+=a[i][k]*b[k][j];
        }
    }

    t2=omp_get_wtime();
    printf("Time=%lf\n", t2-t1);
}

Everything is fine, except one crucial thing: when I try to compute the product of matrices with dimension more than (more or less) 500, I get the mistake: "segmentation fault". Could someone clarify the reason for this mistake?

jb326
  • 1,345
  • 2
  • 14
  • 26
John Taylor
  • 137
  • 1
  • 9
  • 3
    That is some three star code right there ;) – David van rijn May 09 '16 at 21:33
  • Unfortunately, my n star code doesn't work. – John Taylor May 10 '16 at 03:33
  • 1
    Not sure if you've gathered yet ... your `main` signature is wrong. Change `char** argv[]` to `char* argv[]` or `char** argv`. See http://stackoverflow.com/questions/2108192/what-are-the-valid-signatures-for-cs-main-function – yano May 10 '16 at 04:23
  • @yano : Thank You, but I've already done that, and, unfortunately, nothing is changed. – John Taylor May 10 '16 at 04:43
  • So where is it seg faulting? Somewhere in the triple loop after `#pragma omp parallel`? And just to be clear, you intend your `N` value to be the matrix dimensions, correct? In your code it's defined to 1000, but in your question you say when it's set to ~500 you start seg faulting? – yano May 10 '16 at 04:57
  • How about you employ the built-in mechanism for setting the number of OpenMP threads and just set the `OMP_NUM_THREADS` environment variable accordingly? – Hristo Iliev May 10 '16 at 06:20

1 Answers1

1

I don't know anything about openmp, but you are most assuredly blowing up your stack. Default stack space will vary from system to system, but with N == 1000, you are trying to put three 2D arrays totaling 3 million doubles on the stack. Assuming a double is 8 bytes, that's 24 million bytes, or just shy of 22.9MB. There can't be many systems allowing that kind of stack space. Instead, I'd recommend trying to grab that amount of memory from the heap. Something like this:

//double a[N][N],b[N][N],c[N][N];
double **a, **b, **c;
a = malloc(sizeof(double*) * N);
b = malloc(sizeof(double*) * N);
c = malloc(sizeof(double*) * N);

for (i=0; i<N; i++)
{
  a[i] = malloc(sizeof(double) * N);
  b[i] = malloc(sizeof(double) * N);
  c[i] = malloc(sizeof(double) * N);
}

// do your calculations

for (i=0; i<N; i++)
{
  free(a[i]);
  free(b[i]);
  free(c[i]);
}
free(a);
free(b);
free(c);

I've verified on my machine at least, that with N == 1000 I crash right out of the gate with EXC_BAD_ACCESS when trying to place those arrays on the stack. When I dynamically allocate the memory instead as shown above, I get no seg faults.

yano
  • 4,827
  • 2
  • 23
  • 35
  • Actually, it's only your OS X that cannot handle very large stacks. The stack on x86 can grow "infinitely" until it hits another mapping. Any decent Unix-like system (*BSD, Linux, Solaris, etc.) exposes this by allowing the `RLIMIT_STACK` resource limit to be set to `RLIM_INIFINITY`. The shell equivalent is `ulimit -s unlimited`. The stack size on OS X has an upper limit of 64 MiB and dynamic allocation is the only option. Even Windows allows for large (but of fixed size) stack, though the size there is encoded in the header of the executable file itself. – Hristo Iliev May 10 '16 at 07:02
  • @HristoIliev Interesting.. I knew you could use `ulimit -s` to set the stack size, but I was unaware of an `unlimited` option,, I'll have to look more into that. I think you should write this as an answer. Even so, I'm willing to bet he has his stack size set to some finite value, maybe ~6MB (I will never adopt the IEC nomenclature, heh) considering he starts to blow up around `N==500`. How did you know I was on OS X? `EXC_BAD_ACCESS`? That came from `gdb`. If it was my stack size (it's @ 8MB),, I would expect this code to blow up my linux box too with where stack size == 8MB. – yano May 11 '16 at 05:42
  • I've actually written an answer [here](http://stackoverflow.com/a/13266595/1374437), but the question - although very similar - is about Fortran. As for the OS, `EXC_BAD_ACCESS` is one of the Mach exceptions defined by the Apple kernels. – Hristo Iliev May 11 '16 at 07:20