0

I need to use the dsyev function from LAPACK on a large real symmetric 2D matrix. I can use it from scipy.linalg.lapack, but it doesn't finish computing in reasonable time.

I can benefit from > 20 cores on a HPC cluster which has intel 2020 suite as a module. Because reasons, I cannot just use the same python code as the one from my laptop inside on the cluster and set the environmental variable MKL_NUM_THREADS=32 for example. My only way forward is using C and Intel MKL for C.

I studied the intel MKL example on how to use the dsyev function in C.

My 2D array in python is shape (20160, 20160) of np.float64. 20160 = 252 * 80. I just write T = np.zeros((20160, 20160), dtype=np.float64), then proceed to filling it according to my task. At the end I reshape it using np.flatten(order='C'), then save it to disk in either a .npy or a .txt format.

Here I ask for your help:

I am trying to read it from memory using C. My elementary program works for small 1D array sizes, example (20000,) of C doubles, but fails when actually trying to read the whole 1D array, to be then used with dsyev. According to google: 250 * 80 * 250 * (64 bytes) = 0.32 gigabytes and I do not know why I cannot use in my C program a 1D array of that size, for example.

My code in C:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main() {
    double arrayy[252*80*250];
    // double *arrayy = (double*) malloc(250*80*100*sizeof(double));
    printf("Hehe");
    FILE *fp;

    fp = fopen("1D_array.txt", "rb");
    for ( int i = 0; i < 250*80 ; i++) {
        // fread(&(array[i]), sizeof(array[i]), 1, fp);
        fscanf(fp, "%lf", &arrayy[i]);
    }
    // fread(&(arrayy[0]), sizeof(double), 252*80*3, fp);
    fclose(fp);

    for ( int i = 0 ; i < 252*80 ; i++ ) {
        if (i % 1000 == 0)
            printf("Value at index %d is %f\n", i, arrayy[i]);
  }
}

I am using gcc 9.3.0.17 inside a Windows Subsytem for Linux 1 (WSL1) on a Windows 11 with 16 GB of RAM.

Thank you!

UPDATE:

This, when compiled with $ gcc -Wall test.c and then $./a.out returns $ Segmentation fault (core dumped) and nothing else gets printed!

velenos14
  • 534
  • 4
  • 13
  • 2
    Your problem is probably the stack size. Local variables are allocated from the stack which is a memory area with some specific maximum size. Try making it global instead (if that works for you) or else use malloc – user253751 Oct 14 '22 at 15:12
  • @user253751, thank you for your help. I tried to use malloc as in: ``double *arrayy = (double*) malloc(250*80*100*sizeof(double));``, but ``./a.out`` results in ``Segmentation fault (core dumped)``. I will try your other idea, but I do not know what you exactly mean by making it global. Putting ``double arrayy[252*80*100];`` above ``int main()``, the same segmentation fault appears. – velenos14 Oct 14 '22 at 15:13
  • 2
    Check the malloc result to see if it actually allocated anything (check arrayy for NULL) – Nelfeal Oct 14 '22 at 15:17
  • 2
    "Even for very small arrays allocated" - are you trying to access it like before? If `i` goes to `250*80` and you do `arrayy[i]` but `arrayy` has only 252 elements, then of course you get a segfault. – Nelfeal Oct 14 '22 at 15:57
  • Can any of you make an answer as now it works with ``malloc`` with 250*80*250*80 elements of type double, and the instruction is written inside main()? Thank you! – velenos14 Oct 14 '22 at 18:35

1 Answers1

2

In C, arrays are allocated on the stack just like any other local variable, which is a relatively small area in memory compared to the heap.

This line double arrayy[252*80*250]; probably leads to a stack overflow which appears as a segmentation fault. To keep in line with your Python code, you should dynamically allocate your array, like this:

double *arrayy = malloc(250*80*100*sizeof(*arrayy));

Note that you don't need the cast and that you can do sizeof(*arrayy) instead of sizeof(double).

Also note that in your code, you seem to alternate between 252 and 250 for no reason. You should probably just name the size of your array into a variable, like this:

const unsigned long arrayy_size = 252 * 80 * 252 * 80;
double *arrayy = malloc(arrayy_size * sizeof(*arrayy));
// [...]
for (unsigned long i = 0; i < arrayy_size; ++i) {
    // use arrayy[i]
}
Nelfeal
  • 12,593
  • 1
  • 20
  • 39