I am testing FFTW in a Fortran program because I need to use it. Since I am working with huge matrices, my first approach is to use OpenMP. When my matrix has dimensions 500 x 500 x 500, the following error occurs:

Operating system error: 
Program aborted. Backtrace:
Cannot allocate memory
Allocation would exceed memory limit

I compiled the code using the following: gfortran -o test teste_fftw_openmp.f90 -I/usr/local/include -L/usr/lib/x86_64-linux-gnu -lfftw3_omp -lfftw3 -lm -fopenmp

PROGRAM test_fftw
USE omp_lib      
USE, intrinsic:: iso_c_binding
IMPLICIT NONE
INCLUDE 'fftw3.f'
INTEGER::i, DD=500
DOUBLE COMPLEX:: OUTPUT_FFTW(3,3,3) 
DOUBLE COMPLEX, ALLOCATABLE:: A3D(:,:,:), FINAL_OUTPUT(:,:,:)
integer*8:: plan
integer::iret, nthreads
INTEGER:: indiceX, indiceY, indiceZ, window=2

!! TESTING 3D FFTW with OPENMP
ALLOCATE(A3D(DD,DD,DD))
ALLOCATE(FINAL_OUTPUT(DD-2,DD-2,DD-2))
write(*,*) '---------------'
write(*,*) '------------TEST 3D FFTW WITH OPENMP----------'
A3D = reshape((/(i, i=1,DD*DD*DD)/),shape(A3D))

CALL dfftw_init_threads(iret)
CALL dfftw_plan_with_nthreads(nthreads)

CALL dfftw_plan_dft_3d(plan, 3,3,3, OUTPUT_FFTW, OUTPUT_FFTW, FFTW_FORWARD, FFTW_ESTIMATE)
FINAL_OUTPUT=0.
!$OMP PARALLEL DO DEFAULT(SHARED) SHARED(A3D,plan,window) &
!$OMP PRIVATE(indiceX, indiceY, indiceZ, OUTPUT_FFTW, FINAL_OUTPUT)
DO indiceZ=1,10!500-window
    write(*,*) 'INDICE Z=', indiceZ
    DO indiceY=1,10!500-window
        DO indiceX=1,10!500-window
            CALL dfftw_execute_dft(plan, A3D(indiceX:indiceX+window,indiceY:indiceY+window, indiceZ:indiceZ+window), OUTPUT_FFTW)
            FINAL_OUTPUT(indiceX,indiceY,indiceZ)=SUM(ABS(OUTPUT_FFTW))
        ENDDO    
    ENDDO    
ENDDO
!$OMP END PARALLEL DO
call dfftw_destroy_plan(plan)
CALL dfftw_cleanup_threads()
DEALLOCATE(A3D,FINAL_OUTPUT)
END PROGRAM test_fftw

Notice that this error occurs when I merely use a huge matrix (A3D), without running the loop over all of its values (to cover all values, the limits of the three nested loops should be 500-window). I tried to solve this (tips here and here) by compiling with -mcmodel=medium, without success. It did work when I compiled with gfortran -o test teste_fftw_openmp.f90 -I/usr/local/include -L/usr/lib/x86_64-linux-gnu -lfftw3_omp -lfftw3 -lm -fopenmp -fmax-stack-var-size=65536

So, I don't understand: 1) Why is there a memory allocation problem if the huge matrix is a shared variable? 2) Will the solution I found still work if I have more huge matrix variables, for example three more 500 x 500 x 500 matrices to store calculation results? 3) In the tips I found, people said that using allocatable arrays would solve the problem, but I was already using them and it made no difference. Is there anything else I need to do?

victortxa
  • Do you use a 64 bit compiler and OS? How much memory do you have? About 4 GB of memory are required. Plus another possible temporary for the `reshape`, so another possible 2 GB. – Vladimir F Героям слава May 22 '17 at 21:42
  • Check the process resource limits with `ulimit -a`. Perhaps the data segment size or the virtual address space size is limited. – Hristo Iliev May 22 '17 at 22:32
  • I tested on a limited PC I have, with 4 GB of RAM. I'm going to test on one with 12 GB as soon as possible (I will update here when done). Regarding `ulimit -a`, the `stack size` is `8192 kb`. Sorry @hristo-iliev, but what exactly is this 8 MB limit? – victortxa May 23 '17 at 13:21
  • @victortxa Hristo was not talking about `stack size` but *"data segment size or the virtual address space size"*, which is something very different. For stack see explanation in https://stackoverflow.com/questions/13264274/why-segmentation-fault-is-happening-in-this-openmp-code and https://stackoverflow.com/questions/20256523/how-to-set-openmp-thread-stack-to-unlimited but I don't think that is your problem. You simply need more RAM in your computer. – Vladimir F Героям слава May 23 '17 at 13:44
  • @vladimir-f Now I get it! The limits you mentioned are both unlimited according to the `ulimit -a` output. – victortxa May 23 '17 at 14:02
  • There is a setting on Linux that controls how much the amount of memory the programs ask for can exceed the actually available physical memory. It is called an overcommit ratio and allows the kernel to promise the applications more memory than available in hope that they won't use it all at the same time. With certain kernel settings (`cat /proc/sys/vm/overcommit_memory` to find out), it is possible to tell the kernel to deny allocations that are greater than certain value. With other settings, the kernel will always satisfy the request and let the program crash later when RAM+swap gets full. – Hristo Iliev May 23 '17 at 15:28
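
The allocation failure discussed in these comments can be caught in the program itself rather than left to abort at run time. The following is a minimal sketch, assuming the same 500 x 500 x 500 double complex array as in the question; the program name and the variables `ierr` and `msg` are hypothetical. It uses only the standard `stat=` and `errmsg=` specifiers of `ALLOCATE`.

```fortran
program checked_allocation
  use, intrinsic :: iso_fortran_env, only: real64, error_unit
  implicit none
  integer, parameter :: dd = 500            ! same size as in the question
  complex(real64), allocatable :: a3d(:,:,:)
  integer :: ierr                           ! hypothetical name: allocation status
  character(len=256) :: msg                 ! hypothetical name: runtime error text

  msg = ''
  ! With stat= present, a failed allocation returns a nonzero status
  ! instead of terminating the program.
  allocate(a3d(dd, dd, dd), stat=ierr, errmsg=msg)
  if (ierr /= 0) then
     write(error_unit, '(a,i0,2a)') 'allocation failed, stat=', ierr, ': ', trim(msg)
     error stop 1
  end if

  print '(a)', 'allocation request accepted'
  deallocate(a3d)
end program checked_allocation
```

As the overcommit comment above points out, a successful status only means the kernel accepted the request. Depending on the overcommit setting, the program may still fail later, when the memory is actually written to.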

1 Answer

Two double complex arrays with 500 x 500 x 500 elements require 4 gigabytes of memory. It is likely that the amount of available memory in your computer is not sufficient.
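
The arithmetic behind that figure can be checked with a few lines of Fortran. This is only a sketch: the program and variable names are hypothetical, and it assumes the array shapes from the question, where `A3D` is 500 x 500 x 500 and `FINAL_OUTPUT` is 498 x 498 x 498. 64-bit integers are used because the byte counts do not fit in a default 32-bit integer.

```fortran
program memory_estimate
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer(int64), parameter :: dd = 500_int64
  complex(real64) :: sample                 ! only used to query the element size
  integer(int64) :: bytes_per_elem, bytes_a3d, bytes_out, total

  ! storage_size returns bits; a double complex element is 128 bits = 16 bytes.
  bytes_per_elem = int(storage_size(sample), int64) / 8_int64

  bytes_a3d = dd**3 * bytes_per_elem                 ! A3D(DD,DD,DD)
  bytes_out = (dd - 2_int64)**3 * bytes_per_elem     ! FINAL_OUTPUT(DD-2,DD-2,DD-2)
  total     = bytes_a3d + bytes_out

  print '(a,i0)',   'A3D bytes:          ', bytes_a3d
  print '(a,i0)',   'FINAL_OUTPUT bytes: ', bytes_out
  print '(a,i0)',   'total bytes:        ', total
  print '(a,f0.2)', 'total GiB:          ', real(total, real64) / 1024.0_real64**3
end program memory_estimate
```

Any temporary created for the `reshape` in the question, or extra copies made inside the parallel region, would come on top of this total.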

If you only work with small windows, you might consider not keeping the whole array in memory all the time, but only the parts you are currently processing. Or distribute the computation across multiple computers using MPI.

Or just use a computer with more RAM.

  • Maybe I am missing something, but isn't `500 x 500 x 500 x 2(complex) x 8(double) x 2(num arrays) = 4 e9`, that is, 4 GB or 3.73 GiB (depending on whether you calculate with 1000 or 1024)? – BlameTheBits May 22 '17 at 21:28
  • Actually, more than 3.73 GiB are needed as a non-contiguous slice of `A3D` is passed to a function with no explicit interface and the compiler is creating a temporary contiguous copy for each call to `dfftw_execute_dft`. – Hristo Iliev May 23 '17 at 15:51
  • Ok, `window` is 2, so temporary arrays are no big deal. – Hristo Iliev May 23 '17 at 15:57
  • @vladimir-f It worked fine on the PC with 12 GB of RAM (without the `-fmax-stack-var-size=65536` flag in the compilation). I now have a different problem, related to the OpenMP calculation, which I am posting as a separate question. – victortxa May 23 '17 at 23:55
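
The thread does not discuss the following point, so treat it as an assumption rather than a confirmed diagnosis. In the question's loop, `FINAL_OUTPUT` is listed in the `PRIVATE` clause. Under OpenMP rules, a private allocatable array that is allocated on entry gets a separate copy of the same shape in every thread, so that clause may multiply the memory demand by the thread count, and results written to the private copies are lost when the loop ends. The sketch below keeps `FINAL_OUTPUT` shared and copies each window into a small private buffer, which also avoids the compiler-generated temporary mentioned in the comments above. The names `buf` and `w` are hypothetical, `dd` is reduced to 64 for a quick trial, the input data is a placeholder, and threaded FFTW is left out on the assumption that a 3 x 3 x 3 transform gains nothing from it.

```fortran
program windowed_fft_sketch
  use omp_lib
  implicit none
  include 'fftw3.f'
  integer, parameter :: dd = 64             ! reduced from 500 for a quick trial
  integer, parameter :: window = 2
  integer, parameter :: w = window + 1      ! hypothetical name: window edge length
  double complex, allocatable :: a3d(:,:,:), final_output(:,:,:)
  double complex :: buf(w, w, w)            ! small contiguous buffer, private per thread
  integer*8 :: plan
  integer :: ix, iy, iz

  allocate(a3d(dd, dd, dd))
  allocate(final_output(dd - window, dd - window, dd - window))
  a3d = (1.0d0, 0.0d0)                      ! placeholder data
  final_output = (0.0d0, 0.0d0)

  ! Plan once, serially. The plan is in-place, so it is executed in-place below.
  ! FFTW_UNALIGNED permits execution on buffers with a different alignment.
  call dfftw_plan_dft_3d(plan, w, w, w, buf, buf, FFTW_FORWARD, &
                         ior(FFTW_ESTIMATE, FFTW_UNALIGNED))

  !$omp parallel do default(none) shared(a3d, final_output, plan) &
  !$omp private(ix, iy, iz, buf)
  do iz = 1, dd - window
     do iy = 1, dd - window
        do ix = 1, dd - window
           ! Explicit copy of the non-contiguous window into the private buffer.
           buf = a3d(ix:ix+window, iy:iy+window, iz:iz+window)
           call dfftw_execute_dft(plan, buf, buf)
           ! Each iteration writes a distinct element of the shared array.
           final_output(ix, iy, iz) = sum(abs(buf))
        end do
     end do
  end do
  !$omp end parallel do

  call dfftw_destroy_plan(plan)
  print *, 'final_output(1,1,1) = ', final_output(1, 1, 1)
  deallocate(a3d, final_output)
end program windowed_fft_sketch
```

With this layout the per-thread memory is just the 27-element buffer, so the total stays close to the size of the two large arrays regardless of how many threads run. It should build with the compile line from the question, with or without `-lfftw3_omp`.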