I have written a scientific code and, as usual, this boils down to calculating the coefficients of an algebraic eigenvalue equation. Calculating these coefficients requires integrating over multi-dimensional arrays, which quickly inflates memory usage drastically. Once the matrix coefficients are calculated, the original, pre-integration multi-dimensional arrays can be deallocated and intelligent solvers take over, so memory usage ceases to be the big issue. As you can see, there is a bottleneck, and on my 64-bit, 4-core, 8-thread, 8 GB RAM laptop the program crashes due to insufficient memory.
I am therefore implementing a system that keeps memory usage in check by limiting the size of the tasks that the MPI processes can take on when calculating some of the eigenvalue matrix elements. When finished, they then look for remaining jobs to be done, so the matrix still gets filled, but in a more sequential and less parallel way.
I was therefore checking how much memory I can allocate, and here is where the confusion starts: I allocate double-precision reals of size 8 bytes (checked using sizeof(1._dp)) and look at the allocation status.
Though I have only 8 GB of RAM available, in a test with just 1 process I can allocate an array of size (40000,40000), which corresponds to about 13 GB of memory! My first question is thus: how is this possible? Is there so much virtual memory?
Secondly, I realized that I can do the same thing with multiple processes: up to 16 processes can simultaneously allocate these massive arrays!
This cannot be right, can it?
Does somebody understand why this happens, and whether I am doing something wrong?
Edit:
Here is a code that produces the aforementioned miracle, at least on my machine. However, when I actually set the elements of the arrays to some value, it indeed behaves as it should and crashes, or at least starts to run very slowly, which I guess is due to the fact that slow virtual memory is being used?
program test_miracle
    use ISO_FORTRAN_ENV
    use MPI

    implicit none

    ! global variables
    integer, parameter :: dp = REAL64       ! double precision
    integer, parameter :: max_str_ln = 120  ! maximum length of filenames
    integer :: ierr                         ! error variable
    integer :: n_procs                      ! MPI nr. of procs

    ! start MPI
    call MPI_init(ierr)                                 ! initialize MPI
    call MPI_Comm_size(MPI_Comm_world,n_procs,ierr)     ! nr. MPI processes
    write(*,*) 'RUNNING MPI WITH', n_procs, 'processes'

    ! call asking for 6 GB
    call test_max_memory(6000._dp)
    call MPI_Barrier(MPI_Comm_world,ierr)

    ! call asking for 13 GB
    call test_max_memory(13000._dp)
    call MPI_Barrier(MPI_Comm_world,ierr)

    ! call asking for 14 GB
    call test_max_memory(14000._dp)
    call MPI_Barrier(MPI_Comm_world,ierr)

    ! stop MPI
    call MPI_finalize(ierr)

contains
    ! test whether maximum memory is feasible
    subroutine test_max_memory(max_mem_per_proc)
        ! input/output
        real(dp), intent(in) :: max_mem_per_proc        ! maximum memory per process, in MB

        ! local variables
        character(len=max_str_ln) :: err_msg            ! error message
        integer :: n_max                                ! maximum size of array
        real(dp), allocatable :: max_mem_arr(:,:)       ! array with maximum size
        integer :: ierr                                 ! error variable

        write(*,*) ' > Testing whether maximum memory per process of ',&
            &max_mem_per_proc/1000, 'GB is possible'
        n_max = ceiling(sqrt(max_mem_per_proc/(sizeof(1._dp)*1.E-6)))

        write(*,*) ' * Allocating doubles array of size', n_max
        allocate(max_mem_arr(n_max,n_max),STAT=ierr)

        err_msg = ' * cannot allocate this much memory. Try setting &
            &"max_mem_per_proc" lower'
        if (ierr.ne.0) then
            write(*,*) err_msg
            stop
        end if

        !max_mem_arr = 0._dp ! UNCOMMENT TO MAKE MIRACLE DISAPPEAR
        deallocate(max_mem_arr)

        write(*,*) ' * Maximum memory allocatable'
    end subroutine test_max_memory
end program test_miracle
To be saved in test.f90
and subsequently compiled and run with
mpif90 test.f90 -o test && mpirun -np 2 ./test