There can be many reasons for such a performance loss:
1) Static arrays are placed in static storage, typically the BSS segment (see Where are static variables stored (in C/C++)?), whereas "allocated" arrays can be placed on the heap or on the stack. Allocation on the stack is much faster than on the heap. A good compiler can generate code that allocates as much as possible on the stack.
2) You may have allocate/deallocate statements in loops. Every memory allocation takes some time. A good compiler can avoid physically allocating memory at every allocation and instead reuse space that has just been deallocated.
3) With static arrays the compiler knows the dimensions at compile time, so it can apply additional optimizations.
4) If you have dynamically allocated multi-dimensional arrays, the address of an element can't be fully computed at compile time. For example, for an array A(n1,n2,n3), the offset of A(5,6,7) from the start of the array is (5-1) + (6-1)*n1 + (7-1)*n1*n2 elements, where n1 and n2 are the first two dimensions of A. For static arrays, the compiler can evaluate this expression at compile time. Moreover, if a dimension n1, n2, ... is a power of 2, the compiler can replace the integer multiply with a bit shift, which can be about 3x faster.
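A minimal sketch of the address calculation in point 4, assuming 1-based indices and Fortran's column-major layout, so that the element offset of A(i,j,k) is (i-1) + (j-1)*n1 + (k-1)*n1*n2. The program name and the dimensions 8, 9 and 10 are arbitrary choices for illustration. Each element is filled with its own 0-based position in memory order, so the stored value and the computed offset should agree:

```fortran
program column_major_offset
  implicit none
  ! Hypothetical dimensions, chosen only for this illustration.
  integer, parameter :: n1 = 8, n2 = 9, n3 = 10
  integer :: a(n1, n2, n3)
  integer :: k, offset

  ! reshape fills in array element order (first index varies fastest),
  ! so each element ends up holding its 0-based position in memory.
  a = reshape([(k, k = 0, n1*n2*n3 - 1)], [n1, n2, n3])

  ! Offset of A(5,6,7) computed by hand from the dimensions.
  offset = (5 - 1) + (6 - 1)*n1 + (7 - 1)*n1*n2

  ! Both numbers printed should be identical.
  print *, a(5, 6, 7), offset
end program column_major_offset
```

Because n1 = 8 is a power of 2 here, the multiplication by n1 is the kind of operation a compiler could turn into a shift when the dimension is known at compile time.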
Number 3) is the most probable. You can keep an array static when you know a reasonable upper bound for its size at compile time, it is relatively small (roughly < 1000 elements), and it lives inside a routine that is called very often and does very little work.
As a rule of thumb, only small arrays should be statically allocated: most of the 1D arrays, some small 2D arrays and tiny 3D arrays. Convert all the rest to dynamic allocation, as they will probably not fit on the stack.
If you have some frequent allocate/deallocates because you call a subroutine in a loop such as this:
    do i=1,10000000
      call work(a,b)
    end do

    subroutine work(a,b)
      ...
      allocate (c(n))   ! n is a placeholder for the size c needs; allocated on every call
      ...
      deallocate (c)
    end

and c always has the same dimensions, you can pass it as an argument of the subroutine, or make it a global variable (in a module) that is allocated only once before calling work:
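    use module_where_c_is_defined
    allocate (c(n))   ! n is a placeholder for the size c needs; allocated once, before the loop
    do i=1,10000000
      call work(a,b)
    end do
    deallocate(c)

    subroutine work(a,b)
      use module_where_c_is_defined
      ! Guard against calling work before c has been allocated.
      if (.not.allocated(c)) then
        stop 'c is not allocated'
      endif
      ...
    end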
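
The other option mentioned above, passing c as an argument, could look like the following sketch. Everything concrete in it is an assumption made for illustration: the program name, the size n = 100, the loop count, and the body of work, which only fills c and accumulates its sum so that the example runs.

```fortran
program reuse_buffer
  implicit none
  integer, parameter :: n = 100      ! hypothetical size of the work array
  real, allocatable :: c(:)
  real :: a, b
  integer :: i

  a = 1.0
  b = 0.0

  allocate (c(n))                    ! a single allocation, outside the loop
  do i = 1, 1000
    call work(a, b, c)
  end do
  deallocate (c)

  print *, b

contains

  subroutine work(a, b, c)
    real, intent(in)    :: a
    real, intent(inout) :: b
    real, intent(inout) :: c(:)      ! scratch space supplied by the caller
    ! Placeholder computation standing in for the real work.
    c = a
    b = b + sum(c)
  end subroutine work

end program reuse_buffer
```

Compared with the module variable, the argument makes the dependency on c visible at every call site, at the cost of a longer argument list.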