Does the order or syntax of allocate statement affect performance? (Fortran)

Question

Because I ran into performance issues when converting a code from static to dynamic allocation, I started to wonder how memory allocation is managed in Fortran.

Specifically, I wonder whether the order or syntax used for the allocate statement makes any difference. That is, does it make any difference to allocate vectors like:

allocate(x(DIM),y(DIM))

versus

allocate(x(DIM))
allocate(y(DIM))

The syntax suggests that in the first case the program would allocate the space for both vectors at once, possibly improving performance, while in the second case it must allocate the space for one vector at a time, so that they could end up far from each other. If the syntax makes no difference, I wonder whether there is a way to control the allocation (for instance, by allocating a single vector for all the space and using pointers to address parts of it as separate variables).

Finally, I realize there is one more thing I don't know: does an allocate statement guarantee that at least a single vector occupies contiguous space in memory (or does it just do the best it can)?

Thank you. I have tried to formulate my question more precisely. I do not "think" one thing or the other; I wonder whether there is such a difference and whether it matters for performance. — leandro, Mar 11 '16 at 16:56
This answer -- http://stackoverflow.com/questions/13308684/increase-of-virtual-memory-without-increse-of-vmsize/13309395#13309395 -- and others will teach you that, on Linux systems at least, `allocate` is unlikely to cause memory to be allocated; it is interpreted more as a notice to the o/s to get ready to provide more memory. Then, when you write to the freshly allocated memory, things actually start to happen. How this affects the performance of your code, I haven't a clue. But I'd be interested to read reports of your experiments to find out. — High Performance Mark, Mar 11 '16 at 17:06
See also http://stackoverflow.com/questions/11335108/fortran-allocate-deallocate — High Performance Mark, Mar 11 '16 at 17:11

Vladimir F Героям слава · Answer 1 · 2016-03-11T18:10:38.543

From the point of view of the language standard, both ways of writing the allocation are valid. The compiler is free to allocate the arrays wherever it wants. It normally calls malloc() to obtain a piece of memory and builds the allocatable array on top of that piece.

Whether it might allocate a single piece of memory for two different arrays in a single allocate statement is up to the compiler, but I haven't heard about any compiler doing that.

I just verified that my gfortran just calls __builtin_malloc two times in this case.
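A rough way to check this on your own compiler, without reading its intermediate output, is to print the distance between the two arrays for both forms of the statement. This is only a sketch: the program name, the size `n`, and the use of `c_loc` with `transfer` to get an integer address are my choices for illustration, and the distances you see depend entirely on the compiler and the malloc implementation.

```fortran
program alloc_addresses
    use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
    implicit none
    integer, parameter :: n = 1000          ! arbitrary size, chosen for illustration
    double precision, allocatable, target :: x(:), y(:)
    integer(c_intptr_t) :: ax, ay

    ! Form 1: both arrays in a single allocate statement.
    allocate(x(n), y(n))
    ax = transfer(c_loc(x), ax)             ! address of x(1) as an integer
    ay = transfer(c_loc(y), ay)
    print '(a,i0)', 'single statement, distance in bytes: ', ay - ax
    deallocate(x, y)

    ! Form 2: one allocate statement per array.
    allocate(x(n))
    allocate(y(n))
    ax = transfer(c_loc(x), ax)
    ay = transfer(c_loc(y), ay)
    print '(a,i0)', 'two statements,  distance in bytes: ', ay - ax
    deallocate(x, y)
end program alloc_addresses
```

If both forms report the same distance, that is consistent with the compiler issuing two separate malloc calls in either case, but it is not proof.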

Another issue was already pointed out by High Performance Mark. Even when malloc() returns successfully, the actual memory pages might not yet be assigned. On Linux that happens when you first access the array.
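If you want to see the deferred page assignment for yourself, a minimal timing sketch along the following lines may help. The program name and the array size are assumptions of mine, and the actual timings depend on the operating system, the allocator, and the compiler flags, so treat the numbers only as a qualitative hint.

```fortran
program first_touch
    use, intrinsic :: iso_fortran_env, only: int64, real64
    implicit none
    integer, parameter :: n = 20000000      ! about 160 MB; arbitrary, for illustration
    real(real64), allocatable :: x(:)
    integer(int64) :: t0, t1, t2, t3, rate

    call system_clock(t0, rate)
    allocate(x(n))                          ! may only reserve address space
    call system_clock(t1)
    x = 0.0_real64                          ! first write: pages may be assigned here
    call system_clock(t2)
    x = 1.0_real64                          ! second write: pages already assigned
    call system_clock(t3)

    print '(a,f10.6,a)', 'allocate:     ', real(t1 - t0, real64) / real(rate, real64), ' s'
    print '(a,f10.6,a)', 'first write:  ', real(t2 - t1, real64) / real(rate, real64), ' s'
    print '(a,f10.6,a)', 'second write: ', real(t3 - t2, real64) / real(rate, real64), ' s'

    ! Use the data so the compiler cannot discard the writes.
    print *, x(1) + x(n)
    deallocate(x)
end program first_touch
```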

I don't think it is too important if those arrays are close to each other in memory or not anyway. The CPU can cache arrays from different regions of address space if it needs them.

Is there a way to control the allocation? Yes, you can override malloc with your own allocator that does some clever things. This can be used, for instance, to always get memory aligned to 32 bytes (example). Whether you will improve the performance of your code by allocating things close to each other is questionable, but you can give it a try. (Of course this is completely compiler-dependent; a compiler doesn't have to use malloc() at all, but most of them do.) Unfortunately, this will only work when the calls to malloc are not inlined.

Ed Smith · Answer 2 · 2016-03-12T10:45:16.620

There are (at least) two issues here: first, the time taken to allocate the memory, and second, the locality of the arrays in memory and its impact on performance. I don't know much about the actual allocation process, although the links suggested by High Performance Mark and the answer by Vladimir F cover this.

From your question, it seems you are more interested in cache hits and the memory locality that comes from arrays being next to each other. I would guess there is no guarantee that either form of the allocate statement places both arrays next to each other in memory. This is based on what happens when arrays are allocated inside a type, about which the Fortran 2003 standard (May 2004 working draft, J3/04-007) says:

NOTE 4.20 Unless the structure includes a SEQUENCE statement, the use of this terminology in no way implies that these components are stored in this, or any other, order. Nor is there any requirement that contiguous storage be used.

From the discussion with Vladimir F, if you put allocatable arrays in a type and use the sequence keyword, e.g.

type botharrays
    SEQUENCE
    double precision, dimension(:), allocatable :: x, y
end type

this DOES NOT ensure they are allocated adjacent to each other in memory. For static arrays or lots of variables, a sequence type sounds like it may work like your idea of "allocating a vector for all space and using pointers to address the space allocated as multiple variables". I think common blocks (Fortran 77) allowed you to specify the relative memory locations of arrays and variables, but they don't work with allocatable arrays either.

In short, I think this means you cannot ensure two allocated arrays are adjacent in memory. Even if you could, I don't see how this will result in a reduction in cache misses or improved performance. Even if you typically use the two together, unless the arrays are small enough that the cache will include multiple arrays in one read (assuming reads are allowed to go beyond array bounds) you won't benefit from the memory locality.
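If you still want to experiment with adjacency, the idea from the question (one allocation, addressed as several variables through pointers) can be sketched as below. The names `buffer`, `x`, `y` and the size `n` are mine, chosen for illustration. Note that this is one allocated array with two pointer views, not two allocated arrays, and that pointers can make it harder for the compiler to optimize because of possible aliasing, so measure before adopting it.

```fortran
program shared_buffer
    implicit none
    integer, parameter :: n = 1000          ! arbitrary size, chosen for illustration
    double precision, allocatable, target :: buffer(:)
    ! contiguous (Fortran 2008) tells the compiler the views have unit stride.
    double precision, pointer, contiguous :: x(:), y(:)

    ! One allocation holding the storage for both vectors.
    allocate(buffer(2*n))

    ! Two views into the same block: x is the first half, y the second.
    x => buffer(1:n)
    y => buffer(n+1:2*n)

    x = 1.0d0
    y = 2.0d0

    ! The views alias the buffer, so the sum covers both halves.
    print *, sum(buffer)

    nullify(x, y)
    deallocate(buffer)
end program shared_buffer
```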

I am quite sure that `SEQUENCE` either does not concern allocatable and pointer components or is not allowed for such a structure at all. Anyway, it is a recipe for problems, as it disables padding inside the structure, which may be very helpful for performance. — Vladimir F Героям слава, Mar 11 '16 at 18:34
It is allowed, but I am 100% sure it does not affect allocatable and pointer components (just checked the resulting code, no time to search in the standard). — Vladimir F Героям слава, Mar 11 '16 at 18:40
Ah, I didn't know `sequence` doesn't work for allocatable arrays, I can't see anything in the standard to say either way, although it does say "The type is a `numeric sequence type` if there are no type parameters, no pointer or allocatable components". I guess this implies type variable ordering is only enforced without allocatable components. — Ed Smith, Mar 11 '16 at 19:49
