
In an MPI PIC code I am writing, the array size I actually need for storing particles on a processor fluctuates with time, with the size varying between [0.5n : 1.5n], where n is an average size. Presently, I allocate arrays of the largest size, i.e., 1.5*n in this case, once on each processor and use them without changing their size afterward. I am considering an alternative: re-allocating all the arrays each time step with their correct sizes, so that I can save memory. But I worry that re-allocating arrays is expensive and this overhead will slow the code substantially.

Can this issue be settled only by actually profiling the code, or is there a simple principle indicating that the allocation operation is cheap enough that we need not worry about its overhead?

Someone said: "ALLOCATE does not imply physical memory allocation. For example, you can ALLOCATE an array up to the size of your virtual memory limit, then use it as a sparse array, using physical memory pages only as the space is addressed."

Is this true in Fortran?

Youjun Hu
  • Here is a quick benchmark result for the sake of this question (processor Intel 12th Gen i9, Dell CAMM 64 GB, 0.5/15/30 MB L1/L2/L3 cache): task: allocate() + move_alloc(). Average runtime: Array(1:10) --> 100 ns; Array(1:10**6) --> 10 ms. So your concerns may be relevant only if allocations are large and happen on the order of millions/billions of times, depending on the hardware. I recommend implementing it correctly and then looking for the bottlenecks if you are not satisfied with the performance. – Scientist Feb 12 '23 at 05:44
  • I don't have anything to add to the already posted answers, just a remark about the allocation strategy: if you know in advance what the maximum size you will need is, it's perfectly reasonable to allocate only once with this size. – PierU Feb 12 '23 at 12:03
  • for a multi-dimensional array, if we allocate with the maximum size along each dimension, then it will be hard to use that array because of the shape mismatch. – Youjun Hu Feb 12 '23 at 12:14
  • @YoujunHu This may require a bit of additional code to manage that, but it's not a big deal. – PierU Feb 12 '23 at 14:19
  • Re your question "Can this issue be settled only by actually profiling the code, or is there a simple principle indicating that the allocation operation is cheap enough that we need not worry about its overhead?", it would be impossible to give a general answer, because it depends on how frequently you're allocating and how much work you're doing in between. If allocation takes 1 ms, you're doing 1 ms of work, and running a million steps, then you need to worry about allocation overhead. Same 1 ms allocation, 2 s of work, 1000 steps: not so much. – Craig Feb 16 '23 at 18:08

2 Answers


There is no single correct answer to this question. And a complete answer would need to explain how a typical Fortran memory allocator works, AND how typical virtual memory systems work. (That is too broad for a StackOverflow Q&A.)

But here are a couple of salient points.

  • When you reallocate an array you have the overhead of copying the data in the old array to the new array.

  • Reallocating an array doesn't necessarily reduce your process's actual memory usage. Memory is requested from the OS in large regions (memory segments), and the Fortran allocator then manages the memory it has been given, responding to the application's allocate and deallocate requests. When an array is deallocated, the memory usually can't be handed back to the OS because there will most likely be other allocated arrays in the same region.

  • In fact, repeated allocation and deallocation of variable-sized arrays can lead to fragmentation ... which further increases memory usage.

What does this mean for you?

That's not clear. It will depend on exactly what your application's memory usage patterns are. And it will depend on how your Fortran runtime's memory allocator works.

But my gut feeling is that you are probably better off NOT trying to dynamically resize arrays to (just) save memory.

Someone said: "ALLOCATE does not imply physical memory allocation. For example, you can ALLOCATE an array up to the size of your virtual memory limit, then use it as a sparse array, using physical memory pages only as the space is addressed."

That is true, but it is not the complete picture.

You also need to consider what happens when an application's virtual memory usage exceeds the physical memory pages available. In that scenario, when the application tries to access a virtual memory page that is not in physical memory, the OS virtual memory system needs to "page out" another VM page from physical RAM and "page in" the VM page that the application wants. This entails writing the existing page (if it is dirty) to the paging device and then reading in the new one. This takes a significant length of time, and it will impact application performance.

If the ratio of available physical RAM to the application's VM working set is too far out of balance, the entire system can go into "virtual memory thrashing" ... which can lead to the machine becoming non-responsive and even crashing.

In short if you don't have enough physical RAM, using virtual memory to implement huge sparse arrays can be disaster prone.

It is worth noting that the compute nodes on a large-scale HPC cluster will often be configured with ZERO backing storage for VM swapping. If an application then attempts to use more RAM than is present on the compute node it will error out. Immediately.

Is this true in Fortran?

Yes. Fortran doesn't have any special magic ...

Stephen C
  • Fixed. I haven't actually used Fortran since FORTRAN 4. – Stephen C Feb 12 '23 at 07:49
  • The values in the old arrays are not needed, so I do not have the overhead of copying the data from the old array to the new array. – Youjun Hu Feb 12 '23 at 12:02
  • 1) That isn't clear from your question. 2) There may also be the hidden overhead of the allocator zeroing the array. (That is apparently implementation dependent.) – Stephen C Feb 12 '23 at 13:01
  • @StephenC In Fortran the `allocate` routine by default doesn't initialize the content of the allocated array. – PierU Feb 12 '23 at 14:15
  • I have read that it is implementation dependent whether it does or doesn't. – Stephen C Feb 12 '23 at 14:44
  • That is why I carefully wrote "there **may** be a hidden overhead ...". – Stephen C Feb 12 '23 at 14:56
  • @StephenC Well, indeed the standard in itself doesn't forbid initializing the array, and a compiler could do that without violating the standard. But nobody would really want to use such a compiler. – PierU Feb 12 '23 at 15:01

Fortran is no different from, say, C, because Fortran's allocate typically does not call any low-level system functions directly but tends to be implemented using malloc() under the hood.

"Is this true in Fortran?"

The lazy allocation you describe is highly system dependent. It is indeed valid on modern Linux. However, that does not mean it is a good idea to just allocate several 1 TB arrays and then use only certain sections of them. Even if it works in practice on one computer, it may very well fail on a different one, or on a different operating system or CPU family.

Re-allocation takes time, but it is the way to go to keep your program standard-conforming and free of undefined behaviour. Reallocating every time step may easily be too slow. But as we showed you in the answer to your previous question, for continuously growing arrays you typically allocate in a geometric series, e.g. by doubling the size. That means the array will only be re-allocated logarithmically often if it grows linearly.

There may be a concern about exceeding the system memory when allocating to the new size while two copies exist at the same time. This is only a concern when your consumption is high anyway. C has realloc() (which may not help here anyway), but Fortran has nothing similar.

Regarding the title question, not every malloc takes the same time. There is internal bookkeeping involved, and implementations do differ. Some points are raised at https://softwareengineering.stackexchange.com/questions/319015/how-efficient-is-malloc-and-how-do-implementations-differ and, to some extent, at Minimizing the amount of malloc() calls improves performance?