
While testing whether any() short-circuits (it does!), I found the following interesting behavior when preallocating the test variable:

>> test=zeros(1e7,1);
>> tic;any(test);toc
Elapsed time is 2.444690 seconds.
>> test(2)=1;
>> tic;any(test);toc
Elapsed time is 0.000034 seconds.

However if I do:

>> test=ones(1e7,1);
>> test(1:end)=0;
>> tic;any(test);toc
Elapsed time is 0.642413 seconds.
>> test(2)=1;
>> tic;any(test);toc
Elapsed time is 0.000021 seconds.

It turns out that this happens because the variable is not really in RAM until it is completely filled with information; therefore the first test takes longer because it needs to allocate it. The way I checked this was by looking at the memory used in the Windows Task Manager.

While this may make some sense (do not initialize until it's needed), what confused me a bit more is the following test, where the variable is filled in a for loop and at some point the execution is paused.

test=zeros(1e7,1);

for ii=1:1e7
    test(ii)=1;
    if ii==1e7/2
        pause
    end
end

When checking the memory used by MATLAB, I could see that, while paused, it was using only 50% of the memory that test would need if it were full. This can be reproduced quite reliably with different percentages of the array filled.

Interestingly, the following does not allocate the entire matrix either.

test=zeros(1e7,1);
test(end)=1;

I know that MATLAB is not dynamically allocating and growing test inside the loop, as that would make the later iterations very slow (due to the many memory copies that would be needed), and it would also allocate the entire array in this last test I proposed. So my question is:

What is going on?

Someone suggested that this may be related to virtual memory vs. physical memory, and to how the OS sees memory. I'm not sure how that links to the first test proposed here, though. Any further explanation would be ideal.

Win 10 x64, MATLAB 2017a

Ander Biguri
  • Related: https://stackoverflow.com/q/19991623/7328782 – Cris Luengo Aug 23 '18 at 15:11
  • The linked duplicate has a very detailed explanation of the low-level "magic" that happens. That explains everything that can be seen in this post. – Ander Biguri Aug 23 '18 at 15:22
  • @rahnema1 ultimately that is the level of detail you need to get to in order to understand, but it's not a book, it's another SO answer. I will consider wrapping up a short answer describing why this happens with links to that one if I find a bit of time. I edited the code, as it was wrongly edited at some point (by me) – Ander Biguri Aug 23 '18 at 16:55

1 Answer


This behavior is not unique to MATLAB. In fact, MATLAB has no control over it, as it is Windows that causes it. Linux and MacOS show the same behavior.

I had noticed this exact same thing in a C program many years ago. It turns out that this is well-documented behavior. This excellent answer explains in gory detail how memory management works in most modern OSes (thanks Amro for sharing the link!). Read it if this answer doesn't have enough detail for you.

First, let's repeat Ander's experiment in C:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int main (void) {

   const int size = 1e8;

   /* For Linux: */
   // const char* ps_command = "ps --no-headers --format \"rss vsz\" -C so";
   /* For MacOS: */
   char ps_command[128];
   sprintf(ps_command, "ps -o rss,vsz -p %d", getpid());

   puts("At program start:");
   system(ps_command);

   /* Allocate a large chunk of memory */

   char* mem = malloc(size);

   puts("After malloc:");
   system(ps_command);

   for(int ii = 0; ii < size/2; ++ii) {
      mem[ii] = 0;
   }

   puts("After writing to half the array:");
   system(ps_command);

   for(int ii = size/2; ii < size; ++ii) {
      mem[ii] = 0;
   }

   puts("After writing to the whole array:");
   system(ps_command);

   char* mem2 = calloc(size, 1);

   puts("After calloc:");
   system(ps_command);

   free(mem);
   free(mem2);
}

The code above works on a POSIX-compliant OS (i.e. any OS except Windows), but on Windows you can use Cygwin to become (mostly) POSIX-compliant. You might need to change the ps command syntax depending on your OS. Compile with gcc so.c -o so, run with ./so. I see the following output on MacOS:

At program start:
   RSS      VSZ
   800  4267728
After malloc:
   RSS      VSZ
   816  4366416
After writing to half the array:
   RSS      VSZ
 49648  4366416
After writing to the whole array:
   RSS      VSZ
 98476  4366416
After calloc:
   RSS      VSZ
 98476  4464076

There are two columns displayed, RSS and VSZ. RSS stands for "Resident set size"; it is the amount of physical memory (RAM) that the program is using. VSZ stands for "Virtual size"; it is the size of the virtual memory assigned to the program. Both quantities are in KiB.

The VSZ column shows 4 GiB at program start. I'm not sure what that is about; it seems over the top. But the value grows after malloc and again after calloc, both times by approximately 98,000 KiB (slightly over the 1e8 bytes we allocated).

In contrast, the RSS column shows an increase of only 16 KiB after we allocated 1e8 bytes. After writing to half the array, we have a bit over 5e7 bytes of memory in use, and after writing to the full array we have a bit over 1e8 bytes in use. Thus, the memory gets assigned as we use it, not when we first ask for it. Next, we allocate another 1e8 bytes using calloc, and see no change in the RSS. Note that calloc returns a memory block that is initialized to 0, exactly like MATLAB's zeros does.

I am talking about calloc because it is likely that MATLAB's zeros is implemented through calloc.

Explanation:

Modern computer architectures separate virtual memory (the memory space that a process sees) from physical memory. The process (i.e. a program) uses pointers to access memory; these pointers are addresses in virtual memory. These addresses are translated by the system into physical addresses ***when used***. This has many advantages. For example, it is impossible for one process to address memory assigned to another process, since none of the addresses it can generate will ever be translated to physical memory not assigned to that process. It also allows the OS to swap out the memory of an idling process to let another process use that physical memory. Note that the physical memory for a contiguous block of virtual memory doesn't need to be contiguous!

The key is the bolded italic text above: when used. Memory assigned to a process might not actually exist until the process tries to read from or write to it. This is why we don't see any change in RSS when allocating a large array. Memory used is assigned to physical memory in pages (blocks typically of 4 KiB, sometimes up to 1 MiB). So when we write to one byte of our new memory block, only one page gets assigned.
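
This page granularity can be observed directly. Here is a rough sketch (for Linux; on macOS the vector argument of mincore is declared as char*, so a cast is needed there) that uses mmap to obtain an anonymous block of virtual memory and mincore to ask the kernel which of its pages are currently resident in RAM:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Ask the kernel how many pages of the given mapping are resident in RAM. */
static size_t resident_pages(void *addr, size_t len, size_t page) {
   size_t npages = (len + page - 1) / page;
   unsigned char *vec = malloc(npages);   /* one status byte per page */
   size_t count = 0;
   if (vec && mincore(addr, len, vec) == 0)
      for (size_t i = 0; i < npages; ++i)
         count += vec[i] & 1;             /* bit 0 set means "in RAM" */
   free(vec);
   return count;
}

int main(void) {
   size_t page = (size_t)sysconf(_SC_PAGESIZE);   /* typically 4096 bytes */
   size_t len = 1000 * page;                      /* reserve 1000 pages */

   /* Anonymous mapping: virtual address space only, no physical pages yet. */
   char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
   if (mem == MAP_FAILED) { perror("mmap"); return 1; }

   printf("after mmap:           %zu pages resident\n", resident_pages(mem, len, page));

   mem[0] = 1;                   /* touch a single byte */
   printf("after one-byte write: %zu pages resident\n", resident_pages(mem, len, page));

   memset(mem, 0, 10 * page);    /* touch the first ten pages */
   printf("after ten-page write: %zu pages resident\n", resident_pages(mem, len, page));

   munmap(mem, len);
   return 0;
}

Touching a single byte should make exactly one page resident; writing across the first ten pages should make ten resident, while the other 990 pages remain purely virtual.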

Some OSes, like Linux, will even "overcommit" memory. Linux will assign more virtual memory to processes than it has the capacity to put into physical memory, under the assumption that those processes will not use all the memory they are assigned anyway. This answer will tell you more about overcommitting than you will want to know.
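
As a rough Linux-only sketch of what overcommitting looks like from a program's point of view (whether the large request succeeds depends on the vm.overcommit_memory policy and on how much RAM and swap the machine has, so the outcome is machine-dependent):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
   /* Show the kernel's overcommit policy:
      0 = heuristic (refuse obvious overcommits), 1 = always, 2 = strict. */
   system("cat /proc/sys/vm/overcommit_memory");

   /* Try to reserve 256 GiB of virtual memory -- far more RAM than most
      machines have. Whether this is granted depends on the policy above
      and on the machine's RAM + swap. */
   size_t huge = (size_t)256 << 30;
   char *mem = malloc(huge);
   printf("malloc(256 GiB) %s\n", mem ? "succeeded" : "was refused");

   if (mem) {
      mem[0] = 1;   /* even now, only a single physical page gets assigned */
      free(mem);
   }
   return 0;
}

Under the "always overcommit" policy the request is granted no matter what; under the strict policy it is refused if it exceeds the commit limit; the default heuristic sits somewhere in between.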

So what happens with calloc, which returns zero-initialized memory? This is also explained in the answer I linked earlier. For small arrays, malloc and calloc return a block of memory from a larger pool obtained from the OS at the start of the program. In this case, calloc will write zeros to all bytes to make sure the block is zero-initialized. But for larger arrays, a new block of memory is obtained directly from the OS. The OS always gives out memory that is zeroed out (again, this prevents one program from seeing data from another program). But because the memory doesn't get physically assigned until used, the zeroing out is also delayed until a memory page is put into physical memory.
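
A quick way to see where the cost of a large zeroed allocation really goes is to time the calloc call separately from the first pass that writes to the memory. This is a sketch (assuming a glibc-style allocator, where a request this large bypasses the pool and goes straight to the OS via mmap):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Seconds elapsed since t0, using a monotonic clock. */
static double seconds_since(struct timespec t0) {
   struct timespec t1;
   clock_gettime(CLOCK_MONOTONIC, &t1);
   return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
   const size_t n = (size_t)1e9;   /* 1 GB, well above the allocator's mmap threshold */
   struct timespec t0;

   /* The calloc call itself: the OS only promises zeroed pages, it doesn't
      touch them yet, so this should be fast regardless of n. */
   clock_gettime(CLOCK_MONOTONIC, &t0);
   char *mem = calloc(n, 1);
   printf("calloc of 1 GB:      %.4f s\n", seconds_since(t0));
   if (!mem) return 1;

   /* First touch: writing one byte per 4 KiB page forces every page to be
      faulted in (and zeroed); this is where the real cost shows up. */
   clock_gettime(CLOCK_MONOTONIC, &t0);
   for (size_t i = 0; i < n; i += 4096)
      mem[i] = 1;
   printf("first touch of 1 GB: %.4f s\n", seconds_since(t0));

   free(mem);
   return 0;
}

The calloc call typically returns almost instantly; the first pass over the pages is where the page faults (and the actual zeroing of physical pages) happen.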

Back to MATLAB:

The experiment above shows that it is possible to obtain a zeroed-out block of memory in constant time and without changing the physical size of a program's memory. This is how MATLAB's function zeros allocates memory without you seeing any change in MATLAB's memory footprint.

The experiment also shows that zeros allocates the full array (likely through calloc), and that memory footprint only increases as this array is used, one page at a time.

The preallocation advice by the MathWorks states that

you can improve code execution time by preallocating the maximum amount of space required for the array.

If we allocate a small array, then want to increase its size, a new array has to be allocated and the data copied over. How the array is associated with RAM has no influence on this; MATLAB only sees virtual memory, and has no control over (or even knowledge of?) where in physical memory (RAM) these data are stored. All that matters for an array, from MATLAB's point of view (or that of any other program), is that the array is a contiguous block of virtual memory. Enlarging an existing block of memory is not always (usually not?) possible, and so a new block is obtained and the data copied over. For example, see the graph in this other answer: when the array is enlarged (this happens at the large vertical spikes) data is copied; the larger the array, the more data needs to be copied.
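
To make "a new block is obtained and data copied over" concrete at the C level, this is exactly what realloc does when a block cannot be grown in place. A sketch (whether the block actually moves depends on the allocator and on what happens to sit after it in the address space, so either outcome is possible on a given run):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
   /* Start with a small array and fill it with data. */
   size_t n = 1000;
   double *data = malloc(n * sizeof *data);
   if (!data) return 1;
   for (size_t i = 0; i < n; ++i) data[i] = (double)i;

   /* Remember where the block used to live (as an integer, to avoid keeping
      a dangling pointer around after realloc). */
   uintptr_t old_addr = (uintptr_t)(void *)data;

   /* Grow the block a thousandfold. If it cannot be extended in place,
      realloc allocates a new block, copies the old contents over and frees
      the old block -- the same work MATLAB must do when an array outgrows
      its current block of virtual memory. */
   double *grown = realloc(data, 1000 * n * sizeof *grown);
   if (!grown) { free(data); return 1; }
   data = grown;

   printf("block %s\n", (uintptr_t)(void *)data == old_addr
                           ? "was extended in place"
                           : "was moved (and its contents copied)");
   printf("data survived: data[999] = %g\n", data[999]);

   free(data);
   return 0;
}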

Preallocating avoids enlarging the array, as we make it large enough to begin with. In fact, it is more efficient to make an array that is way too large for what we need, as the portion of the array that we don't use is actually never really given to the program. That is, if we allocate a very large block of virtual memory, and only use the first 1000 elements, we'll only really use a few pages of physical memory.

The behavior of calloc described above also explains this other strange behavior of the zeros function: for small arrays, zeros is more expensive than for large arrays, because small arrays need to be zeroed explicitly by the program, whereas large arrays are implicitly zeroed by the OS.

Cris Luengo
  • please read the last example in my post, and see if it's compatible with your post or not. In the last example I showed that if you use a simple condition to break the loop it can understand and allocate half the required memory for the entire matrix, but if you use a complicated condition to stop the loop it cannot understand this and allocates memory for the full matrix, despite us only using 10 members of the array – Hadi Aug 24 '18 at 18:02
  • @Hadi: I don't think MATLAB does anything like that. What I describe here is all under the control of the Operating System. It is the OS that assigns pages of RAM to MATLAB as MATLAB tries to use them. MATLAB doesn't need to be clever with memory usage, it can just allocate the full array and use it like it is there in RAM. The OS will take care of putting portions of it in RAM as they are being used. – Cris Luengo Aug 24 '18 at 18:13
  • then why, when we use `ii==1e7/2`, does it use only half the memory for the array, and when we use `if test(ii-1)==2` does it allocate memory for the full array? – Hadi Aug 24 '18 at 18:21
  • @Hadi: MATLAB allocates memory for the full array, but the OS doesn't assign any RAM to MATLAB until something is written to it. This is the difference between Virtual Memory and Physical Memory. As you can see in the experiment I did, the virtual memory size increased when I called `malloc`, but the physical memory size didn't increase until I wrote data to that array. – Cris Luengo Aug 24 '18 at 18:26
  • Great answer! Thanks also for the pointer to the C answer. This poses one question, though: since physical memory is not actually assigned in the beginning, what is the value of the usual [preallocate](https://www.mathworks.com/help/matlab/matlab_prog/preallocating-arrays.html) advice? – Luis Mendo Aug 24 '18 at 19:42
  • @LuisMendo: I've edited the last portion of the answer, is that more clear now? – Cris Luengo Aug 24 '18 at 20:38
  • _Preallocating avoids enlarging the array_ from the point of view of virtual memory, but not from the point of view of physical memory, which in fact is dynamically assigned by the OS throughout the iterations. So, if pre-allocating in MATLAB is any good, it must be that re-allocating a contiguous block of virtual memory and moving the data there (which is what pre-allocation avoids) is much slower than dynamically assigning pages of physical memory (which happens even if pre-allocation has been done, it seems). Is my interpretation correct? – Luis Mendo Aug 24 '18 at 23:14
  • @Luis: That seems to be the case. Note that physical memory pages do not need to be contiguous, each page can be anywhere in RAM (or on the hard drive if swapped out). The hardware takes care of translating virtual memory pointers to physical memory locations. So if you allocate an array, and write to the first element, you'll get one page of RAM assigned. As you write more, a second page of RAM will be assigned, but this second page does not need to be right next to the first one. This makes it easy for the OS to assign these pages, stuff doesn't need to be moved around to make space. – Cris Luengo Aug 24 '18 at 23:21
  • @CrisLuengo Thanks. It makes sense to me now – Luis Mendo Aug 24 '18 at 23:22
  • @Luis: Why it is then so hard to move things around in virtual memory I don't really know... You'd think you could say "move this bit of memory over to this other address", and the OS would just re-assign the physical page to that new bit of memory. Not sure. :) – Cris Luengo Aug 24 '18 at 23:22
  • If one is working with a large amount of data in Matlab and wishes for the program to load up the data in physical memory only such that one doesn't have to spend time swapping out blocks of memory in the swap space on disk, it sounds like there is no way to make sure this happens, is that correct or is there a way (other than limiting the amount of memory one asks MATLAB to handle at a time)? – Tom Mozdzen Oct 04 '21 at 22:45
  • @TomMozdzen I think you are talking about memory-mapped files. You can do this in MATLAB too, see [the docs](https://www.mathworks.com/help/matlab/import_export/overview-of-memory-mapping.html). Also consider [tall arrays](https://www.mathworks.com/help/matlab/tall-arrays.html). – Cris Luengo Oct 04 '21 at 22:51
  • @CrisLuengo - thanks Cris - yes and no. I only read the data in once (like 4 GB worth) and crunch it with code. I never need to access that data again. My output is a small histogram (800 bins). I have other large arrays that are formed when crunching that data, and I have 64 GB of physical DRAM. My variables use double to make my Intel CPU and MATLAB happy. I'd read in 8 GB at a time, but then my variables grow to about ~40 GB and swap space appears, so I try to keep total memory usage in MATLAB to about a quarter of the physical memory to help avoid offloading anything back to disk. – Tom Mozdzen Oct 05 '21 at 02:32
  • @CrisLuengo, I will have to give the memory mapping a try to see if the initial read gets sped up. I'm currently using; fileID = fopen(filename,'r','l'); size=2^30; databytes=fread(fileID,size,'*ubit32'); for the one and only read. But I loop through the read statement when the data file is larger than 4*2^30 Bytes. Some of my files are 32* 2^30 Bytes. – Tom Mozdzen Oct 05 '21 at 15:18
  • @TomMozdzen A memory-mapped file is not going to be more efficient than what you're doing if you only read the data once. It's just a way to have either very large data only partially in memory, or to efficiently read a small file repeatedly. You can use `fread` with your chunk size to read a chunk of data in one go. `fseek` will move the location where the next read will start, in case you want to read chunks out of order. I recommend that you post a question describing your situation, maybe someone has experience with a similar problem and can help you find an efficient solution. – Cris Luengo Oct 05 '21 at 16:19