This question/answer indicates that some implementations of realloc on modern virtual-memory OSes manipulate the page table instead of actually copying data.
This is obviously desirable for programs that work with large arrays. If your machine has 16 GB of memory and you want to grow an 8 GB array to 10 GB, a page-table-based realloc would work quickly, while a copying realloc would need the old and new blocks (18 GB in total) alive at the same time and would have to page out to disk.
I compiled and ran the following program on Windows 7 (MSVC 2013) and OS X 10.8 Mountain Lion (Clang++ 2.79). Both machines had 16 GB of memory. Both compilers were optimizing at O2. I added the writes so that the compiler cannot optimize the allocations away and the OS cannot get by with overcommit tricks (pages that are allocated but never touched).

    #include <stdlib.h>

    size_t const GIG = 1000 * 1000 * 1000;
    size_t const BIG = 4 * GIG;
    size_t const HUGE = 6 * GIG;

    int main()
    {
        char *mem = (char *)malloc(BIG);
        // Touch every byte so the pages are really backed by memory.
        for (size_t i = 0; i < BIG; ++i) {
            mem[i] = i & 0xFF;
        }
        mem = (char *)realloc(mem, HUGE);
        // Touch the newly added tail as well.
        for (size_t i = BIG; i < HUGE; ++i) {
            mem[i] = i & 0xFF;
        }
        free(mem);
    }

On Windows, the process's Commit value in Resource Monitor peaks at 10 GB. On OS X, the process's Virtual Memory value in Activity Monitor doesn't exceed 6 GB.
If I change BIG to 8 GB and HUGE to 10 GB, so that their combined total is larger than physical memory, the Windows version brings the system to its knees with disk paging while the OS X version does not.
These tests indicate that Windows's realloc copies data and OS X's realloc manipulates the page table.
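The observed peaks match a very simple model of the two strategies. The sketch below is only that model, not a measurement: the function names are hypothetical, sizes are in GB, and it assumes a copying realloc keeps the old and new blocks alive at the same time while a remapping realloc only ever holds the larger of the two.

```cpp
#include <cstddef>

// Hypothetical model: a copying realloc holds both blocks during the copy.
constexpr std::size_t peak_copying(std::size_t old_gb, std::size_t new_gb) {
    return old_gb + new_gb;
}

// Hypothetical model: a remapping realloc only extends the existing block.
constexpr std::size_t peak_remapping(std::size_t old_gb, std::size_t new_gb) {
    return old_gb > new_gb ? old_gb : new_gb;
}

// 4 GB -> 6 GB: matches the 10 GB (Windows) and 6 GB (OS X) peaks reported above.
static_assert(peak_copying(4, 6) == 10, "copying peak");
static_assert(peak_remapping(4, 6) == 6, "remapping peak");

// 8 GB -> 10 GB on a 16 GB machine: only the copying model exceeds physical memory.
static_assert(peak_copying(8, 10) > 16, "copying must page");
static_assert(peak_remapping(8, 10) <= 16, "remapping fits");
```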
I don't have a Linux machine, but this blog post traces through the glibc and Linux source code to show that Linux's realloc manipulates the page table for large arrays.
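For reference, the Linux mechanism in question is the mremap system call, which can resize a mapping by rewriting page-table entries rather than copying bytes. The following is a minimal, Linux-only sketch of calling it directly on an anonymous mapping. The helper names map_anonymous and grow_mapping are hypothetical, and this is not glibc's actual code path, only an illustration of the primitive.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            // mremap is a Linux/GNU extension
#endif
#include <sys/mman.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper: get a zero-filled, page-aligned block straight from the kernel.
char *map_anonymous(std::size_t size) {
    void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : static_cast<char *>(p);
}

// Hypothetical helper: grow the mapping. With MREMAP_MAYMOVE the kernel may
// relocate it to a different virtual address, but it moves page-table
// entries rather than copying the contents.
char *grow_mapping(char *old_mem, std::size_t old_size, std::size_t new_size) {
    assert(new_size >= old_size);
    void *p = mremap(old_mem, old_size, new_size, MREMAP_MAYMOVE);
    return p == MAP_FAILED ? nullptr : static_cast<char *>(p);
}
```

The returned pointer replaces the old one in the same way a realloc result does, and the block is eventually released with munmap using its current size.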
This blog post notes Windows's slow realloc, but the author gets around it by changing the program instead of using a different allocator.
There must be some way to get the desirable page table behavior on Windows. Is there a different memory allocation API that provides it?
Edit: would this work? For each huge array, reserve a virtual address range the size of the entire system memory using VirtualAlloc, and then commit physical memory as needed. For programs that deal with more than one huge array, this would only work when the address space is much bigger than physical memory. But that seems like a safe assumption for 64-bit desktop systems in the near future. Even the 48-bit virtual addresses of current x64 CPUs cover 2^48 bytes, or about 281,475 GB.
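A minimal sketch of that reserve-then-commit idea, under stated assumptions: the GrowableRegion type and its member names are hypothetical, the Windows branch uses only VirtualAlloc and VirtualFree with MEM_RESERVE, MEM_COMMIT and MEM_RELEASE, and the non-Windows branch is a stand-in built on mmap and mprotect so the sketch also compiles elsewhere. It has no error reporting beyond a bool and is not a drop-in realloc replacement.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>          // stand-in for platforms without VirtualAlloc
#endif

struct GrowableRegion {        // hypothetical name, for illustration only
    char *base = nullptr;      // start of the reserved address range
    std::size_t reserved = 0;  // bytes of address space reserved
    std::size_t committed = 0; // bytes currently usable

    // Reserve address space only; nothing is usable until grow() is called.
    bool reserve(std::size_t bytes) {
#ifdef _WIN32
        base = static_cast<char *>(
            VirtualAlloc(nullptr, bytes, MEM_RESERVE, PAGE_NOACCESS));
#else
        void *p = mmap(nullptr, bytes, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        base = (p == MAP_FAILED) ? nullptr : static_cast<char *>(p);
#endif
        reserved = base ? bytes : 0;
        return base != nullptr;
    }

    // Make the first new_size bytes usable. The base address never changes,
    // so existing data is neither moved nor copied.
    bool grow(std::size_t new_size) {
        assert(base != nullptr);
        if (new_size <= committed) return true;
        if (new_size > reserved) return false;
#ifdef _WIN32
        // The system rounds the range out to page boundaries; committing a
        // page that is already committed is allowed and keeps its contents.
        if (!VirtualAlloc(base + committed, new_size - committed,
                          MEM_COMMIT, PAGE_READWRITE)) return false;
#else
        if (mprotect(base, new_size, PROT_READ | PROT_WRITE) != 0) return false;
#endif
        committed = new_size;
        return true;
    }

    // Give back the whole reservation, committed or not.
    void release() {
#ifdef _WIN32
        if (base) VirtualFree(base, 0, MEM_RELEASE);
#else
        if (base) munmap(base, reserved);
#endif
        base = nullptr;
        reserved = committed = 0;
    }
};
```

With this shape, the test program would call reserve once with an upper bound, grow(BIG) before the first loop and grow(HUGE) where it currently calls realloc. The trade-off is that the upper bound has to be chosen up front.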