25

I'm trying to figure out how much memory I can allocate before the allocation will fail.

This simple C++ code allocates a buffer (initially 1024 bytes), writes to the last five characters of the buffer, reports, and then deletes the buffer. It then doubles the size of the buffer and repeats until an allocation fails.

Unless I'm missing something, the code is able to allocate up to 65 terabytes of memory before it fails on my MacBook Pro. Is this even possible? How can it allocate so much more memory than I have on the machine? I must be missing something simple.

#include <iostream>

using namespace std;

int main(int argc, char *argv[])
{
        long long size=1024;
        long cnt=0;
        while (true)
        {
                char *buffer = new char[size];
                // Assume the alloc succeeded (new throws std::bad_alloc if it doesn't).
                // We are looking for the failure after all.

                // Try to write to the allocated memory, may fail
                buffer[size-5] = 'T';
                buffer[size-4] = 'e';
                buffer[size-3] = 's';
                buffer[size-2] = 't';
                buffer[size-1] = '\0';

                // report
                if (cnt<10)
                        cout << "size[" << cnt << "]: " << (size/1024.) << "Kb ";
                else if (cnt<20)
                        cout << "size[" << cnt << "]: " << (size/1024./1024.) << "Mb ";
                else
                        cout << "size[" << cnt << "]: " << (size/1024./1024./1024.) << "Gi ";
                cout << "addr: 0x" << (long)buffer << " ";
                cout << "str: " << &buffer[size-5] << "\n";

                // cleanup
                delete [] buffer;

                // double size and continue
                size *= 2;
                cnt++;
        }
        return 0;
}
Makyen
Thomas Jay Rush
  • 1
    Does the allocation of variables go to memory ("Random Access Memory") or to the hard disk? – Raindrop7 Dec 30 '16 at 21:39
  • 3
    BTW: If you are really trying to determine the actual maximum you can allocate, you should add a loop after the first failure in which you reduce the amount added to your allocation request by a factor of two each time, until you are increasing by the smallest increment you care about. For example: request for 512GiB is OK, request for 1024GiB fails, then request (512GiB+256GiB)=768GiB (if OK then (768GiB+128GiB)=896GiB, if fail then try (512GiB+128GiB)=640GiB), etc. (a sketch of this bisection appears after these comments) – Makyen Dec 31 '16 at 01:11
  • 4
    macbook pro or not is irrelevant here, you don't need to put it in the title. [Allocating more memory than there exists using malloc](http://stackoverflow.com/q/19750796/995714), [maximum memory which malloc can allocate](http://stackoverflow.com/q/2798330/995714) – phuclv Dec 31 '16 at 07:41
  • You might believe the myth that memory is RAM. Memory is not RAM. Memory is *an abstraction of the ability to store data and retrieve it*, and that abstraction can be implemented with lots of different kinds of hardware. RAM is just the *fast and convenient* solution. You would do better to think of memory as disk space, and RAM as a cache that makes it faster to access the disk, because these days, that's what it is. – Eric Lippert Dec 31 '16 at 19:22
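A minimal sketch of the bisection idea from Makyen's comment above, not code from the question: double the request until the next doubling fails, then halve the step while probing upward. It assumes new(std::nothrow) is used so a failed probe returns nullptr instead of throwing; exact results depend on the OS and on how much address space the process already uses.

#include <iostream>
#include <new>

// Returns true if a request of this many bytes succeeds (address space only;
// the memory is never touched, so nothing is actually committed).
static bool can_allocate(long long bytes)
{
        char *p = new (std::nothrow) char[bytes];
        if (p == nullptr)
                return false;
        delete [] p;
        return true;
}

int main()
{
        long long size = 1024;
        while (can_allocate(size * 2))          // double until the next doubling fails
                size *= 2;

        long long step = size / 2;              // then refine by halving the step
        while (step >= 1024)
        {
                if (can_allocate(size + step))
                        size += step;
                step /= 2;
        }
        std::cout << "largest successful request: " << size << " bytes\n";
        return 0;
}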

3 Answers

44

When you ask for memory, an operating system reserves the right not to actually give you that memory until you actually use it.

That's what's happening here: you're only ever using 5 bytes. My ZX81 from the 1980s could handle that.
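To see the difference, here is a rough sketch (my own illustration, not part of the answer) that dirties one byte in every 4096-byte page of each buffer, forcing the OS to back those pages with real memory. It will run out of real storage (RAM plus swap) long before 64 TiB, though how it fails (a nullptr, a crash, or the OS killing the process) depends on the system's overcommit behaviour.

#include <iostream>
#include <new>

int main()
{
        const long long page = 4096;            // assumed page size
        long long size = 1024;
        while (true)
        {
                char *buffer = new (std::nothrow) char[size];
                if (buffer == nullptr)
                {
                        std::cout << "allocation of " << size << " bytes failed\n";
                        break;
                }
                for (long long i = 0; i < size; i += page)
                        buffer[i] = 'x';        // dirty every page
                std::cout << "touched " << size << " bytes\n";
                delete [] buffer;
                size *= 2;
        }
        return 0;
}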

Bathsheba
  • I do use it. I write data to the last five characters of the buffer and then print it. – Thomas Jay Rush Dec 30 '16 at 17:41
  • 27
    So what, the operating system is cleverer than you think. You don't think that a macbook pro is simply good looking? – Bathsheba Dec 30 '16 at 17:41
  • 3
    Oh. So if I wrote to every character in each buffer it would fail much earlier? Is that what you're saying? – Thomas Jay Rush Dec 30 '16 at 17:42
  • 7
    Yup. A well crafted OS will give you memory back in blocks; often called *pages*. – Bathsheba Dec 30 '16 at 17:42
  • Got it. Thanks. I knew I must have been missing something. – Thomas Jay Rush Dec 30 '16 at 17:42
  • 5
    google://memory+overcommitment – n. m. could be an AI Dec 30 '16 at 17:43
  • @ThomasJayRush : using it means writing to it everywhere, or at least to every page (usually 4 or 8 kB). Otherwise the OS simply doesn't provide that piece of memory, or that part of it. – Zbynek Vyskovsky - kvr000 Dec 30 '16 at 17:44
  • I changed the code to write to every 512th byte memory location in each buffer. It slows down significantly, but still allocates at least 32 Gig. The computer only has 4 Gig of memory, but I suppose it's using swap space. I got tired of waiting for it to fail, and 32 Gig is way more than I need, so thanks for the quick answers. I'll select your answer as soon as I can. – Thomas Jay Rush Dec 30 '16 at 17:50
  • 1
    Your ZX81 didn't have virtual memory, did it? That's the key to allocating much more than physical+swap space, but only needing one page of physical RAM for the single page of it you dirtied. The other pages stay mapped read-only to the physical zero-page. – Peter Cordes Dec 31 '16 at 11:34
  • @Bathsheba a virtual memory system that *does* overcommit is just a run-time crash waiting to happen at a random time. Putting it in an overpriced pretty-looking case doesn't solve that problem! – alephzero Dec 31 '16 at 16:22
  • 1
    But the operating system does immediately give you address space, which your ZX81 could not have done. I think your answer could be improved by a brief discussion of address space vs. pages of RAM vs. swap. – derobert Dec 31 '16 at 20:12
36

MacOS X, like almost every modern operating system, uses "delayed allocation" for memory. When you call new, the OS doesn't actually allocate any memory. It simply makes a note that your program wants a certain amount of memory, and that the memory area you want starts at a certain address. Memory is only actually allocated when your program tries to use it.

Further, memory is allocated in units called "pages". I believe MacOS X uses 4 kB pages, so when your program writes to the end of the buffer, the OS gives you 4096 bytes there, while retaining the rest of the buffer as simply a "your program wants this memory" note.

As for why you're hitting the limit at 64 terabytes: current x86-64 processors use 48-bit addressing, which gives 256 TB of virtual address space, split evenly between the operating system and your program. Doubling the 64 TB allocation to 128 TB would exactly fill your program's half of the address space, except that the program is already taking up a little bit of it.
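The arithmetic can be spelled out in a few lines of C++ (the even kernel/user split is an assumption about this particular OS; other systems divide the space differently):

#include <iostream>

int main()
{
        unsigned long long total = 1ULL << 48;      // 48-bit virtual address space
        unsigned long long user  = total / 2;       // half of it left for the process
        std::cout << "total: " << (total >> 40) << " TiB\n";    // 256 TiB
        std::cout << "user:  " << (user  >> 40) << " TiB\n";    // 128 TiB
        // The sizes requested by the loop are powers of two, so the largest one
        // that still fits in the 128 TiB half is 64 TiB; doubling it again asks
        // for the whole half, which the already-running program can't have.
        std::cout << "largest power-of-two request: " << ((user / 2) >> 40) << " TiB\n"; // 64 TiB
        return 0;
}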

Mark
  • 2
    What do they use the extra 16 bits of an address for? – tbodt Dec 30 '16 at 23:27
  • 8
    @tbodt - They don't. Basically, it's *theoretically* a 64 bit address space, but for layout reasons the upper 16 bits are unwired, because the chip can't physically support that much memory anyways. That makes the silicon design easier because you don't have these huge bus portions that go everywhere, but don't actually do anything. – Fake Name Dec 30 '16 at 23:35
  • 8
    Note that Windows also does delayed allocation, *but it doesn't do overcommit*, so on Windows you shouldn't be able to allocate more than your total RAM + swap space - memory already in use. – user253751 Dec 31 '16 at 07:33
  • I don't think any OSes use hugepages (2MiB or 1GiB for x86-64) by default (except for stuff like Linux's transparent hugepages), so yes, normal 4kiB x86 pages are a safe guess. I posted an answer with more details about how the copy-on-write lazy-allocation stuff works. – Peter Cordes Dec 31 '16 at 12:01
7

Virtual memory is the key to allocating more address space than you have physical RAM+swap space.

malloc uses the mmap(MAP_ANONYMOUS) system call to get pages from the OS (assuming OS X works like Linux, since they're both POSIX OSes). These pages are all copy-on-write mapped to a single physical zero page, i.e. they all read as zero at the cost of only a TLB miss (no page fault and no allocation of physical RAM). A page is 4 KiB. (I'm not mentioning hugepages because they're not relevant here.)

Writing to any of those pages triggers a soft page fault for the kernel to handle the copy-on-write. The kernel allocates a zeroed page of physical memory and re-wires that virtual page to be backed by the physical page. On return from the page fault, the store is re-executed and succeeds this time.

So after allocating 64TiB and storing 5 bytes to the end of it, you've used one extra page of physical memory. (And added an entry to malloc's bookkeeping data, but that was probably already allocated and in a dirty page. In a similar question about multiple tiny allocations, malloc's bookkeeping data was what eventually used up all the space).

If you actually dirtied more pages than the system had RAM + swap, the kernel would have a problem because it's too late for malloc to return NULL. This is called "overcommit", and some OSes enable it by default while others don't. In Linux, it's configurable.
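As a rough illustration of this mechanism (my own sketch, not part of the answer; assumes a 64-bit Linux or macOS system, where the flag may also be spelled MAP_ANON), an anonymous mapping of 1 TiB succeeds immediately, and physical memory is only consumed for the single page that gets written:

#include <sys/mman.h>
#include <cstddef>
#include <iostream>

int main()
{
        const std::size_t size = 1ULL << 40;    // 1 TiB of virtual address space
        char *p = static_cast<char*>(mmap(nullptr, size,
                                          PROT_READ | PROT_WRITE,
                                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
        if (p == MAP_FAILED)
        {
                std::cout << "mmap failed\n";
                return 1;
        }
        p[size - 1] = 'T';      // soft page fault: one physical page gets wired in
        std::cout << "mapped " << (size >> 30) << " GiB, dirtied one page\n";
        munmap(p, size);
        return 0;
}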


As Mark explains, you run out of steam at 64TiB because current x86-64 implementations only support 48-bit virtual addresses. The upper 16 bits need to be copies of bit 47. (i.e. an address is only canonical if the 64-bit value is the sign-extension of the low 48 bits).

This requirement stops programs from doing anything "clever" with the high bits, and then breaking on future hardware that does support even larger virtual address spaces.
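As a small illustration (not from the answer), checking whether a 64-bit value is a canonical 48-bit address amounts to checking that bits 48-63 are all copies of bit 47:

#include <cstdint>
#include <iostream>

// Canonical means the value is the sign extension of its low 48 bits,
// so bits 47..63 must be either all zeros or all ones.
static bool is_canonical(std::uint64_t addr)
{
        std::uint64_t upper = addr >> 47;       // bit 47 and everything above it (17 bits)
        return upper == 0 || upper == 0x1ffff;
}

int main()
{
        std::cout << std::boolalpha;
        std::cout << is_canonical(0x00007fffffffffffULL) << "\n";  // true: top of the lower half
        std::cout << is_canonical(0xffff800000000000ULL) << "\n";  // true: bottom of the upper half
        std::cout << is_canonical(0x0000800000000000ULL) << "\n";  // false: non-canonical
        return 0;
}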

Peter Cordes