
Important: Scroll down to the "final update" before you invest too much time here. Turns out the main lesson is to beware of the side effects of other tests in your unittest suite, and to always reproduce things in isolation before jumping to conclusions!


On the face of it, the following 64-bit code allocates (and touches) 2^20 4k pages using VirtualAlloc, a total of 4GByte:

const size_t N=4;  // Tests with this many Gigabytes
const size_t pagesize4k=4096;
const size_t npages=(N<<30)/pagesize4k;  // 2^20 pages for N=4

BOOST_AUTO_TEST_CASE(test_VirtualAlloc) {

  std::vector<void*> pages(npages,0);
  for (size_t i=0;i<pages.size();++i) {
    pages[i]=VirtualAlloc(0,pagesize4k,MEM_RESERVE|MEM_COMMIT,PAGE_READWRITE);
    *reinterpret_cast<char*>(pages[i])=1;  // Touch the page so it enters the working set
  }

  // Check all allocs succeeded
  BOOST_CHECK(std::find(pages.begin(),pages.end(),nullptr)==pages.end()); 

  // Free what we allocated
  bool trouble=false;
  for (size_t i=0;i<pages.size();++i) {
    const BOOL ok=VirtualFree(pages[i],0,MEM_RELEASE);  // nonzero on success
    if (!ok) trouble=true;
  }
  BOOST_CHECK(!trouble);
}

However, while it executes, the "Working Set" reported in Windows Task Manager grows (confirmed by the value "sticking" in the "Peak Working Set" column) from a baseline of ~200,000K (~200MByte) to over 6,000,000 or 7,000,000K (tested on 64-bit Windows 7, and also on ESX-virtualized 64-bit Server 2003 and Server 2008; unfortunately I didn't note which of the numbers observed occurred on which systems).
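
For anyone wanting to check these numbers without eyeballing Task Manager, something like the following sketch reports the working set and commit charge programmatically (it uses the documented psapi GetProcessMemoryInfo call; the report_memory helper is mine, not part of the original test code):

#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <Psapi.h>  // link with psapi.lib

#include <iostream>

// Print the current and peak working set, plus the commit charge
// (PagefileUsage), for the current process.
void report_memory() {
  PROCESS_MEMORY_COUNTERS pmc = {};
  if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc))) {
    std::cout << "WorkingSetSize:     " << pmc.WorkingSetSize     << " bytes\n"
              << "PeakWorkingSetSize: " << pmc.PeakWorkingSetSize << " bytes\n"
              << "PagefileUsage:      " << pmc.PagefileUsage      << " bytes\n";
  }
}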

Another very similar test case in the same unittest executable tests 2^20 4k mallocs (followed by frees), and that one only expands the working set by around the expected 4GByte when running.
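
That test is essentially the same loop with malloc/free swapped in, roughly like this (a sketch from memory, not the verbatim test; assumes <cstdlib> and the same constants as above):

BOOST_AUTO_TEST_CASE(test_malloc) {

  std::vector<void*> pages(npages,0);
  for (size_t i=0;i<pages.size();++i) {
    pages[i]=std::malloc(pagesize4k);
    *reinterpret_cast<char*>(pages[i])=1;  // Touch the page, as before
  }

  // Check all allocs succeeded
  BOOST_CHECK(std::find(pages.begin(),pages.end(),nullptr)==pages.end());

  // Free what we allocated
  for (size_t i=0;i<pages.size();++i) std::free(pages[i]);
}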

I don't get it: does VirtualAlloc have some quite high per-allocation overhead? If so, it's clearly a significant fraction of the page size; why is so much extra needed, and what's it for? Or am I misunderstanding what the reported "Working Set" actually means? What's going on here?

Update: With reference to Hans' answer, I note that the following fails with an access violation on the second page access, so whatever is going on isn't as simple as the allocation being rounded up to the 64K "granularity".

char*const ptr = reinterpret_cast<char*>(
  VirtualAlloc(0, 4096, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE)
);
ptr[0] = 1;     // fine: within the committed 4k page
ptr[4096] = 1;  // access violation: the next page is not committed
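
As comes up in the comments on Hans' answer below, VirtualQuery is a handy way to probe what state those neighbouring addresses are actually in. A sketch, continuing from the ptr above (and assuming <iostream> is also included):

MEMORY_BASIC_INFORMATION mbi = {};
// Query the region containing the first byte beyond the committed page.
if (VirtualQuery(ptr + 4096, &mbi, sizeof(mbi)) != 0) {
  std::cout << "State: "
            << (mbi.State == MEM_COMMIT  ? "MEM_COMMIT"  :
                mbi.State == MEM_RESERVE ? "MEM_RESERVE" :
                                           "MEM_FREE")
            << ", RegionSize: " << mbi.RegionSize << " bytes\n";
}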

Update: Now on an AWS/EC2 Windows 2008 R2 instance, with Visual Studio Express 2013 installed, I can't reproduce the problem with this minimal code (compiled 64-bit), which tops out with an apparently overhead-free peak working set of 4,335,816K; that's the sort of number I'd expected to see originally. So either there is something different about the other machines I was running on, or about the boost-test-based exe used in the previous testing. Bizarro; to be continued...

#define WIN32_LEAN_AND_MEAN
#include <Windows.h>

#include <vector>

int main(int, char**) {

    const size_t N = 4;
    const size_t pagesize4k = 4096;
    const size_t npages = (N << 30) / pagesize4k;

    std::vector<void*> pages(npages, 0);
    for (size_t i = 0; i < pages.size(); ++i) {
        pages[i] = VirtualAlloc(0, pagesize4k, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
        *reinterpret_cast<char*>(pages[i]) = 1;
    }

    Sleep(5000);  // Pause so the peak working set can be read off in Task Manager

    for (size_t i = 0; i < pages.size(); ++i) {
        VirtualFree(pages[i], 0, MEM_RELEASE);
    }

    return 0;
}

Final update: Apologies! I'd delete this question if I could, because it turns out the observed problems were entirely due to an immediately preceding unittest in the test suite which used TBB's "scalable allocator" to allocate/deallocate a couple of GByte of stuff. It seems the scalable allocator actually retains such allocations in its own pool rather than returning them to the system (see e.g here or here). This became obvious once I ran tests individually with enough of a Sleep after them to observe their on-completion working set in Task Manager (whether anything can be done about the TBB behaviour might be an interesting question, but as-is the question here is a red herring).
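
For what it's worth, recent TBB releases do seem to provide a hook for flushing those pools; a sketch, assuming a TBB version which exposes scalable_allocation_command (I haven't verified this fixes the working-set numbers on the affected machines):

#include <tbb/scalable_allocator.h>

// Ask the TBB allocator to return cached-but-unused memory to the OS;
// TBB versions predating this command have no equivalent knob.
scalable_allocation_command(TBBMALLOC_CLEAN_ALL_BUFFERS, 0);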

timday
    `malloc` uses `HeapAlloc`, delegating memory management to the heap manager. The heap manager is implemented using `VirtualAlloc`, but keeps track of unused memory, so that it won't go to waste. See also [Is VirtualAlloc alignment consistent with size of allocation?](http://stackoverflow.com/q/20023446/1889329) for further information on `VirtualAlloc`. – IInspectable Jan 03 '14 at 22:14

2 Answers

   pages[i]=VirtualAlloc(0,pagesize4k,MEM_RESERVE|MEM_COMMIT,PAGE_READWRITE);

You won't get 4096 bytes; the allocation is rounded up to the smallest permitted allocation size. That is SYSTEM_INFO.dwAllocationGranularity, and it has been 64KB for a long time. It is a very basic counter-measure against address space fragmentation.

So you are allocating way more than you think.
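
You can read the value for yourself via GetSystemInfo; a minimal sketch:

#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <iostream>

int main() {
    SYSTEM_INFO si = {};
    GetSystemInfo(&si);
    std::cout << "dwPageSize:              " << si.dwPageSize               << "\n"   // typically 4096
              << "dwAllocationGranularity: " << si.dwAllocationGranularity << "\n";  // typically 65536
    return 0;
}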

Hans Passant
  • Will VirtualAlloc actually reserve and commit on a 64KB granularity boundary, or reserve on 64KB boundaries but commit on page (4KB) boundaries and in page-size chunks? – marcinj Jan 03 '14 at 20:24
  • Hmmm, this is interesting information; my interpretation was more like marcinj's. I'm a bit surprised I don't just crash and burn, or see the working set blow up to 64GByte, if that's the case... but maybe the fact I only touch the first page in each alloc saves me. Easily tested... I ought then to be able to hit *(reinterpret_cast<char*>(pages[i])+4096)=1 on each allocation without crashing? Unfortunately I won't have access to a Windows machine again until next week sometime... – timday Jan 03 '14 at 21:29
  • Yes, mapping VM to RAM is still page based with 4096 byte granularity. Just compare the commit size to the working set to see the difference. – Hans Passant Jan 03 '14 at 22:35
  • (OK, I actually fired up a Windows EC2 instance to play with this some more.) What state are the other 60K's worth of pages in the "granularity" address range in, then? They're clearly not also committed, as accessing the +4096 byte offset produces an access violation. Are they reserved but not committed? – timday Jan 04 '14 at 00:12
  • Trying this (I'd never got this wrong intentionally before), they just look unavailable; I can't get them committed nor re-allocated. – Hans Passant Jan 04 '14 at 00:51
  • Aha, yes, VirtualQuery looks like a nice way of probing what's actually going on; I'll give it a try (once I can reproduce the issue; I don't see it with a minimal test on an EC2 machine; see the latest update above). – timday Jan 04 '14 at 00:52
  • Actually, this turns out to be a false trail. All the problems were related to an immediately preceding unittest in the test suite which used TBB's "scalable allocator", which isn't good about returning memory to the process heap or OS. See the final update on the question. – timday Jan 04 '14 at 14:26

It turns out the observed problems were entirely due to an immediately preceding unittest in the test suite which used TBB's "scalable allocator" to allocate/deallocate a couple of GByte of stuff. It seems the scalable allocator actually retains such allocations in its own pool rather than returning them to the system (see e.g here or here). This became obvious once I ran tests individually with enough of a Sleep after them to observe their on-completion working set in Task Manager.

timday