
See, I wanted to measure the memory usage of my C++ program from inside the program itself, without profilers, process viewers, etc.

Why from inside the program?

  1. Measurements will be done thousands of times, so they must be automated; keeping an eye on Task Manager, top, or whatever will not do
  2. Measurements are to be done during production runs—performance degradation, which may be caused by profilers, is not acceptable since the run times are non-negligible already (several hours for large problem instances)

Note. Why measure at all? The only reason to measure used memory (as reported by the OS), as opposed to calculating “expected” usage in advance, is that I cannot directly, analytically “sizeof” how much memory my principal data structure uses. The structure itself is

unordered_map<bitset, map<uint16_t, int64_t> >

and these are packed into a vector, for all I care (a list would actually suffice as well, since I only ever need to access the “neighbouring” structures; without details on memory usage, I can hardly decide which to choose):

vector< unordered_map<bitset, map<uint16_t, int64_t> > >

so if anybody knows how to “sizeof” the memory occupied by such a structure, that would also solve the issue (though I'd probably have to fork the question or something).
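
For what it's worth, one empirical route to such a "sizeof" is a counting allocator that tallies every byte the containers request. A minimal sketch for a single map level follows; what makes this tedious for the full structure is threading the allocator through every nesting level:

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <map>
#include <new>

// global tally of bytes currently requested through this allocator
static std::size_t g_allocated = 0;

template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U> CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        g_allocated += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) {
        g_allocated -= n * sizeof(T);
        ::operator delete(p);
    }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

int main() {
    std::map<std::uint16_t, std::int64_t, std::less<std::uint16_t>,
             CountingAllocator<std::pair<const std::uint16_t, std::int64_t>>> m;
    for (std::uint16_t i = 0; i < 1000; ++i) m[i] = i;
    std::printf("the map's nodes requested %zu bytes\n", g_allocated);
}

Note that this counts bytes requested by the container (node payloads and bookkeeping), not the heap allocator's own per-block overhead.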

Environment: It may be assumed that the program runs all alone on the given machine (along with the OS, etc., of course; either a PC or a supercomputer's node); it is certain to be the only program requiring large (say > 512 MiB) amounts of memory, this being a computational-experiment environment. The program is either run on my home PC (16 GiB RAM; Windows 7 or Linux Mint 18.1) or on the institution supercomputer's node (circa 100 GiB RAM, CentOS 7), and it may want to consume all that RAM. Note that the supercomputer effectively prohibits disk swapping of user processes, and my home PC has a smallish page file.

Memory usage pattern. The program can be said to sequentially fill a sort of table, each row of which is a vector<...> as specified above. Say the prime data structure is called supp. Then, for each integer k, filling supp[k] requires the data from supp[k-1]. As supp[k] is filled, it is used to initialize supp[k+1]. Thus, at each time, the current, previous, and next “table rows” must be readily accessible. After the table is filled, the program does a relatively quick (compared with initializing and filling the table), non-exhaustive search in the table, through which a solution is obtained. Note that memory is only allocated through the STL containers; I never explicitly new or malloc anything myself.
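
In code, the pattern is roughly the following; Row and buildRow are hypothetical stand-ins for the actual row type and the dynamic-programming step:

#include <cstddef>
#include <vector>

// hypothetical stand-ins for the real row type and DP step
using Row = std::vector<int>;
Row buildRow(const Row& prev) { return Row(prev.size() + 1); }

int main() {
    std::size_t n = 100;
    std::vector<Row> supp(1, Row(1));            // supp[0] initialized elsewhere
    for (std::size_t k = 1; k < n; ++k)
        supp.push_back(buildRow(supp[k - 1]));   // row k reads row k-1 only
}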

Questions. Wishful thinking.

  1. What is the appropriate way to measure total memory usage (including swapped to disk) of a process from inside its source code (one for Windows, one for Linux)?
  2. This should probably be another question, or rather a good googling session, but still: what is the proper (or just easy) way to explicitly control (say, encourage or discourage) swapping to disk? A pointer to an authoritative book on the subject would be very welcome. Again, forgive my ignorance; I'd like a means to say something along the lines of “NEVER swap supp” or “swap supp[10]”, and then, when I need it, “unswap supp[10]”, all from the program's code. I thought I'd have to resort to serializing the data structures and explicitly storing them in a binary file, then reversing the transformation. (A sketch of the page-pinning route is below.)
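
The closest mechanisms I know of are mlock/munlock on Linux and VirtualLock on Windows (caveats for the latter are in the comments below). A minimal Linux-side sketch, assuming the process' RLIMIT_MEMLOCK allows pinning this much:

#include <sys/mman.h>   // mlock, munlock
#include <cstdint>
#include <vector>

int main() {
    std::vector<std::int64_t> row(1 << 20);   // ~8 MiB "table row"
    // ask the kernel to keep this row resident (fails if RLIMIT_MEMLOCK is too low)
    mlock(row.data(), row.size() * sizeof(row[0]));
    // ... hot phase: the pinned row cannot be swapped out ...
    munlock(row.data(), row.size() * sizeof(row[0]));
}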

On Linux, it appeared easiest to just catch the heap pointers through sbrk(0), cast them to 64-bit unsigned integers, and compute the difference after the memory gets allocated; this approach produced plausible results (I have not done more rigorous tests yet).
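
For the record, a sketch of that sbrk difference. One caveat I'm aware of: glibc serves large single blocks via mmap rather than by moving the program break, so what this actually sees is the many small node-sized allocations that STL node containers make.

#include <unistd.h>   // sbrk
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    auto before = reinterpret_cast<std::uintptr_t>(sbrk(0));
    std::vector<std::int64_t*> nodes;   // many small allocations go through brk
    for (int i = 0; i < 1000000; ++i) nodes.push_back(new std::int64_t(i));
    auto after = reinterpret_cast<std::uintptr_t>(sbrk(0));
    std::printf("program break moved by %llu bytes\n",
                static_cast<unsigned long long>(after - before));
    for (auto* p : nodes) delete p;
}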

edit 5. Removed reference to HeapAlloc wrangling—irrelevant.

edit 4. Windows solution. This bit of code reports a working set that matches the one in Task Manager; that's about all I wanted. Tested on Windows 10 x64 (by allocations like new uint8_t[1024*1024], or rather, new uint8_t[1ULL << howMuch]; not in my “production” code yet). On Linux, I'd try getrusage or something to get the equivalent (a sketch follows the Windows code). The principal element is GetProcessMemoryInfo, as suggested by @IInspectable and @conio.

#include <Windows.h>
#include <Psapi.h>   // GetProcessMemoryInfo; link with Psapi.lib

// report this process' current working set (bytes), 0 on failure
size_t getWorkingSetBytes()
{
    //get the handle to this process
    auto myHandle = GetCurrentProcess();
    //to fill in the process' memory usage details
    PROCESS_MEMORY_COUNTERS pmc;
    //return the usage (bytes), if I may
    if (GetProcessMemoryInfo(myHandle, &pmc, sizeof(pmc)))
        return pmc.WorkingSetSize;
    else
        return 0;
}
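
For the Linux equivalent mentioned above, a getrusage sketch. Caveat: ru_maxrss is the peak resident set size (in KiB on Linux), not the current one; the current figure would have to come from /proc/self/statm instead.

#include <sys/resource.h>   // getrusage

// peak resident set size of this process, in KiB (Linux semantics)
long getPeakRSSKiB()
{
    rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) == 0)
        return usage.ru_maxrss;
    return 0;
}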

edit 5. Removed reference to GetProcessWorkingSetSize as irrelevant. Thanks @conio.

yvs314
  • Virtual memory is pretty complex. What particular statistic do you wish to measure? – David Heffernan Jan 24 '17 at 14:19
  • I want the nearest thing to RAM usage as would be reported by Task Manager (“total physical memory reserved for an individual process”) or top. I do not intend to crawl into virtual memory; the usage should be reported for RAM. – yvs314 Jan 24 '17 at 14:40
  • For Windows, have a look at [Process Memory Usage Information](https://msdn.microsoft.com/en-us/library/windows/desktop/ms684879.aspx). Besides, your program doesn't consume RAM. It consumes address space. RAM is just a performance optimization. – IInspectable Jan 24 '17 at 14:44
  • If you pretend that the complexity of virtual memory does not exist then I doubt you'll get much useful. What are you going to do with the information. Which decisions will it inform? – David Heffernan Jan 24 '17 at 14:48
  • Memory usage determines whether a given problem instance can be solved or not (the actual relevant statistic would be average memory usage per state, obtained by dividing memory usage by the number of states). It also could affect the choice of the program's inner structure (I could not calculate the memory usage of `std::unordered_map<...>` “analytically”, so I decided to go empirical). – yvs314 Jan 24 '17 at 15:06
  • I believe that you will need a better understanding of how the virtual memory system works to do this well – David Heffernan Jan 24 '17 at 15:13
  • Well, that's a pity; thanks for the suggestion though. Probably my next stop will be IInspectable's suggestion. All I wanted was to avoid the need to keep my eye on the Task Manager to note the “Memory usage” peak; that's not too reproducible a practice. – yvs314 Jan 24 '17 at 15:26
  • You can certainly read this information out of performance counters programmatically, but all I am saying is that you should be aware that memory statistics are complex. Memory can be shared between different processes. Reserved but not committed. Committed, but swapped out because it is not in use. – David Heffernan Jan 24 '17 at 15:39
  • What are you trying to achieve by allocating a zero-byte block? The documentation for HeapAlloc doesn't say anything about this, I don't think the result means anything. Also, IIRC, not all versions of Visual Studio allocate memory from the default heap. – Harry Johnston Jan 24 '17 at 22:37
  • @HarryJohnston: The documentation totally does say something about it: ["You should not .. assume any relationship between two areas of memory allocated by `HeapAlloc`."](https://msdn.microsoft.com/en-us/library/windows/desktop/aa366711(v=vs.85).aspx) – Ben Voigt Jan 25 '17 at 22:33
  • @HarryJohnston: That line of documentation seems very clear that the approach stated in the question of "catch the heap pointers, cast them as 64-bit unsigned integers, and compute the difference" will not be meaningful. – Ben Voigt Jan 25 '17 at 23:13
  • @BenVoigt: fair enough. To be honest, I was just covering myself in the event that there was some other piece of documentation I'd overlooked which *did* say that a zero sized block was a special case. But I'm pretty sure there isn't. :-) – Harry Johnston Jan 25 '17 at 23:18
  • @HarryJohnston: Well, for `sbrk` zero is not a special case, it's explicitly called out in the documentation because people might otherwise assume it's invalid. The code in the question would have worked just as well using `sbrk(32)`, since *all* calls to `sbrk` return the (previous) value of the edge-of-heap pointer. – Ben Voigt Jan 25 '17 at 23:21
  • @yvs314: Your recent edits suggest ever more strongly that you're asking the wrong question and don't understand some crucial points. First, `GetProcessWorkingSetSize` gives you the minimum and maximum working sets for a process, not the current. The function description is a single sentence that takes less than half a line and says precisely that. Second, what makes you think your process is terminated because of *physical* memory exhaustion? What gave you that idea? It's far more likely that you're hitting the *commit* limit (and unless you don't have a page file, those are different things). – conio Jan 27 '17 at 00:36
  • And third: You omitted the most important point about the memory usage pattern of your program. Does it *sequentially* fill the table, not accessing filled cells until the search? If so, and assuming the computation and filling is relatively heavy as you imply, then old, unused, unaccessed cells will be written to disk and the physical memory they occupy will be happily available to new cells. Depending on how fast you fill the table, there may be enough free+standby pages to fulfill allocations fast enough, and even if not, until you hit the commit limit, allocations will still succeed. – conio Jan 27 '17 at 00:36
  • @conio: Thank you for your comments, yes, I am _that_ ignorant (and hoped to remain so; turns out, it wouldn't do—pity). The filling pattern will be edited in shortly—with other additions it gets too long for the comment. – yvs314 Jan 27 '17 at 09:43
  • If your environment prohibits (technically, conventionally, or otherwise) the use of swap/page file looking at the physical memory usage makes a little more sense, but that only diverts the issue to whether or not this is a reasonable limitation in your case. If you don't have a choice then that's that. But given that you use only Linux on the supercomputer where the limit is exogenous, and use Windows only at home where you do have some measure of control, you might want to consider allowing swap, especially since you have a lot less physical RAM available. :) – conio Jan 29 '17 at 16:55
  • @conio: Good point; however, I could only reasonably swap if I could directly restrict what is swapped and what is not. Imagine I'd like to fill the table with `n` rows total. Here I am filling the row `k` (`k<=n`). I need row `k-1` to do it. To fill a cell in row `k`, I need to access at most `n` values of the row `k-1` and do at most, say, 5 integer arithmetic operations plus 1 comparison. If the row `k-1` is swapped, then, in addition to waiting for circa `6*n` “fast” operations, I'd have to wait for `n` HDD accesses instead of `n` RAM/cache accesses. Slow enough? – yvs314 Jan 31 '17 at 12:08
  • First, you can control swap of certain regions of memory. I intentionally did not address that earlier, because it's - once again - asking the wrong question. (I actually wrote a comment and didn't send it.) You can more or less use [`VirtualLock`](https://msdn.microsoft.com/en-us/library/windows/desktop/aa366895(v=vs.85).aspx) to prevent certain pages from being written into the pagefile, but [there](http://stackoverflow.com/questions/9157883/what-is-the-difference-between-committing-and-locking-virtual-memory/9157914#comment11516994_9157914) are... – conio Feb 01 '17 at 23:47
  • [lots](http://stackoverflow.com/questions/7874281/unexpected-page-handling-also-virtuallock-no-op) of [caveats](https://blogs.msdn.microsoft.com/oldnewthing/20140207-00/?p=1833). If you want to make sure pages stay resident in physical memory you can use [AWE](https://msdn.microsoft.com/en-us/library/windows/desktop/aa366527(v=vs.85).aspx) but this will mess up your code, and require you to handle paging manually de facto. – conio Feb 01 '17 at 23:48
  • Second, and a lot more important, this is all pointless nonsense. You concocted a counterfactual hypothetical about a situation that never happens. Any reasonable general purpose page swapping algorithm uses a variation on the idea that recently used ("hot") data is kept in memory and unused ("cold") data is swapped out. You're refraining from programming the way you should be because you fear the OS deliberately works against you and is swapping out exactly that little piece of memory you need, that's also the most recently used one - the previous row in the table. – conio Feb 01 '17 at 23:49
  • That's insane. Forget the fact that that's simply not the way Windows works in reality. That's not a reasonable concern regarding any operating system. If we're in the realm of hostile operating systems that actively work against us and we're letting ourselves assume whatever we want, why not assume that Windows pages out your executable code every time the scheduler is invoked and is forced to page in your code every couple (hundred) instructions? – conio Feb 01 '17 at 23:50
  • Forget paging. I say you can't run your program on Windows at all! "Imagine" Windows works that way... "If the *code* is swapped, then, in addition to waiting for circa `6*n` “fast” operations, I'd have to wait for `n` HDD accesses instead of `n` RAM/cache accesses. Slow enough?" Unreasonable assumptions lead to unreasonable conclusions. Crazy assumptions lead to crazy conclusions. – conio Feb 01 '17 at 23:50

2 Answers


On Windows, the GlobalMemoryStatusEx function gives you useful information both about your process and the whole system.

Based on this table, you might want to look at MEMORYSTATUSEX.ullAvailPhys to answer "Am I getting close to hitting swapping overhead?" and at changes in (MEMORYSTATUSEX.ullTotalVirtual – MEMORYSTATUSEX.ullAvailVirtual) to answer "How much RAM is my process allocating?"
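
A minimal sketch of that call sequence; the only subtlety I know of is that dwLength must be set before the call:

#include <Windows.h>
#include <cstdio>

int main()
{
    MEMORYSTATUSEX ms;
    ms.dwLength = sizeof(ms);   // required before the call
    if (GlobalMemoryStatusEx(&ms))
    {
        std::printf("available physical RAM: %llu bytes\n", ms.ullAvailPhys);
        std::printf("virtual address space in use: %llu bytes\n",
                    ms.ullTotalVirtual - ms.ullAvailVirtual);
    }
}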

Ben Voigt

To know how much physical memory your process takes, you need to query the process working set or, more likely, the private working set. The working set is (more or less) the set of physical pages in RAM your process uses; the private working set excludes shared memory.

See

for terminology and a little more detail.

There are performance counters for both metrics.

(You can also use QueryWorkingSet(Ex) and calculate that on your own, but that's just nasty in my opinion. You can get the (non-private) working set with GetProcessMemoryInfo.)
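
For completeness, a hedged sketch of that do-it-yourself route: enumerate the working set with QueryWorkingSet and count the pages not flagged as shared (link against Psapi.lib). It is indeed as nasty as advertised:

#include <Windows.h>
#include <Psapi.h>
#include <vector>

// private working set in bytes: count non-shared pages reported by QueryWorkingSet
size_t privateWorkingSetBytes()
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);   // for the page size

    std::vector<char> buf(sizeof(PSAPI_WORKING_SET_INFORMATION));
    auto* info = reinterpret_cast<PSAPI_WORKING_SET_INFORMATION*>(buf.data());
    while (!QueryWorkingSet(GetCurrentProcess(), buf.data(), (DWORD)buf.size()))
    {
        if (GetLastError() != ERROR_BAD_LENGTH)
            return 0;   // some other failure
        // NumberOfEntries now holds the required count; resize with slack and retry
        buf.resize(sizeof(PSAPI_WORKING_SET_INFORMATION)
                   + 2 * info->NumberOfEntries * sizeof(PSAPI_WORKING_SET_BLOCK));
        info = reinterpret_cast<PSAPI_WORKING_SET_INFORMATION*>(buf.data());
    }
    size_t privatePages = 0;
    for (ULONG_PTR i = 0; i < info->NumberOfEntries; ++i)
        if (!info->WorkingSetInfo[i].Shared)   // keep only non-shared pages
            ++privatePages;
    return privatePages * si.dwPageSize;
}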


But the more interesting question is whether or not this helps your program to make useful decisions. If nobody's asking for memory or using it, the mere fact that you're using most of the physical memory is of no interest. Or are you worried about your program alone using too much memory?

You haven't said anything about the algorithms it employs or its memory usage patterns. If it uses lots of memory, but does this mostly sequentially, and comes back to old memory relatively rarely it might not be a problem. Windows writes "old" pages to disk eagerly, before paging out resident pages is completely necessary to supply demand for physical memory. If everything goes well, reusing these already written to disk pages for something else is really cheap.

If your real concern is memory thrashing ("virtual memory will be of no use due to swapping overhead"), then this is what you should be looking for, rather than trying to infer (or guess...) that from the amount of physical memory used. A more useful metric would be page faults per unit of time. It just so happens that there are performance counters for this too. See, for example, Evaluating Memory and Cache Usage.
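
For illustration, a sketch of reading such a counter through the PDH API; the instance name myprogram is a hypothetical placeholder (it is the image name without .exe), rate counters need two samples, and the program must link against Pdh.lib:

#include <Windows.h>
#include <Pdh.h>
#include <cstdio>

int main()
{
    PDH_HQUERY query;
    PDH_HCOUNTER counter;
    PdhOpenQuery(nullptr, 0, &query);
    // per-process page fault rate; "myprogram" stands in for your process instance
    PdhAddEnglishCounter(query, L"\\Process(myprogram)\\Page Faults/sec", 0, &counter);
    PdhCollectQueryData(query);   // first sample
    Sleep(1000);
    PdhCollectQueryData(query);   // second sample; a rate needs two
    PDH_FMT_COUNTERVALUE value;
    PdhGetFormattedCounterValue(counter, PDH_FMT_DOUBLE, nullptr, &value);
    std::printf("page faults/sec: %.1f\n", value.doubleValue);
    PdhCloseQuery(query);
}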

I suspect this to be a better metric to base your decision on.

conio
  • I do believe he's worried about his program alone exhausting memory, based on "The program itself surely loves its RAM (dynamic programming, lots of states to store), it will gladly chow through several GiB on certain problem instances.... If the problem eats through all available RAM, it is to terminate---virtual memory will be of no use due to swapping overhead." – Ben Voigt Jan 25 '17 at 21:58
  • I don't disagree, but my answer - the suggestion to monitor page faults rather than memory usage - holds regardless of whether he's the only one on the machine or he cares about other programs being able to work too. – conio Jan 25 '17 at 22:29
  • Agreed that page faults are the problem, but rate (faults per time) may not be all that helpful, because of thrashing and I/O contention. Measuring (faults per computation) would be much better, because it isn't defeated by the slowdown caused by the faults. – Ben Voigt Jan 25 '17 at 22:33
  • So the answer is, probably, Private Working Set; in view of the environment, I assume shared memory to be negligible (although I'd prefer an upper bound: the whole Working Set, just in case). Now, my real main concern **is** the actual memory usage; also, as far as I bothered to test, the program tends to be killed by the system as soon as RAM is exhausted anyway. That suits me; however, I will be really glad to know how to catch when that happens, so as to report it in the program's own log. – yvs314 Jan 26 '17 at 12:37
  • @BenVoigt: In principle you're right. In practice, I think *before* you get so many PFs that you can't even execute code that causes PFs, you'll cause a rise in PFs. Per second. Tools like Process Explorer and Process Hacker show “PF Delta”, but I'm not aware of a performance counter that gives you the number of PFs up to this point (rather than the rate). Process Hacker is open source, so you can find the undocumented uses of `NtQueryInformationProcess` (with `ProcessVmCounters`) and `NtQuerySystemInformation` to get the process and system PF count, respectively. – conio Jan 26 '17 at 13:27
  • @yvs314: How to catch your process being terminated because of insufficient resources should really be a separate question, but the solution would be along the lines of setting an unhandled exception filter (see `SetUnhandledExceptionFilter`) or using WER (or perhaps something like Google Breakpad). – conio Jan 26 '17 at 13:35