
If I use the following call in C++, I would expect the WorkingSet of the process to never drop below 100MB.

However, the OS still trims the working set back to 16MB, even if I make this call.

Setting the WorkingSet to 100MB would dramatically increase my application's speed by eliminating soft page faults (see the diagram below).

What am I doing wrong?

#include <windows.h>

SIZE_T workingSetSizeMB = 100;
BOOL errorCode = SetProcessWorkingSetSizeEx(
    GetCurrentProcess(),
    (workingSetSizeMB - 1) * 1024 * 1024, // dwMinimumWorkingSetSize
    workingSetSizeMB * 1024 * 1024,       // dwMaximumWorkingSetSize
    QUOTA_LIMITS_HARDWS_MIN_ENABLE | QUOTA_LIMITS_HARDWS_MAX_DISABLE
  );
// errorCode returns 1 (TRUE), so the call succeeded.

(extra for experts) Experimental Methodology

I wrote a test C++ project that allocates 100MB of data to bring the WorkingSet over 100MB (as viewed in Process Explorer), then deallocates that memory. However, the OS trimmed the WorkingSet back to 16MB as soon as the memory was deallocated. I can provide the test C++ project I used if you wish.
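
For reference, here is a minimal sketch of what the test does (not the exact project - it assumes LocalAlloc, which is what I used, and a 4KB page size): allocate ~100MB, touch every page so it enters the WorkingSet, then free it and watch the WorkingSet in Process Explorer.

#include <windows.h>
#include <cstdio>

int main()
{
    const SIZE_T bytes = 100 * 1024 * 1024;         // ~100MB
    BYTE* p = (BYTE*)LocalAlloc(LMEM_FIXED, bytes);
    if (!p) return 1;

    for (SIZE_T i = 0; i < bytes; i += 4096)        // touch every page so it
        p[i] = 1;                                   // enters the WorkingSet

    printf("Allocated and touched 100MB - check the WorkingSet now\n");
    getchar();

    LocalFree(p);                                   // the WorkingSet is trimmed shortly after this
    printf("Freed - the WorkingSet drops back to ~16MB\n");
    getchar();
    return 0;
}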

Why does Windows provide SetProcessWorkingSetSizeEx() if it doesn't appear to work? I must be doing something wrong.

The diagram below shows the dramatic increase in the number of soft page faults (the red spikes) when the green line (the working set) dropped from 50MB to 30MB.

Example showing the increase in soft page faults when the WorkingSet is reduced too low

Update

In the end, we ignored the problem, as it didn't impact performance that much.

More importantly, SetProcessWorkingSetSizeEx does not control the current WorkingSet, and is not related in any way to soft page faults. All it does is prevent hard page faults, by preventing the current WorkingSet from being paged out to disk.

In other words, if one wants to reduce soft page faults, SetProcessWorkingSetSizeEx has absolutely no effect; it only deals with hard page faults.
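
To see the trimming for yourself, you can print the current WorkingSet before and after freeing memory. A minimal sketch (my own illustration, not part of the original test), using GetProcessMemoryInfo from psapi.h:

#include <windows.h>
#include <psapi.h>   // link with psapi.lib on older toolchains
#include <cstdio>

void PrintWorkingSetMB()
{
    PROCESS_MEMORY_COUNTERS pmc = {};
    pmc.cb = sizeof(pmc);
    if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
        printf("WorkingSetSize: %u MB\n", (unsigned)(pmc.WorkingSetSize / (1024 * 1024)));
}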

There is a great writeup in "Windows via C/C++" (Richter) which describes how Windows deals with memory.

  • How do you expect the OS to keep more pages in memory than you have allocated? – James McNellis Sep 01 '12 at 13:55
  • @James McNellis If the OS could keep more pages in the WorkingSet, then the number of page faults would decrease, which would dramatically speed up the program. We have 16GB free RAM on the server, and it wouldn't hurt to have 1GB permanently allocated to this process (which would reduce the number of soft page faults to 0). – Contango Sep 01 '12 at 13:56
  • @Gravitas: How do you allocate and access the memory? – Yakov Galka Sep 01 '12 at 13:57
  • Why are you deallocating memory then? – Mat Sep 01 '12 at 13:57
  • @ybungalobill Using LocalAlloc(). – Contango Sep 01 '12 at 13:58
  • @Mat We use a C++ program that has automatic garbage collection. If any memory is deallocated, it will be allocated again within seconds, so it makes no sense for the OS to keep trimming the WorkingSet back, only to generate soft page faults when it needs to allocate that memory again. – Contango Sep 01 '12 at 13:59
  • If your garbage collector is affecting your performance so much, time to review your allocation policies. – Mat Sep 01 '12 at 14:00
  • @Gravitas: Then stop using a garbage collector. – Yakov Galka Sep 01 '12 at 14:01
  • @Mat Unfortunately, we are using a 3rd-party garbage collection plugin, so we can't really change that. The program itself is 160,000 lines of code, so replacing our GC primitives would be a monumental task. – Contango Sep 01 '12 at 14:01
  • @ybungalobill Unfortunately, the program is 160,000 lines of code, so it would be a good year of work to replace the GC class with something different. – Contango Sep 01 '12 at 14:03
  • Re-using objects (pooling objects) can (sometimes) be done. Just because there's a GC doesn't mean you can't control when you allocate and let go of objects. – Mat Sep 01 '12 at 14:03
  • Can't you change the allocator used by the GC to one that doesn't actually deallocate the memory unless the system goes under memory pressure? – Matteo Italia Sep 01 '12 at 14:05
  • Unfortunately, there is no way to change anything to do with the GC. It's a 3rd-party one, and we don't have the source code for it. – Contango Sep 01 '12 at 14:07
  • If we could force the WorkingSet to be higher, then this would completely eliminate the issue of soft page faults hurting app performance. – Contango Sep 01 '12 at 14:11
  • ...as would cutting down on deallocations. If you're continually allocating/deallocating objects/structs, can you not just pool them, as suggested by Mat? Apart from the memory use issue, dequeuing/enqueuing an object/struct pointer is faster than calling a 'classic' memory manager and continually running ctors/dtors. – Martin James Sep 01 '12 at 15:23
  • I guess I'm looking for that one line of code that would dramatically improve the performance of a 160,000-line program, especially on the latency side. Unfortunately, I can't change the GC (it's 3rd party). For now, I'm very interested in methods of locking the WorkingSet to a user-defined level. – Contango Sep 01 '12 at 19:11
  • How do you know that these soft faults (specifically demand-zero faults) are having any noticeable impact on your performance? – arx Sep 01 '12 at 20:29
  • @arx We are using Microsoft Concurrency Visualizer. We can capture a trace of the program on the server, and look at the stack trace when soft page fault interrupts occur. Every time there is a soft page fault, it pauses the entire thread for a couple of milliseconds, which is badly hitting our latency and throughput stats (see the diagram above). In addition, it also makes the entire process a lot less deterministic and predictable than would be optimal. – Contango Sep 02 '12 at 01:48
  • @Gravitas: You're missing the point by miles. The working set is the set of allocated blocks of memory that are currently in RAM, and not something you can directly control. Yes, you can allocate a block to get your working set size up to a certain value, but as soon as that is released, it's gone, and no longer available or part of the working set. It doesn't magically make the OS page something else back in or keep it for your process. To stop memory being swapped out, don't release it, or keep using it. – Deanna Sep 03 '12 at 08:48
  • @Deanna You are absolutely right. I've updated the answer to reflect this. I had to read "Windows via C/C++" (Richter) to work out how PC memory architecture actually works. – Contango Sep 21 '12 at 11:40

1 Answer


Page faults are cheap and are to be expected. Real-time applications, high-end games, high-intensity processing and BluRay playback all happily work at full-speed with page-faults. Page faults are not the reason your application is slow.

To find out why your application is slow, you need to profile it.

To specifically answer your question - the page faults that occur just after a GC.Collect() aren't page-in faults; they're demand-zero page faults caused by the GC having just allocated a huge new block of demand-zero pages to move your objects to. Demand-zero pages aren't serviced from your pagefile and incur no disk cost, but they are still page faults, which is why they show on your graph.
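
You can reproduce this without any GC at all: committing memory with VirtualAlloc doesn't populate the working set, and the first write to each page raises a demand-zero soft fault with no disk involvement. A rough sketch (my illustration, assuming a 4KB page size):

#include <windows.h>

void DemandZeroDemo()
{
    const SIZE_T bytes = 64 * 1024 * 1024;
    BYTE* p = (BYTE*)VirtualAlloc(nullptr, bytes, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (!p) return;

    for (SIZE_T i = 0; i < bytes; i += 4096)
        p[i] = 0xFF;            // first touch of each page = one demand-zero soft fault

    VirtualFree(p, 0, MEM_RELEASE);
}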

As a general rule, Windows is better at managing your system resources than you are, and its defaults are highly tuned for the average case of normal programs. It is quite clear from your example that you are using a garbage collector, so you've already offloaded the task of dealing with working sets and virtual memory to the GC implementation. If SetProcessWorkingSetSize were a good API call to improve GC performance, the GC implementation would use it.

My advice to you is to profile your app. The main cause of slowdown in managed applications is writing bad managed code - not the GC slowing you down. Improve the big-O performance of your algorithms, offload expensive work through the use of things like Future and BackgroundWorker and try to avoid doing synchronous requests to the network - but above all, the key to getting your app fast is to profile it.
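
As an example of offloading work in native C++, a future via std::async moves the expensive call off the latency-critical path. A minimal sketch (ExpensiveAnalysis is a hypothetical stand-in for your own code):

#include <future>
#include <vector>

double ExpensiveAnalysis(const std::vector<double>& data)
{
    double sum = 0.0;
    for (double d : data) sum += d * d;   // stand-in for the real work
    return sum;
}

void ProcessRequest(const std::vector<double>& data)
{
    // Kick the heavy work off to another thread...
    std::future<double> result = std::async(std::launch::async, ExpensiveAnalysis, data);

    // ...do the latency-sensitive work here...

    double value = result.get();          // collect the result only when it is needed
    (void)value;
}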

  • We used Microsoft Concurrency Visualizer to get an insight into what's going on in the app. In the end, the performance loss wasn't sufficient to worry about. – Contango Sep 21 '12 at 11:35