These kinds of problems are exceedingly difficult to diagnose. It is quite possible that what is happening is not the result of a single condition that triggers the behaviour, but a set of simultaneous conditions.
Here's what we know:
No cumulative problem indicated: If the problem were cumulative, we would expect to see some sign of it over the 20-day period leading up to the event. That does not mean the preceding operation can be ignored: it is possible that some of the conditions that trigger the behaviour are staged and begin earlier on. This is something we cannot know with the information available.
Heaps are stable: The Private Bytes measure tells us how much private memory the process has committed (not how much has been touched, as stephbu suggested). Bytes in all Heaps tells us how much of that memory is currently allocated according to the memory manager (GC). Since both of these are stable, it would seem that the problem isn't necessarily a memory leak. The danger is that we only have 10 seconds of interesting data, and since the GC is usually fairly passive, it isn't clear how accurate those statistics are (particularly with the wonky working set).
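As a sanity check, those counters can be sampled directly while the service runs. A minimal sketch, assuming .NET's standard PerformanceCounter API and a hypothetical process instance name MyService (substitute your own):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class CounterSnapshot
{
    static void Main()
    {
        string instance = "MyService"; // hypothetical: your process instance name
        var privateBytes = new PerformanceCounter("Process", "Private Bytes", instance);
        var workingSet   = new PerformanceCounter("Process", "Working Set", instance);
        var heapBytes    = new PerformanceCounter(".NET CLR Memory", "# Bytes in all Heaps", instance);

        while (true)
        {
            // Sample once per second; fine-grained logs matter when the
            // interesting window is only about 10 seconds long.
            Console.WriteLine("{0:u}  private={1:N0}  working={2:N0}  heaps={3:N0}",
                DateTime.UtcNow, privateBytes.NextValue(),
                workingSet.NextValue(), heapBytes.NextValue());
            Thread.Sleep(1000);
        }
    }
}
```

Logging these at one-second resolution around the event would tell us whether the heap statistics are trustworthy during the interesting window.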
Working set indicates thrashing: The Working Set tells us how much physical memory the OS wants to keep paged in to ensure reasonable performance, so a growing working set suggests thrashing. A growing working set is normally associated with two things:

- increased object longevity, which is not indicated here, because the heaps are not showing growth; and
- an increased allocation rate, which is possible, with the objects remaining short-lived (since a leak is not indicated).
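Those two possibilities can be distinguished from the GC's own counters, if the process can be observed again. A minimal sketch, assuming the standard .NET CLR Memory counters and the same hypothetical instance name:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class GcPressureCheck
{
    static void Main()
    {
        string instance = "MyService"; // hypothetical instance name
        var allocRate = new PerformanceCounter(".NET CLR Memory", "Allocated Bytes/sec", instance);
        var timeInGc  = new PerformanceCounter(".NET CLR Memory", "% Time in GC", instance);
        var gen0      = new PerformanceCounter(".NET CLR Memory", "# Gen 0 Collections", instance);

        allocRate.NextValue(); timeInGc.NextValue(); // prime the rate counters

        while (true)
        {
            Thread.Sleep(1000);
            Console.WriteLine("alloc/s={0:N0}  %gc={1:F1}  gen0={2:N0}",
                allocRate.NextValue(), timeInGc.NextValue(), gen0.NextValue());
        }
    }
}
```

A high Allocated Bytes/sec and % Time in GC, with Gen 0 collections climbing rapidly while the heap sizes stay flat, would point at a high rate of short-lived allocations rather than increased longevity.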
These observations suggest to me that some rare event (or set of events) is triggering a condition in which there is:
- a high allocation rate
- of moderately large objects
- which are not very long-lived

with the GC thrashing as a result.
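If that hypothesis is right, the pattern should be reproducible in isolation. A hypothetical stress sketch (the buffer size and rate are assumptions, not measurements from your system):

```csharp
using System;

class AllocationStress
{
    static void Main()
    {
        var rng = new Random();
        while (true)
        {
            // ~64 KB is "moderately large" but still below the 85,000-byte
            // large-object threshold, so the buffers churn through the
            // ordinary generational heaps at a high rate.
            byte[] buffer = new byte[64 * 1024];
            buffer[rng.Next(buffer.Length)] = 1; // touch it so it isn't optimized away
            // buffer dies immediately: short-lived, high allocation rate
        }
    }
}
```

Running this next to the counter logs above should show whether that allocation profile alone can reproduce the working-set growth.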
There are other reports of these conditions causing OutOfMemoryExceptions. I'm not all that certain why it happens. If you are running in a 32-bit environment, one possible reason is fragmentation of the address space: the GC reserves each new heap segment as a contiguous range of pages, and in a fragmented 32-bit address space that reservation can fail even when plenty of memory is free in total.
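If fragmentation is the suspect, it is measurable. A rough sketch, assuming Windows and that it runs inside the process in question (inspecting another process would need VirtualQueryEx instead); it walks the address space and reports the largest free region, which is what matters for segment reservation:

```csharp
using System;
using System.Runtime.InteropServices;

class AddressSpaceScan
{
    [StructLayout(LayoutKind.Sequential)]
    struct MEMORY_BASIC_INFORMATION
    {
        public UIntPtr BaseAddress;
        public UIntPtr AllocationBase;
        public uint AllocationProtect;
        public UIntPtr RegionSize;
        public uint State;
        public uint Protect;
        public uint Type;
    }

    const uint MEM_FREE = 0x10000;

    [DllImport("kernel32.dll")]
    static extern UIntPtr VirtualQuery(UIntPtr lpAddress,
        out MEMORY_BASIC_INFORMATION lpBuffer, UIntPtr dwLength);

    static void Main()
    {
        ulong address = 0, largestFree = 0, totalFree = 0;
        var size = (UIntPtr)(uint)Marshal.SizeOf(typeof(MEMORY_BASIC_INFORMATION));
        MEMORY_BASIC_INFORMATION mbi;

        // Walk the user-mode address space region by region.
        while (VirtualQuery((UIntPtr)address, out mbi, size) != UIntPtr.Zero)
        {
            if (mbi.State == MEM_FREE)
            {
                totalFree += (ulong)mbi.RegionSize;
                largestFree = Math.Max(largestFree, (ulong)mbi.RegionSize);
            }
            address = (ulong)mbi.BaseAddress + (ulong)mbi.RegionSize;
        }

        Console.WriteLine("Total free:   {0:N0} bytes", totalFree);
        Console.WriteLine("Largest free: {0:N0} bytes", largestFree);
    }
}
```

A large total-free number paired with a small largest-free number is the fragmentation signature: plenty of memory, but no contiguous range big enough for a new heap segment.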
Another possibility (which I cannot verify) is that the GC asks the OS not to page out parts of the heap it is working on. If the number of locked pages gets high, an out-of-memory condition might result. This is almost total speculation, as I do not know enough about the internals of Microsoft's GC implementation.
I don't have a better explanation right now, but I'd definitely welcome one if anyone can provide it.
Finally, you might want to verify that a reasonable GC latency mode is in effect. If this were the problem, I think we would have seen an escalation of Bytes in all Heaps, so it's probably OK.
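Checking it is cheap via GCSettings (IsServerGC needs .NET 4.5 or later; LatencyMode is available earlier):

```csharp
using System;
using System.Runtime;

class GcConfigCheck
{
    static void Main()
    {
        // LowLatency in particular defers generation-2 collections,
        // which can let memory pressure build up.
        Console.WriteLine("GC latency mode: " + GCSettings.LatencyMode);
        Console.WriteLine("Server GC:       " + GCSettings.IsServerGC);
    }
}
```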
PS
Can you check what variable is indicated by the dashed line in the second chart? If it is processor use, then it is consistent with thrashing: as content needs to be paged in more frequently, disk IO should increase, and at a certain point processor use should decline, because everything is waiting on the disk. This is just an extra detail; if processor use doesn't decline excessively, thrashing is still a possibility, because parts of the software might still exhibit good locality and be able to make progress.
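That correlation can also be checked from counters rather than the chart. A small sketch, assuming the standard Memory and Process categories and the same hypothetical instance name:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class ThrashingCheck
{
    static void Main()
    {
        // "Pages Input/sec" counts hard faults that had to be read from disk;
        // correlating it with CPU use tests the thrashing explanation.
        var pagesIn = new PerformanceCounter("Memory", "Pages Input/sec");
        var cpu = new PerformanceCounter("Process", "% Processor Time", "MyService"); // hypothetical

        pagesIn.NextValue(); cpu.NextValue(); // prime the rate counters

        while (true)
        {
            Thread.Sleep(1000);
            Console.WriteLine("pages-in/s={0:F0}  cpu%={1:F1}",
                pagesIn.NextValue(), cpu.NextValue());
        }
    }
}
```

Rising pages-in with falling CPU during the event would support thrashing; steady CPU would not rule it out, for the locality reason above.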