Rewording my question in an attempt to make it On-Topic:

We have a client (only one out of many) that is consistently getting an Out of Memory exception with our software. I feel like we've eliminated the usual suspects and am looking for ideas about what other, less standard things might cause an OOM. Specifically, since this seems to be specific to a single customer, could it be caused by something wrong with the hardware, OS, or .NET install?

Here are the things I am aware of that cause an OOM and why I believe we've eliminated them as suspects:

1 - OOM caused by system running out of memory. Why Not? Because the system has several GB available when these exceptions occur.

2 - OOM caused by process running out of memory due to over-allocation or memory leaks. Why Not? Because the process is using only about 100MB of memory at the time of the exceptions. We have monitored the memory usage for days (on the system in question) and have not noticed any significant increase in memory usage.

3 - OOM caused by running out of other system resources such as file handles, etc. Why Not? The exceptions are happening, exclusively, during run-of-the-mill memory allocations, not while opening a file or connecting to a socket.

4 - OOM caused by attempting to allocate a large array with excessive memory fragmentation. Why Not? The memory blocks that we are allocating are fairly small (640x480x2, for the most part). With so much memory available, I have trouble believing that it could be so fragmented that something like that would fail.

So, just to be clear, I am not asking "Why doesn't my code run?" My code does run, on all machines but one. I'm not asking anyone to debug my code. My question is: "What other possible causes, besides those we've eliminated, could be resulting in an Out of Memory exception?" Or, "Am I missing something that could have caused me to eliminate one of the known causes prematurely?"

TrespassersW
  • Parsing corrupt resources can throw OOM, e.g. Image.FromFile() would throw OutOfMemoryException when *"The file does not have a valid image format."* – Alex K. Nov 10 '15 at 16:23
  • @Alex: Good point. I should clarify that I can see the call stack for where the OOMs are happening, and they happen during run-of-the-mill byte array allocations. I'll amend my post to be more clear about this. – TrespassersW Nov 10 '15 at 16:27
  • You talk about images. Are you creating Bitmap objects without disposing them? – Dan Byström Nov 10 '15 at 16:45
  • You are also running SQL Server on that machine. Did you check to see what the maximum memory usage of SQL is? That service will gladly use 100% of the available memory, starving your application. I'd limit the amount available for SQL to use at most 75% of the RAM. – Martin Soles Nov 10 '15 at 17:16
  • @DanByström: Good question. I've double checked all of our Disposable objects (we're pretty careful about that in general). This isn't acting like a memory leak. We've monitored memory usage for days and seen no evidence that it is creeping upward. Now, the system has started throwing the exceptions pretty much as soon as we start the app. – TrespassersW Nov 10 '15 at 17:49
  • @MartinSoles: Total system usage of memory is pretty low and there is still lots of RAM available when these errors happen. I'm pretty convinced that we aren't actually running out of memory. But I'm also running out of ideas about what else it might be. – TrespassersW Nov 10 '15 at 17:52
  • What is the size of the byte array being allocated when the OOM is thrown? – Ed Pavlov Nov 10 '15 at 20:07
  • Try checking a profiler (for example, for signs of heap fragmentation). Also, VMMap can be useful to see the virtual memory of your application. Note that `OutOfMemoryException` can be asynchronous, so the stack trace isn't always pointing at the real cause - your allocation might simply be the first one that happens at the time of the exception, for whatever reason. – Luaan Nov 11 '15 at 09:38
  • @Ed.ward: The arrays are typically int[640*480*2]. So, not very big. – TrespassersW Nov 12 '15 at 21:37
  • @Luaan: I have run it through CLRProfiler with no luck. I'll try VMMap, though. Maybe it will help. Could you explain your comment about the stack trace? I would have thought the stack trace always shows the trace of the stack of the calling thread that generated the exception. Does it not? – TrespassersW Nov 12 '15 at 21:39
  • @TrespassersW Not for asynchronous exceptions. For example, if someone calls `Thread.Abort` on your thread, the exception happens on your thread wherever you currently are in your code, but its cause is on the thread that called `Thread.Abort`. It's not a very likely cause here, but it's possible. However, since your arrays *are* big enough to end up on the large object heap, the problem might be with LOH fragmentation - though that should show rather clearly on CLRProfiler, and as far as I know, it should be committed memory (*unless* your arrays stay zeroed for long enough). – Luaan Nov 12 '15 at 22:27
  • Regarding 1) Physical RAM is hardly related to OOM exceptions. You would need to run out of physical RAM and out of swap space. Regarding 2) How did you measure a 100 MB memory usage? .NET applications easily need more memory than that, so it's likely that you have measured the working set size, which is memory in physical RAM and not virtual memory which causes the OOM exception. – Thomas Weller Nov 12 '15 at 22:30
  • Regarding 3) IMHO you don't get an OOM exception in that case, but you get NULL handles etc. I'd say it's more likely to get NullReferenceExceptions in that case. Regarding 4) Actually this is a likely option. You seem to have 600 kB blocks, which would point to large object heap [(LOH) fragmentation](http://stackoverflow.com/questions/30361184/loh-fragmentation-2015-update), because the LOH is not compacted after a GC. – Thomas Weller Nov 12 '15 at 22:30
  • IMHO there is only one more option: SOH (small object heap) fragmentation due to [pinned objects](http://stackoverflow.com/questions/2490912/what-are-pinned-objects), which cannot be moved by the garbage collector. – Thomas Weller Nov 12 '15 at 22:30
  • When measuring, measure virtual memory and/or private bytes. Both reserved and committed memory contribute to OOM. [Never measure working set size](http://stackoverflow.com/questions/31945443/peakvirtualmemorysize64-peakworkingset64-and-peakpagedmemorysize64-for-a-proces/31968807#31968807), it's one of the most useless metrics. I don't know why that's the default column in Task Manager. Use [SysInternals Process Explorer](https://technet.microsoft.com/en-us/sysinternals/processexplorer.aspx) instead. – Thomas Weller Nov 12 '15 at 22:31
  • [Take a crash dump](http://stackoverflow.com/questions/24874027/how-do-i-take-a-good-crash-dump-for-net) with correct bitness and full memory. Use a debugger to [track down .NET out of memory exceptions](http://stackoverflow.com/questions/26142607/how-to-use-windbg-to-track-down-net-out-of-memory-exceptions). – Thomas Weller Nov 12 '15 at 22:31
  • @ThomasWeller: Thank you for your thoughts. My point with #1 was that we're not anywhere close to actually running out of RAM or disk space. With #2, I was measuring private bytes. Virtual mem stays under 350 MB. That was a good thought, though. Way back in the day, I remember getting OOMs from running out of other resources, though you may be right that this no longer applies. Sounds like I need to look further into #4. Thank you for linking all of the good information. I'll read up. – TrespassersW Nov 12 '15 at 23:09
  • int[640*480*2] is large enough to be allocated on the Large Object Heap (objects of 85,000 bytes or more are allocated on the LOH). So I would recommend double-checking LOH fragmentation, especially if the app is running under .NET 3.5. LOH compaction is only available starting from .NET 4.5.1. – Ed Pavlov Nov 13 '15 at 09:00
  • Is the process running as a 64-bit executable? If the process is running as 32-bit, it can normally address 2GB of memory. However, because of how memory is managed by the .NET runtime, you can actually reach this limit when the application is using around 1GB (when the garbage collector runs, it may allocate large blocks to perform its job). This 2GB limit exists no matter how much memory the server has. (There is a switch to increase this limit to 3GB, but the times I've tried it, it made the system unstable, so I do not recommend it.) – BlueStrat Nov 19 '15 at 23:32
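
The buffer sizes discussed in the comments can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation, written in Python for brevity: it assumes the default .NET large object heap threshold of 85,000 bytes and ignores array header overhead, and the helper names `buffer_bytes` and `lands_on_loh` are made up for illustration.

```python
# Rough sizes for the buffers mentioned in this thread.
# Assumption: 85,000 bytes is the default .NET large object heap (LOH)
# threshold; the few bytes of array header are ignored.
LOH_THRESHOLD_BYTES = 85_000
INT_SIZE_BYTES = 4  # a .NET int (System.Int32) occupies 4 bytes


def buffer_bytes(width: int, height: int, channels: int, element_size: int) -> int:
    """Payload size of a width x height x channels buffer, in bytes."""
    return width * height * channels * element_size


def lands_on_loh(size_bytes: int) -> bool:
    """True if an allocation of this size would go to the LOH."""
    return size_bytes >= LOH_THRESHOLD_BYTES


# 640x480x2 stored as a byte array, and the same element count as int[].
as_bytes = buffer_bytes(640, 480, 2, 1)
as_ints = buffer_bytes(640, 480, 2, INT_SIZE_BYTES)

print(as_bytes, lands_on_loh(as_bytes))  # 614400 True
print(as_ints, lands_on_loh(as_ints))    # 2457600 True
```

Either way the buffers are far above the threshold, so every such allocation would go to the LOH rather than the small object heap.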

1 Answer

As an FYI for anyone struggling with similar issues: I think we've finally hunted down the cause of this bug. It turns out the OpenGL drivers on certain cheaper on-board Intel graphics cards had a problem with the way we were writing bitmap data to the same texture ID over and over. I changed the code to delete the texture and allocate a new ID each time, and the problem seems to have gone away.
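
The shape of that workaround can be sketched roughly as follows. This is not the actual code from the application: `TextureApi`, `show_frame`, and `FakeTextureApi` are hypothetical names, and the fake stands in for whatever graphics binding is really in use so that the sketch runs without a GL context.

```python
from typing import List, Optional, Protocol


class TextureApi(Protocol):
    """Hypothetical thin wrapper over the graphics API (not a real library)."""

    def create_texture(self) -> int: ...
    def delete_texture(self, texture_id: int) -> None: ...
    def upload(self, texture_id: int, pixels: bytes, width: int, height: int) -> None: ...


def show_frame(api: TextureApi, previous_id: Optional[int],
               pixels: bytes, width: int, height: int) -> int:
    """Upload one frame into a fresh texture and return the new ID.

    Instead of rewriting the same texture ID on every frame, the previous
    texture is deleted and a new ID is allocated each time.
    """
    if previous_id is not None:
        api.delete_texture(previous_id)
    texture_id = api.create_texture()
    api.upload(texture_id, pixels, width, height)
    return texture_id


class FakeTextureApi:
    """In-memory stand-in that only tracks which texture IDs are alive."""

    def __init__(self) -> None:
        self.next_id = 1
        self.live: List[int] = []

    def create_texture(self) -> int:
        texture_id = self.next_id
        self.next_id += 1
        self.live.append(texture_id)
        return texture_id

    def delete_texture(self, texture_id: int) -> None:
        self.live.remove(texture_id)

    def upload(self, texture_id: int, pixels: bytes, width: int, height: int) -> None:
        assert texture_id in self.live  # uploading to a deleted ID would be a bug


api = FakeTextureApi()
current: Optional[int] = None
for _ in range(3):
    current = show_frame(api, current, bytes(640 * 480 * 2), 640, 480)

print(current, api.live)  # 3 [3]
```

Only one texture is alive at any time, so the change trades a little per-frame allocation work for never reusing an ID.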

TrespassersW