I have a program that is constantly running. Normally, it seems to garbage collect, and remain under about 8MB of memory usage. However, every weekend, it refuses to garbage collect unless I make an explicit call to it. However, if it nears the maximum heap size, it will still garbage collect. However the only reason this issue was noticed, is because it actually crashed from running out of memory on one weekend i.e. it must have reached the maximum heap size, and not run the garbage collector.
The following image (click to see) is a graph of the program's memory usage over a day. On the sides of the graph, you can see the normal behaviour of the program's memory usage, but the first large peak is what seems to start over the weekend. This particular graph is a strange example, because after I made an explicit call to the garbage collector, it ran successfully, but then it went and climbed back to the maximum heap size and successfully garbage collected on it's own twice.
What is going on here?
EDIT:
Ok, from the comments, it seems I haven't provided enough information. The program simply receives a stream of UDP packets, which are placed in a queue (set to have a maximum size of 1000 objects), which are then processed to have their data stored in a database. On average, it gets about 80 packets per second, but can peak to 150. It's running under Windows Server 2008.
The thing is, this activity is fairly consistent, and if anything, at the time that the memory usage starts it's steady climb, the activity should be lower, not higher. Mind you, the graph I posted above is the only one I have that extends back that far, since I only changed the Java Visual VM wrapper to keep graph data back far enough to see it this week, so I have no idea if it's exactly the same time every week, because I can't watch it over the weekend, as it's on a private network, and I'm not at work on the weekend.
Here is a graph of the next day:
This is pretty much what the memory usage looks like every other day of the week. The program is never restarted, and we only tell it to garbage collect on a Monday morning because of this issue. One week we tried restarting it on a Friday afternoon, and it still started climbing sometime over the weekend, so the time that we restart it doesn't seem to have anything to do with the memory usage next week.
The fact that it successfully garbage collects all those objects when we tell it to implies to me that the objects are collectable, it's just not doing it until it reaches the maximum heap size, or we explicitly call the garbage collector. A heap dump doesn't tell us anything, because when we try to perform one, it suddenly runs the garbage collector, and then outputs a heap dump, which of course looks perfectly normal at this point.
So I suppose I have two questions: Why is it suddenly not garbage collecting the way it does the rest of the week, and why is it that on one occassion, the garbage collection that occurs when it reaches the maximum heap size was unable to collect all those objects (i.e. why would there be references to so many objects that one time, when every other time there must not be)?
UPDATE:
This morning has been an interesting one. As I mentioned in the comments, the program is running on a client's system. Our contact in the client organisation reports that at 1am, this program failed, and he had to restart it manually when he got into work this morning, and that once again, the server time was incorrect. This is an issue we've had with them in the past, but until now, the issue never seemed to be related.
Looking through the logs that our program produces, we can deduce the following information:
- At 01:00, the server has somehow resynced it's time, setting it to 00:28.
- At 00:45 (according to the new, incorrect server time), one of the message processing threads in the program has thrown an out of memory error.
- However, the other message processing thread (there are two types of messages we receive, they are processed slightly differently, but they are both constantly coming in), continues to run, and as usual, the memory usage continues to climb with no garbage collection (as seen from the graphs we have been recording, once again).
- At 00:56, the logs stop, until about 7am when the program was restarted by our client. However, the memory usage graph, for this time, was still steadily increasing.
Unfortunately, because of the change in server time, this makes the times on our memory usage graph unreliable. However, it seems to be that it tried to garbage collect, failed, increased the heap space to the maximum available size, and killed that thread all at once. Now that the maximum heap space has increased, it's happy to use all of it without performing a major garbage collection.
So now I ask this: if the server time changes suddenly like it did, can that cause a problem with the garbage collection process?