
I'm really out of my comfort zone when I have to monitor memory, but I'm the only one here and I'm left clueless: I have a Java 8 application (a CMS) running on a Tomcat application server, and we're in some trouble. After a while the server crashes.

After some research I found out it is memory related, so I attached VisualVM to my environment and started monitoring.

I see that the memory is slowly filling up. Garbage collection does its job, but not thoroughly: it always leaves some more memory in the used heap. When I do a manual 'Perform Garbage Collection' in VisualVM, the collection is performed much better. (See screenshot.)

Screenshot: garbage collection performed by Tomcat (first dip), garbage collection performed manually (second dip).

It takes several hours, but the used heap grows larger and larger after each garbage collection. The moment I manually perform GC again, the minimum used heap is back to 'normal'.

I have noticed that the heap fills with byte[] instances; those take up most of the space. Could someone help me out with this?

Kornelito Benito
  • "Garbage Collection does it's job, but not thoroughly." It's not supposed to do a Full GC like it does when you force it. What's the stacktrace when the memory runs out? – Kayaman Mar 24 '17 at 13:00
  • Your graph doesn't necessarily imply that you have a problem. I'd recommend turning on your GC log and checking your GC stop times. Also, from your graph, the app is down to <500MB after your manual full GC, but it maxes out at >5GB, so you may want to turn it down to ~1.5GB. – slowy Mar 24 '17 at 13:00
  • @slowy Given the fact that the server is crashing, I agree with the OP: they do have a problem. – GhostCat Mar 24 '17 at 13:03
  • @GhostCat: Indeed, but the graph can be totally fine, lots of our heaps look like that without a problem... – slowy Mar 24 '17 at 13:07
  • The stacktrace would tell whether this is a memory leak or if the GC is not able to keep up with the garbage being accumulated. Tweaking the heap size or using a different GC algorithm could solve this instantly. – Kayaman Mar 24 '17 at 13:09
  • Do you have automatic deployment of newly uploaded war files/web.xml files enabled? This sometimes causes leaks where the old version's class files do not get unloaded. – MTilsted Mar 24 '17 at 13:18
  • Maybe it would be better to check why your application is taking so much memory; your code might not be doing things in an optimized manner. If it is due to a huge amount of data to be processed, you should increase the heap size. But a thorough analysis of your code has to be done first. – techprat Mar 24 '17 at 16:52
  • "After a while the server crashes." - you should provide more detail. What does constitute "crashing" in this context? An OOME in java? A hard JVM crash? The linux OOM killer? Does the entire server suddenly reboot? – the8472 Mar 24 '17 at 16:57
  • I had the same problem with code that was using JNI. The program ran 24/7, and the problem showed only on big input at full load. The GC took too much time to free memory, so the C++ part was hanging before memory became available. The solution was to REDUCE memory using the JVM options; more GCs, but smaller ones. (oracle-jvm-linux-64) – Galigator Mar 26 '17 at 21:28
  • Beyond that: you got a bunch of input, but neither an accept nor upvotes. So you are not happy with the answers you got? – GhostCat Mar 28 '17 at 18:30

3 Answers


I see that the memory is slowly filling up. Garbage collection does its job, but not thoroughly. It always leaves some more memory in the used heap. When I do a manual 'Perform Garbage Collection' in VisualVM, the garbage collection is performed much better.

A full GC gets triggered when the JVM feels it is necessary, because it is costly: the parallel collector does a stop-the-world pause, and the concurrent mark-sweep collector has two stop-the-world sub-phases. When it runs depends on various factors such as the Xms and Xmx parameters (see the JVM heap parameters). So you should not be worried unless you actually get an out-of-memory error; the JVM will trigger a full GC when necessary.
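For reference, a minimal sketch of how such parameters are typically passed to a Java 8 Tomcat; the sizes and paths here are placeholders, not recommendations, and should be adjusted to the actual workload:

```
# Illustrative only: sizes the heap and turns on GC logging plus a heap
# dump on OutOfMemoryError (standard HotSpot 8 flags; adjust values/paths).
CATALINA_OPTS="-Xms1g -Xmx2g \
  -Xloggc:/path/to/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps"
```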

For the server crash, I can think of two possible causes:

  1. A memory leak. In that case the memory footprint will keep increasing even after each GC.
  2. You may be building some cache without an eviction policy, and it is close to full (see the sketch after this list).

If neither applies, I see a case for increasing the heap size; give it a try.
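As an illustration of point 2, a minimal sketch of a size-bounded cache built on the standard LinkedHashMap; the class name and the 2,000-entry cap are made up for the example, the point is only that an eviction rule keeps the cache from growing without bound:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch: an LRU cache with a hard size limit. The limit is an
// arbitrary illustration, not a recommendation for the OP's application.
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final int MAX_ENTRIES = 2_000;

    public BoundedCache() {
        // accessOrder = true gives least-recently-used eviction order
        super(16, 0.75f, true);
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the oldest entry once the cap is exceeded, so the cache
        // cannot slowly fill the heap the way an unbounded map would.
        return size() > MAX_ENTRIES;
    }
}
```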

M Sach
  • OP said "Java8 application (CMS)" so he possibly means a content management system (which is not really relevant to the question) and not the CMS collector – the8472 Mar 24 '17 at 16:55
  • Thanks for the clarification. It looks like I assumed CMS meant the concurrent collector, given the garbage collection context of the question. Though it does not make a difference in my answer, I still removed the CMS to avoid confusion :) – M Sach Mar 25 '17 at 01:39

I've had a few problems like this before. One was our app's fault, one was the app server's fault, and one I wasn't able to figure out but was able to mitigate.

In each case I used JProfiler to watch memory usage on a local server and ran a variety of happy-path and exception tests to try to figure out what was causing the problem. Doing this testing wasn't a quick and easy process - on average I spent about a week each time.

In the first case (our app's fault), I found that we were not closing SQL connections for a web service when exceptions were thrown. Testing the happy paths showed no problems, but when I started testing exceptions I could exhaust the server's memory with about 100 consecutive exceptions. Adding code to manually clean up resources in the exception handler solved the problem.
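The answer does not show the actual code involved, but a minimal sketch of that kind of fix, using try-with-resources so the connection is released even when an exception is thrown, could look like this (the class, table and column names are made up):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class CustomerDao {
    private final DataSource dataSource;

    public CustomerDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // try-with-resources closes the ResultSet, statement and connection in
    // reverse order, whether the method returns normally or throws.
    public String findName(long id) throws SQLException {
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "SELECT name FROM customer WHERE id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }
}
```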

In the second case (WebSphere's fault), I verified that our app was closing all resources correctly, but the problem persisted. So I started reading through WebSphere documentation and found that it was a known issue with JAX-WS clients. Luckily there was a patch to WebSphere which fixed the problem.

In the third case (couldn't determine the cause), I was unable to find any reason why it was happening. So the problem was mitigated by increasing JVM memory allocation to an amount where the OOM exceptions would take greater than 1 week to happen, and configuring the servers to restart every weekend.

Kaleb Brasee

There might be some simple technical workarounds to mitigate the problem, like simply adding more memory to the JVM and/or the underlying machine.

Or, if you really can prove that running System.gc() manually helps (the comments indicate that most people think it will not), you could automate that step.
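If, and only if, measurements really show that an explicit collection helps, automating it is a small amount of code; a minimal sketch, where the one-hour interval is purely illustrative and the caveats raised in the comments below still apply:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicGcWorkaround {
    public static void start() {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        // Requests a full collection every hour. This is a stop-gap, not a
        // fix, and only makes sense if profiling proved that explicit GC
        // actually keeps the used heap down.
        scheduler.scheduleAtFixedRate(System::gc, 1, 1, TimeUnit.HOURS);
    }
}
```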

If any of that is good enough to prevent those crashes, you are buying yourself more time to work on a real solution.

Beyond that, a meta, non-technical perspective. Actually there are two options:

  • You turn to your manager and tell him that you will need anything between 2 and 8 weeks to learn this stuff, so that you can identify the root cause and get it fixed.
  • You turn to your manager and tell him that he should look for some external consulting service to come in, identify the root cause and help fix it.

In other words: your team/product is in need of some "expert" knowledge. Either you invest in building that knowledge internally, or you have to buy it in from somewhere.

GhostCat
  • Oh come on. `System.gc()` as a scheduled service? It doesn't sound like they're crashing so often that something like that should be even considered. He could just as well give more memory to the server. – Kayaman Mar 24 '17 at 13:07
  • Let's put it that way: there might be various mitigation strategies, but you are right; I will reword to make that clearer. – GhostCat Mar 24 '17 at 13:11
  • It's a common best practice to disable explicit GC. In this case he would have to do it every few minutes. With that heap size it would take a few seconds, if not dozens; pretty sure the situation would get worse... – slowy Mar 24 '17 at 13:12
  • Considering that he's already comparing the explicit GC in VisualVM and thinking that the GC isn't "doing its job", it's dangerous to give the impression that explicit GC is something normal to use when the real GC is "broken". – Kayaman Mar 24 '17 at 13:13
  • That is true. Reworded again. – GhostCat Mar 24 '17 at 13:16
  • @slowy "It's a common best practice to disable explicit gc." - common practice? yes. best practice? no. – the8472 Mar 24 '17 at 22:34
  • @the8472: Please explain? I never had the situation, where System.gc() was the solution, but rather the problem... – slowy Mar 28 '17 at 12:59
  • @slowy Well, in his **question** the OP states that when he runs GC manually, he sees improvements. That is why I put all the wording in there to make it clear that he should base his actions on actual, factual results. – GhostCat Mar 28 '17 at 13:02
  • @slowy some core APIs (NIO buffers and RMI) use explicit GCs. Using `ExplicitGCInvokesConcurrent` instead of outright disabling it can be a better approach. see http://stackoverflow.com/a/32912062/1362755 – the8472 Mar 28 '17 at 18:27
  • @the8472 So you are saying: my answer isn't too bad ... but then: could I do anything to make "upvote worthy" in your eyes? ;-) – GhostCat Mar 28 '17 at 18:29
  • no, I was just responding to @slowy to point out that *disabling* System.gc() may not be a good idea. That doesn't imply that one should use it. Not without carefully analyzing whether it is needed. And imo OP provided insufficient information to make that judgement. He's mentioning crashes, System.gc normally doesn't magically solve crashes. – the8472 Mar 28 '17 at 20:50