
We're load testing a Java 1.6 application in our DEV environment. The JVM heap allocation is 2GB (-Xms2048m -Xmx2048m). Under load testing, the app runs smoothly, never uses more than 1.25GB of heap, and garbage collection is totally normal.

In our UAT environment, we run the load test with the same parameters; the only difference is the JVM heap, which is allocated 4GB (-Xms4096m -Xmx4096m). Otherwise, the hardware is exactly the same as in DEV. But during load testing, the performance is horrendous: the app eats up nearly the entire heap, and garbage collection runs rampant.

We've run these tests over and over again and eliminated every factor we could think of that might influence performance, but the results are the same. Under what circumstances can this happen?

raffian
  • What happens if you allocate 2G in the UAT environment? Have you properly qualified this setting as the only variable factor? – user207421 Aug 31 '12 at 03:12
  • Is the application going through the *same* activity in Dev and UAT? I would think UAT would have their own tests and scenarios. – Miserable Variable Aug 31 '12 at 05:54
  • @EJP We tried this, not in UAT, but in DEV. We bumped the heap to 4GB, just like UAT, ran the tests again, and it performed worse than expected; basically, performance is better when the JVM heap is capped at 2GB, not 4GB, and this is what puzzles us. – raffian Aug 31 '12 at 14:38
  • @ThorbjørnRavnAndersen We're not using that tool, Precise is what we use, but thanks for mentioning it, looks nice. – raffian Aug 31 '12 at 19:07
  • @RaffiM a good monitoring tool is invaluable - additionally familiarity helps you when you have a catastrophic situation. – Thorbjørn Ravn Andersen Aug 31 '12 at 19:25

3 Answers


There is something different about your application in the DEV and UAT environments.

Judging from the symptoms, it is (IMO) unlikely to be a hardware issue, an operating-system tuning difference, or a difference in JVM versions. It goes without saying that it is unlikely to be due to the application simply having more memory.

(It is not inconceivable that your application might do something strange ... like sizing some data structures based on the maximum heap size and getting the calculations wrong. But I think you'd be aware of that possibility, so let's ignore it for now.)

It is probably related to a difference in the OS environment; e.g. a different version of the OS or of some other application, differences in networking, differences in locales, et cetera. But the bottom line is that it is 99% certain that there is a memory leak in your application when it runs in UAT, and that memory leak is what is chewing up heap memory and overloading the GC.

My advice would be to treat this as a memory leak problem, and use the standard tools and techniques to track down the cause. In the process, you will most likely figure out why it only occurs in your UAT environment.
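
For example, here is a minimal sketch of a heap/GC monitor built on the standard java.lang.management API (available since Java 5 and implemented by JRockit as well as HotSpot). The class name, sampling interval, and output format are arbitrary illustration choices; start it as a daemon thread inside the app under test and compare the "used" figures between DEV and UAT:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    /** Periodically logs heap occupancy and GC counters of the JVM it runs in. */
    public class HeapWatcher implements Runnable {

        /** Start the watcher as a daemon thread from anywhere in the app under test. */
        public static void start() {
            Thread t = new Thread(new HeapWatcher(), "heap-watcher");
            t.setDaemon(true);
            t.start();
        }

        public void run() {
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            while (true) {
                MemoryUsage heap = memory.getHeapMemoryUsage();
                System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                        heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    System.out.printf("  %s: collections=%d, total time=%dms%n",
                            gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                }
                try {
                    Thread.sleep(10000L); // sample every 10 seconds
                } catch (InterruptedException e) {
                    return; // stop when interrupted
                }
            }
        }
    }

If the used-heap figure keeps climbing between full collections in UAT but plateaus in DEV, that points at objects being retained (a leak), not at GC tuning.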

Stephen C

The culprit could be garbage collection. Normal "stop-the-world" collection caused us some performance problems: the server software was running very slowly, yet the load on the server was low. Eventually we found out that under certain scenarios (operations producing loads of garbage), a single "stop-the-world" garbage collector thread was holding up the entire application practically all the time.

Changing collectors alleviated the problem, using the startup parameters -XX:+UseParallelOldGC -XX:ParallelGCThreads=8 (strictly speaking these select the parallel old-generation collector rather than a concurrent one). We were using "only" 2GB heaps in tests and production, but it is also worth noting that the time a collection takes grows with a larger heap (even if your software never actually uses all of it).
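
As an illustration only, here is how such options might be combined with GC logging on the launch command, so you can confirm (or rule out) long pauses before committing to a collector change. These are HotSpot flags (the OP later mentions JRockit, whose flag names differ), and your-app.jar is a placeholder:

    java -Xms4096m -Xmx4096m \
         -XX:+UseParallelOldGC -XX:ParallelGCThreads=8 \
         -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log \
         -jar your-app.jar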

You might want to read more about the different garbage collector options and tuning here: Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning.

Also, the answers to this question could provide some help: Java very large heap sizes.

esaj
  • I don't think the symptoms support this diagnosis. Specifically, it does not explain why the heap is full (4GB of 4GB) in UAT, but running way below full (1.25GB of 2GB) in DEV. What is *using* that space? – Stephen C Aug 31 '12 at 04:50
  • @StephenC: I'm not a JVM-expert, but I do think the size of the heap affects things like period of old space collection at least in some collectors, and they don't reclaim all the 'dead' objects, just enough to keep going, so with larger heaps the software keeps using more of the heap before starting the collection (at which point some objects could have already tenured to old generation, taking more time in collection). – esaj Aug 31 '12 at 05:01
  • Well yes it does. But that doesn't explain why the heap is full in the large heap case and not full in the small heap case. Your diagnosis is ignoring one of the symptoms. Or to put it another way, it doesn't explain why there are so many objects in the old generation to start with. – Stephen C Aug 31 '12 at 05:49
  • @StephenC: You're missing my point (although I'm still not saying I'm right); if the collector tries to keep the collection time to minimum, it only collects as much as necessary, and since there's plenty of heap available, it is not run as often, so it could clean at ratio of 1 object per every 10 new allocations, so the heap usage keeps climbing, until it hits the max. Then it starts collecting 1 object per every 1 new allocation, keeping the heap at max, but satisfying memory requirements. With smaller heap, the collections are done more often, so the heap doesn't get to grow to maximum. – esaj Aug 31 '12 at 05:54
  • 1
  • The OP is not talking about heap size. He is talking about heap occupancy. If you look at his question, you'll see that the heap size min and max are the same in both systems. When he says *"the app eats up nearly the entire heap"* he can only mean that it eats it up with non-garbage objects. Besides, the GC is not going to "run rampant" until the heap occupancy rises. All of those new objects will go into the new generation and will be GC'ed before they get to the old generation. His root problem is that the heap occupancy is too high ... not his choice of GC. – Stephen C Aug 31 '12 at 07:09
  • The other point is that even if stop-the-world GC was slowing his app down, it would not (by itself) explain the "garbage collection runs rampant" comment. That implies that the GC is running most of the time. But even with stop the world, you'd expect 1) still mostly quick young generation collections, 2) full collections only occasionally, and 3) lots of time for the application to do stuff in between. – Stephen C Aug 31 '12 at 07:17
  • And finally, the copying collector used for the young generation (at least) is good at dealing with garbage. In fact, the cost of the dealing with a garbage object is essentially the cost of zeroing it. So if the same number of non garbage objects exist in the small and large memory cases, you would expect the large memory GC to be able to work MORE efficiently, not less. – Stephen C Aug 31 '12 at 07:21
  • So basically, while your fix might make the GC run faster for a bit, it doesn't explain the heap occupancy difference or address the problem that is causing that ... and that will most likely kill the JVM irrespective of the GC tuning settings. – Stephen C Aug 31 '12 at 07:27
  • And on this point: "If the collector tries to keep the collection time to minimum, it only collects as much as necessary ...". The classical collectors don't work that way. Once they decide to run they deal with the entire space they are scheduled to do. And the trigger is that the space is full. – Stephen C Aug 31 '12 at 07:59
  • @esaj Thanks for your suggestions. We took a hard look at the GC and reconfigured it with more deterministic settings. I forgot to mention, we're using BEA's JRockit JVM, but regardless, the changes to the GC reduced collection spikes and stabilized heap usage. – raffian Aug 31 '12 at 18:47

It will be worthwhile to analyze heap dumps from both of these machines and understand what is consuming the heap differently in the two environments. Class histograms will help.
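
On a HotSpot JVM, one way to capture a dump programmatically (for example, right at the peak of the load test) is the com.sun.management API sketched below. This is HotSpot-specific, so it would not apply to the JRockit JVM the OP mentions (which has its own dump tooling), and the class and file names are made up for illustration:

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import com.sun.management.HotSpotDiagnosticMXBean;

    /** Writes an HPROF heap dump of the running JVM to the given file. */
    public class HeapDumper {
        public static void dumpHeap(String outputFile, boolean liveObjectsOnly) throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            HotSpotDiagnosticMXBean diagnostic = ManagementFactory.newPlatformMXBeanProxy(
                    server, "com.sun.management:type=HotSpotDiagnostic", HotSpotDiagnosticMXBean.class);
            diagnostic.dumpHeap(outputFile, liveObjectsOnly);
        }

        public static void main(String[] args) throws Exception {
            // "uat-heap.hprof" is just an example file name; open the dump in a heap analyzer.
            dumpHeap("uat-heap.hprof", true);
        }
    }

Alternatively, jmap -histo:live <pid> on a Sun JDK prints a class histogram directly; comparing the top entries from DEV and UAT should show which classes account for the extra retained memory.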

Ashutosh