5

I will try to briefly explain the problem. I work in the supply chain domain, where we deal with items/products and SKUs.

Say my entire problem set is 1 million SKUs and I am running an algorithm over them. My JVM heap size is 4 GB.

I can't process all the SKUs in one shot, as I would need a lot more memory. So I divide the problem set into smaller batches. Each batch holds all the related SKUs which need to be processed together.

Now I run several iterations to process the entire data set. If each batch holds approx. 5,000 SKUs, I will have 200 iterations/loops. All data for the 5,000 SKUs is required until the batch has completed processing. But when the next batch starts, the previous batch's data is no longer required and can be garbage collected.

This is the problem background. Now, coming to the particular performance issue due to GC: each batch takes approx. 2-3 seconds to finish. Within this time, GC is unable to free any objects, as all the data is required until the end of processing a particular batch. So GC is moving all these objects to the old gen (if I look at the YourKit profiler, there is hardly anything in the new gen). The old gen is therefore growing quickly, full GCs are needed, and they are making my program very slow. Is there any way to tune the GC in such a case, or maybe change my code to do the memory allocation in a different way?

PS - if each batch is very small, I don't see this issue. I believe this is because the GC is able to free objects quickly enough, as the batch completes faster, and hence does not need to promote objects to the old gen.
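For clarity, the allocation pattern looks roughly like this (a minimal sketch; `BatchProcessor`, `loadBatch`, and `process` are hypothetical names standing in for the real code):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchProcessor {
    static final int BATCH_SIZE = 5_000;
    static final int TOTAL_SKUS = 1_000_000;

    // Number of batches needed to cover the whole problem set.
    static int iterations(int totalSkus, int batchSize) {
        return (totalSkus + batchSize - 1) / batchSize;
    }

    // Placeholder: the real job loads ~5,000 related SKUs' working data here.
    static List<long[]> loadBatch(int index, int batchSize) {
        List<long[]> batch = new ArrayList<>(batchSize);
        for (int i = 0; i < batchSize; i++) {
            batch.add(new long[16]); // per-SKU working data, live for the whole batch
        }
        return batch;
    }

    // Placeholder for the 2-3 s of processing that keeps all batch data reachable.
    static long process(List<long[]> batch) {
        long total = 0;
        for (long[] row : batch) total += row.length;
        return total;
    }

    public static void main(String[] args) {
        int n = iterations(TOTAL_SKUS, BATCH_SIZE); // 200 iterations
        for (int i = 0; i < n; i++) {
            List<long[]> batchData = loadBatch(i, BATCH_SIZE);
            process(batchData);
            // batchData becomes unreachable here, but after being live for the
            // full batch it has typically already been promoted to the old gen
        }
    }
}
```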

Ravindra babu
Shiladitya

5 Answers

3

First Google hit indicates that you can use -XX:NewRatio to set a larger new generation size relative to the old generation.

Yosef Weiner
  • Don't you think increasing the new generation size will increase the collection time of the new gen? If you make the new gen much larger, you are effectively treating it as an old gen, and it will take similar GC time compared to an old-gen GC. IMO no GC policy would help much in this case. Please correct me if I am wrong. – Nachiket Kate May 19 '16 at 09:35
  • @nachiketkate Good point. I'm certainly not an expert, but my understanding was always that [it is common knowledge that it is more efficient to GC young than old generations](http://www.javaspecialists.eu/archive/Issue115.html). But perhaps that comparison is unfair, as it assumes the young gen is smaller than the old. – Yosef Weiner May 19 '16 at 10:28
1

You need to adjust -XX:NewRatio, as mentioned in the other answer.

You can start by setting -XX:NewRatio=1, which means the old gen and young gen divide the available heap memory equally.

More details on how this flag works along with other memory adjustment flags: https://docs.oracle.com/cd/E19900-01/819-4742/abeik/index.html

Vijay
1

Consider using the object pool pattern.

I.e., create a pool of 5,000 SKU objects, then for each batch initialize each of these objects with the new data. This way you will not have any problems with GC, as the pool is all you ever need to allocate.
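A minimal sketch of such a pool, assuming a hypothetical `Sku` class with illustrative fields (the real domain objects will differ):

```java
import java.util.ArrayList;
import java.util.List;

// A fixed pool of SKU holders, allocated once and re-initialized every batch,
// so batches create no new garbage. "Sku" and its fields are stand-ins.
public class SkuPool {
    static class Sku {
        long id;
        double demand;

        void init(long id, double demand) { this.id = id; this.demand = demand; }
        void reset() { this.id = -1L; this.demand = 0.0; }
    }

    private final List<Sku> pool;

    SkuPool(int size) {
        pool = new ArrayList<>(size);
        for (int i = 0; i < size; i++) pool.add(new Sku()); // one-time allocation
    }

    // Hand out the i-th pooled object, initialized with this batch's data.
    Sku acquire(int i, long id, double demand) {
        Sku sku = pool.get(i);
        sku.init(id, demand);
        return sku;
    }

    // Reset every object at the end of a batch instead of discarding it.
    void resetAll() {
        for (Sku sku : pool) sku.reset();
    }

    int size() { return pool.size(); }
}
```

Since the pooled objects stay reachable for the life of the job, they are promoted to the old gen once and then never collected, which is exactly what avoids the repeated promotion churn.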

Argb32
0

A few tips:

  1. Check for memory leaks with profiling tools like VisualVM or MAT.
  2. If you don't have memory leaks, check whether the current heap is large enough; if not, allocate more memory.
  3. From your problem statement, the old gen is growing and causing full GCs. You did not mention which garbage collector you are using. Since you are using >= 4 GB of memory, you should try the G1GC algorithm. With G1GC, you can keep most of the default values and configure only key parameters such as the pause-time goal, region size, parallel GC threads, etc.

Refer to this SE question for more details:

Java 7 (JDK 7) garbage collection and documentation on G1

Ravindra babu
0

I know this is a little late, but still...

I played around a lot with JVM GC options, which helped to some extent. The good thing is I learned a lot more about GC in the process :)

Finally, I did some sort of object pooling. Since the job is processed in batches, and each batch is roughly the same size and uses the same number of objects, I created a pool of objects which is recycled every batch instead of creating and destroying the objects each time. At the end of every batch, I just reset the objects (arrays to -1, etc.), and at the beginning of the next batch I reuse those objects by re-initializing them. Also, for the multi-threaded case, the pools are made ThreadLocal to avoid synchronization overhead.
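The per-thread recycled buffers described above can be sketched like this (a hypothetical illustration; `BatchBuffers`, the array names, and the batch size are made-up stand-ins for the real code):

```java
import java.util.Arrays;

// Each worker thread gets its own set of reusable batch arrays via
// ThreadLocal, so no synchronization is needed and no per-batch garbage
// is created. Arrays are reset to -1 at the end of every batch.
public class BatchBuffers {
    static final int BATCH_SIZE = 5_000;

    static final class Buffers {
        final long[] skuIds = new long[BATCH_SIZE];
        final double[] quantities = new double[BATCH_SIZE];

        // "Resetting the objects (arrays to -1, etc.)" at the end of a batch.
        void reset() {
            Arrays.fill(skuIds, -1L);
            Arrays.fill(quantities, -1.0);
        }
    }

    // One pool per worker thread; created lazily on first use.
    private static final ThreadLocal<Buffers> POOL =
            ThreadLocal.withInitial(() -> {
                Buffers b = new Buffers();
                b.reset();
                return b;
            });

    static Buffers forCurrentThread() {
        return POOL.get();
    }

    // Typical per-batch usage: reuse the same arrays, reset at the end.
    static long processBatch(long[] ids) {
        Buffers b = forCurrentThread();
        int n = Math.min(ids.length, BATCH_SIZE);
        System.arraycopy(ids, 0, b.skuIds, 0, n);
        long checksum = 0;
        for (int i = 0; i < n; i++) checksum += b.skuIds[i];
        b.reset(); // ready for the next batch, no new allocation
        return checksum;
    }
}
```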

Shiladitya