4

I have a Java program that operates on a (large) graph. Thus, it uses a significant amount of heap space (~50GB, which is about 25% of the physical memory on the host machine). At one point, the program (repeatedly) picks one node from the graph and does some computation with it. For some nodes, this computation takes much longer than anticipated (30-60 minutes, instead of an expected few seconds). In order to profile these operations and find out what takes so much time, I have created a test program that builds only a very small part of the large graph and then runs the same operation on one of the nodes that took very long to compute in the original program. Thus, the test program obviously uses very little heap space compared to the original program.

It turns out that an operation that took 48 minutes in the original program can be done in 9 seconds in the test program. This really confuses me. The first thought might be that the larger program spends a lot of time on garbage collection. So I turned on the verbose mode of the VM's garbage collector. According to that, no full garbage collections are performed during the 48 minutes, and only about 20 collections in the young generation, which each take less than 1 second.

So my question is: what else could explain such a huge difference in timing? I don't know much about how Java internally organizes the heap. Is there something that takes significantly longer for a large heap with a large number of live objects? Could it be that object allocation takes much longer in such a setting, because it takes longer to find an adequate place in the heap? Or does the VM do any internal reorganization of the heap that might take a lot of time (besides garbage collection, obviously)?

I am using Oracle JDK 1.7, if that's of any importance.

trincot
Georg
  • Without knowing what sort of operations your program does, this is impossible to answer. – Dawood ibn Kareem Feb 10 '14 at 07:59
  • How else do the test and the main program differ, other than in the amount of allocatable heap? Do they operate on other kinds of data? How much heap does the test application use? There is a performance option for using short pointers if only little heap is used (but that can in no way explain your performance difference). – ooxi Feb 10 '14 at 08:02
  • You should use a good profiler (YourKit, for instance) to analyze the reason for the slowness. I find it hard to believe that anyone here can guess what the source of the issue is. – Nir Alfasi Feb 10 '14 at 08:10
  • I am using a profiler on the test program. Profiling the actual program is not a good option as it would take too long. (Creating the big data structure takes days, even without a profiler.) I was mainly hoping to get some insights into whether the VM might do anything other than garbage collecting that might steal time. Based on the answers so far, that does not seem to be the case. – Georg Feb 10 '14 at 09:25
  • Meanwhile I have identified the source of the problem. It had nothing to do with the heap size. This confirms the accepted answer below. – Georg Feb 19 '14 at 05:57

2 Answers

3

While bigger memory might mean bigger problems, I'd say there's nothing (except the GC, which you've excluded) that could extend 9 seconds to 48 minutes (a factor of 320).

A big heap can make spatial locality worse, but I don't think it matters here. I disagree with Tim's answer w.r.t. "having to leave the cache for everything".

There's also the TLB, which is a cache for virtual address translation and could cause some problems with very large memory. But again, not by a factor of 320.

I don't think there's anything in the JVM which could cause such problems.

The only reason I can imagine is that you have some swap space which gets used - despite the fact that you have enough physical memory. Even slight swapping can be the cause for a huge slowdown. Make sure it's off (and possibly check swappiness).
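If you want to check swap usage from inside the JVM rather than with OS tools, here is a minimal sketch. It assumes an Oracle/OpenJDK runtime that exposes the `com.sun.management.OperatingSystemMXBean` extension; the class name `SwapCheck` and its helper methods are made up for illustration. Note that it reports system-wide swap usage, not whether this particular process has pages swapped out, so treat it as a hint only.

```java
import java.lang.management.ManagementFactory;

class SwapCheck {
    /** Used bytes given total and free; clamped at 0 in case the OS reports odd values. */
    public static long usedBytes(long total, long free) {
        return Math.max(0L, total - free);
    }

    /**
     * System-wide swap in use, in bytes, or -1 if this JVM does not expose
     * the com.sun.management extension (assumption: Oracle/OpenJDK runtime).
     */
    public static long usedSwapBytes() {
        java.lang.management.OperatingSystemMXBean os =
                ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            com.sun.management.OperatingSystemMXBean sunOs =
                    (com.sun.management.OperatingSystemMXBean) os;
            return usedBytes(sunOs.getTotalSwapSpaceSize(), sunOs.getFreeSwapSpaceSize());
        }
        return -1L;
    }

    public static void main(String[] args) {
        long used = usedSwapBytes();
        if (used < 0) {
            System.out.println("Swap information not available on this JVM");
        } else {
            System.out.println("Swap in use (system-wide): " + (used / (1024 * 1024)) + " MB");
        }
    }
}
```

Logging this value before and after one of the slow operations would show whether swap usage grows while it runs.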

maaartinus
  • As far as I can tell no swapping is taking place. Based on your answer, I can only infer that the problem lies somewhere in my code. I will have to investigate further. Thanks! – Georg Feb 10 '14 at 09:26
  • Did you find the problem ? – Paul Praet Mar 04 '21 at 09:25
0

Even when things are in memory, you have multiple levels of data caching on modern CPUs. Every time you have to leave the cache to fetch data, things get slower. Having 50GB of RAM in use could well mean that it is having to leave the cache for everything.

The symptoms and differences you describe are just massive though, and I don't see something as simple as cache misses making that much difference.

The best advice I can give you is to try running a profiler against it both when it's running slow and when it's running fast, and compare the difference.

You need solid numbers and timings. "In this environment doing X took Y time". From that you can start narrowing things down.
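As a rough sketch of how to collect such numbers without a full profiler, the snippet below wraps an operation and reports wall-clock time together with the GC time accumulated during the run, using the standard `GarbageCollectorMXBean` API. The class name `TimedRun`, the `measure` helper and the placeholder workload are made up for illustration; you would substitute your own per-node computation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class TimedRun {
    /** Accumulated GC time in ms over all collectors; undefined (-1) values are skipped. */
    public static long totalGcMillis() {
        long sum = 0L;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                sum += t;
            }
        }
        return sum;
    }

    /** Runs the task once and returns { wall-clock ms, GC ms accumulated during the run }. */
    public static long[] measure(Runnable task) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        task.run();
        long wallMs = (System.nanoTime() - start) / 1_000_000L;
        return new long[] { wallMs, totalGcMillis() - gcBefore };
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the operation on one graph node.
        long[] r = measure(new Runnable() {
            @Override
            public void run() {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < 1_000_000; i++) {
                    sb.append(i % 10);
                }
            }
        });
        System.out.println("wall=" + r[0] + " ms, gc=" + r[1] + " ms");
    }
}
```

Logging these two values per node in both the full program and the test program gives directly comparable "doing X took Y" figures, and shows how much of Y the collector accounts for.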

Tim B