
I have a Java class that does something like this:

public void slowsDownOverTime() {
    for (int i = 0; i < nIter; i++) {
        BigObject bigObject = new BigObject();
        // some code here that populates big object ...
        CustomSerializer.write(bigObject);
    }
}

What I observe is that as the code iterates, the time needed by the serializer to write gets longer and longer. When it starts, the serializer runs in milliseconds; after a few tens of thousands of iterations, it takes several seconds to run.

The disk to which the serializer writes is nowhere near full, and the Java heap space in use is nowhere near its maximum when this happens.

To the extent possible I've reduced the number and size of objects created and destroyed during this cycle. That basically exhausts my toolkit for addressing this sort of problem!

Any suggestions for how to understand and correct the gradual performance degradation would be greatly appreciated!

quarkpt
  • Does the problem persist if you call `System.gc()` regularly (e.g. every `10_000` iterations; see the instrumentation sketch after these comments)? – Turing85 Jan 17 '20 at 20:24
  • Turn on verbose GC logging and see if high GC activity correlates with the performance slowdown – bdkosher Jan 17 '20 at 20:24
  • Is it only the `CustomSerializer.write` that is slow? Are you appending to a file? – matt Jan 17 '20 at 20:30
  • The issue is probably related to what is happening inside that write method, not the objects themselves – GotoFinal Jan 17 '20 at 20:32
  • Object creation should not be an issue. Do you think you can somehow batch the objects and call `CustomSerializer.write` for the batch instead of calling it for every object? – Rishikesh Dhokare Jan 17 '20 at 20:38
  • I agree with what @GotoFinal said. I'll add, however, that probably the easiest way to find out what's happening is to use a profiler. A simple sampling profiler should do the trick, and there's a plethora of them to choose from in Java, for example VisualVM (which I believe still comes with the JDK). – Dolda2000 Jan 17 '20 at 20:41
  • Have you tried to profile the code? – apangin Jan 17 '20 at 21:15
  • It appears to be only the write that slows down. The rest of the contents of the loop take a few milliseconds per iteration at the beginning and the same at the end (that includes reading a netCDF file on every iteration). – quarkpt Jan 18 '20 at 21:33
  • @GotoFinal -- I see your point (nothing else in the loop is slowing down, including the netCDF read from disk). And yet I can't see how that can be the case because the serializer has no "memory" from call to call. Internally, a new object is created for each serialization, and there are no static members for any of the involved classes. – quarkpt Jan 18 '20 at 21:41
  • @quarkpt Writing data might involve a lot of hidden cost; for example, see this question: https://stackoverflow.com/questions/21947452/why-is-printing-b-dramatically-slower-than-printing – GotoFinal Jan 19 '20 at 00:44
  • But to know more you would probably need to profile this code – GotoFinal Jan 19 '20 at 00:44
  • Does the serializer call flush() ? – dimirsen Z Feb 07 '20 at 09:24
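
A minimal instrumentation sketch in the spirit of the first two comments, assuming the question's BigObject, CustomSerializer, and nIter: it times each write and optionally forces a collection so that any slowdown can be correlated with GC activity. GC logging itself is enabled on the JVM command line (`-verbose:gc`, or `-Xlog:gc*` on Java 9+), not in code.

public void slowsDownOverTimeInstrumented() {
    for (int i = 0; i < nIter; i++) {
        BigObject bigObject = new BigObject();
        // ... code that populates bigObject ...

        long start = System.nanoTime();
        CustomSerializer.write(bigObject);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        if (i % 10_000 == 0) {
            System.out.printf("iteration %d: write took %d ms%n", i, elapsedMs);
            System.gc(); // diagnostic only; a request to the JVM, not a guarantee
        }
    }
}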

2 Answers


I think this is caused by the code that was left out (`// some code here that populates big object ...`). Try this:

public void slowsDownOverTime() {
    BigObject bigObject = new BigObject();
    // some code here that populates big object ...
    for (int i = 0; i < nIter; i++) {
        CustomSerializer.write(bigObject);
    }
}

This will always write the same object and I expect this not to degrade in performance.

I think the left-out code builds a growing data structure that is referenced by bigObject. Keep in mind that, when serializing, Java traverses the object graph recursively and serializes all dependent objects as well. So it will write more and more data with each iteration. This can be the cause of the degrading performance and the heavy use of disk space.
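
A self-contained sketch of that failure mode, using plain java.io serialization as a stand-in for CustomSerializer (whose internals the question doesn't show): because writeObject walks the entire reachable graph, a list that keeps growing between iterations makes every write larger and slower than the last.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class GrowingGraphDemo {
    // Stand-in for the question's BigObject: it references a growing list.
    static class BigObject implements Serializable {
        final List<String> entries = new ArrayList<>();
    }

    public static void main(String[] args) throws IOException {
        BigObject bigObject = new BigObject();
        for (int i = 0; i < 5; i++) {
            bigObject.entries.add("row-" + i); // the graph grows each iteration
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
                oos.writeObject(bigObject); // serializes every reachable object
            }
            System.out.println("iteration " + i + ": " + buf.size() + " bytes");
        }
    }
}

A related trap, if CustomSerializer happens to reuse one long-lived ObjectOutputStream: the stream keeps a handle table of every object it has ever written, which grows without bound unless reset() is called on it periodically.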

Donat
  • Technically, what I think you meant is to put the code that populates the big object into the loop. The way you wrote it, the same object contents are written each time, whereas what my code, at the top, does is write different contents each time. – quarkpt Jan 18 '20 at 21:23
  • I looked at the code more carefully, and it is close to what this example shows, modulo my correction, and not my original grasp of it. The BigObject instance is the same on every iteration of the loop, but its contents change. I've minimized the amount of change that is necessary (i.e., a lot of the BigObject members are the same from one iteration to the next -- not just the contents but the actual objects; so the creation / destruction of objects that become BigObject members is minimized). – quarkpt Jan 18 '20 at 21:27
  • My suggestion was (as a test) to put the code that populates the `bigObject` outside the loop. So you would write identical data in each iteration. Then compare this to your version. – Donat Jan 18 '20 at 23:09
  • Did the test in which the bigObject is populated outside of the loop that performs serialization (so still serializing nIter times, but the same object each time). In this case, there is no detectable slowdown. – quarkpt Feb 05 '20 at 22:50

After much fruitless mucking about with jvisualvm's profiler, I resorted to adding a ton of logging. This gave me a clue that the problem was in a Hibernate operation that was performed to populate the BigObject. I was able to fix the performance issue by evicting each object that I retrieved as soon as I retrieved it.

I'm not much of a Hibernate expert, but I think that what was happening is that even though the objects were going out of scope in my code (and thus getting garbage collected), Hibernate was keeping a copy of each object in its cache. Then when it performed the next retrieval, it compared the retrieved object to all the cached ones. As the cache population rose, this operation would take longer and longer.
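
A sketch of the fix described above, assuming the data is loaded through a Hibernate Session; MyEntity, getData(), and the surrounding method are hypothetical placeholders, not names from the actual code.

import org.hibernate.Session;
import java.util.List;

// Hypothetical: load the rows that populate one BigObject, evicting each
// entity from the Session's first-level cache as soon as it has been used.
void populateBigObject(Session session, BigObject bigObject, List<Long> ids) {
    for (Long id : ids) {
        MyEntity entity = session.get(MyEntity.class, id); // hypothetical entity class
        bigObject.add(entity.getData());                   // hypothetical accessor
        session.evict(entity); // keeps the first-level cache from growing
    }
}

Calling session.clear() every few thousand loads, or using a StatelessSession (which keeps no first-level cache at all), are coarser variants of the same idea.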

quarkpt