1

I'm cleaning up some code that's started throwing java.lang.OutOfMemoryError in Production.

The problematic area has a couple of methods that process large collections, e.g.:

public void doSomething(Collection<HeavyObject> inputs) {

  ... do some stuff using INPUTS, deriving some different objects ...

  ... do some other stuff NOT using INPUTS, only derived objects ...

}

public void unsuspectingCaller() {
  Collection<HeavyObject> largeCollection;
  ... some stuff to populate the collection ...
  doSomething(largeCollection);
  ... other stuff ...
   // this following code may be added in the future
  kaboom(largeCollection); // walks into maintenance trap!
}

The code is blowing up and running out of memory in ... do some other stuff NOT using INPUTS ...

I can fix reduce the memory consumption (allowing early GC) by adding a inputs.clear() in between the two blocks.

But, I do not want to set a trap for future maintainers who might not be aware that the input collection is cleared. In fact, the inputs would ideally have been immutable, to more clearly communicate the intent of the code.

Is there an idiomatic way to declare doSomething() to make it clear, or even compiler verifiable, that the caller of doSomething() is not supposed to continue using the collection after doSomething() has been called?

UPDATE

For additional clarity, I renamed the parameter to inputs, instead of targets. Just keep that in mind when reviewing the comments.

UPDATE2

Addressing the suggestion from @Stephen C, we can see clearly that the JVM does not release references held by the caller, even if they are just passed in as an unnamed parameter. Execute with -Xmx8g (fail) and -Xmx9g (pass):

package com.stackoverflow.sandbox;

import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.junit.jupiter.api.Test;

public class MemoryTest {

    static class HeavyObject {
        int[] oneGigabyte = IntStream.range(0, 256_000_000).toArray();
        public int[] getGig() {
            return oneGigabyte;
        }
    }
    
    private int[] skynet(int[] in) {
        // perform out-of-this-world artificial intelligence computation
        return Arrays.stream(in).map(x -> x >> 1).toArray();
    }
    
    void doSomething(Collection<HeavyObject> input) {
        Collection<int[]> doubleMemoryUsage = input.stream().map(HeavyObject::getGig).map(this::skynet).collect(Collectors.toList());

        input = null;
        
        Collection<int[]> tripleMemoryUsage = doubleMemoryUsage.stream().map(this::skynet).collect(Collectors.toList());
        
        double sum = tripleMemoryUsage.stream().flatMapToDouble(array -> Arrays.stream(array).asDoubleStream()).sum();
        System.out.println("sum = " + sum);
    }
    
    @Test
    void caller1() {
        doSomething(List.of(new HeavyObject(), new HeavyObject(), new HeavyObject()));
        System.out.println("done1");
    }
    
    @Test
    void caller2() {
        Collection<HeavyObject> threeGigs = List.of(new HeavyObject(), new HeavyObject(), new HeavyObject());
        doSomething(threeGigs);
        System.out.println("done2");
    }

}

Another way to state the challenge, is how to reduce the memory usage in doSomething() from triple to double in an idiomatic way that enforces safe usage at compile time?

Alex R
  • 11,364
  • 15
  • 100
  • 180
  • 1
    If you're worried about it, wouldn't simply clearing the collection to "0" items work (e.g. your `targets.clear()`) work? – paulsm4 Apr 03 '21 at 00:52
  • It "works" but triggers the maintenance trap and causes the unsuspecting callers to break, either now or in the future – Alex R Apr 03 '21 at 00:53
  • 1
    How about allocating more memory? – Bohemian Apr 03 '21 at 00:56
  • [Java - dispose pattern](https://stackoverflow.com/q/47902160) – Robert Harvey Apr 03 '21 at 00:58
  • [Try with Resources](https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html) – Robert Harvey Apr 03 '21 at 00:58
  • 1
    Clearly, the "ideal" is to initialize your "collection" as soon as it's in scope, and structure your program so that it's OUT of scope (hence eligible for the JVM to free) as soon as it's no longer needed. But that would involve restructuring your entire app - probably not an option. C++ always had [RAII](https://en.cppreference.com/w/cpp/language/raii). [try-with-resources](https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html) is a reasonably similar idiom in Java. – paulsm4 Apr 03 '21 at 00:59
  • @Bohemian very funny – Alex R Apr 03 '21 at 00:59
  • @RobertHarvey that looks very promising – Alex R Apr 03 '21 at 01:00
  • @paulsm4 right, refactoring the caller to fix a problem in the callee is not awesome – Alex R Apr 03 '21 at 01:01
  • But I completely disagree with your premise that "clear()" is necessarily a "maintenance nightmare". As long as you're calling a hack like " doSomething()" anyway, It's simple and effective. – paulsm4 Apr 03 '21 at 01:02
  • rather weird gc problem. `scope` and `reachability` are different concepts and JVMs GCs don't work as scope-based clean-up. What I am mean by that is that when `do some other stuff NOT using TARGETS ...` is reached and `targets` is not referenced by any thread, it is eligible (at that point) for collection; it does not need to wait for the method scope to end. And there are example where JVM has been seen live doing that, [for example this one](https://stackoverflow.com/questions/58714980/rejectedexecutionexception-inside-single-executor-service) – Eugene Apr 03 '21 at 01:28
  • also, are you saying that calling `clear` actually helps? on a reproducible basis? that was not very clear from the question – Eugene Apr 03 '21 at 01:34
  • Is the element type an inner class? – Bohemian Apr 03 '21 at 02:31
  • @Eugene the clear actually did help. Verified using repeatable test runs with different heap settings. – Alex R Apr 03 '21 at 17:45
  • rather weird, I remember that old GC implementations had this issue... what VM and GC are you using? – Eugene Apr 03 '21 at 17:52
  • @Eugene I'm not sure what part you think is weird. The only reference to the input objects is in the input Collection. So clearing the Collection makes the objects themselves GC-eligible. – Alex R Apr 03 '21 at 17:55
  • but if the input is not referenced by any other thread after the method call, it is eligible for GC with everything that it holds. As one of the answers said, if you do not care about that input anymore, you can simply set it to `null` – Eugene Apr 03 '21 at 18:04
  • It's not referenced _after_ the method call, but it's still referenced in the caller _during_ the call. – Alex R Apr 03 '21 at 18:07

3 Answers3

1

The problem with calling targets.clear() is (as you noted) that something else may be using the collection. So here's how I would approach this:

public void doSomething(Collection<Widget> targets) {
  // ... do some stuff using TARGETS ...

  targets = null;

  // ... do some other stuff NOT using TARGETS ...
}

The targets = null; will prevent this call from retaining a reference to the collection longer than it needs to. The performance impact is almost zero, and the damage (if any!) of nulling prematurely is localized to the doSomething() method itself.

The problem then falls on the caller:

// This should be OK
doSomething(computeWidgets(...));

// This may be a problem
Collection<Widget> targets = computeWidgets(...);
doSomething(targets);
// Don't use 'targets' from now on.

In the first example, the JVM should be able to tell that the caller has no reachable reference once the call has started. In the second example, it is more difficult for the JVM to know. But in both cases, you are dependent on the JVM to detect that the reference is effectively unreachable in the caller.

UPDATE

I suspect that the reason that your MemoryTest example fails is that your doSomething code is creating a temporary variable or using a register or something to hold a reference to the Stream object. The JVM may not realize that that variable / register is no longer live, and may therefore be treating the Stream object as reachable. But the Stream object will most likely have a reference to the original collection, and that will make the collection reachable as well.

It could be argued that that would be a JVM bug, but I don't think so. The JLS and JVMS make no strong statements about whether / when the JVM should detect that an local variable (or temporary variable / register) used a method call is no longer reachable.


But I really think that Bohemian has given you the best answer. (No. I don't think he was being funny.)

If you are having to micro-optimize this to squeeze your problem into memory footprint of your current (small) heap, then the simple solution is to make the heap bigger.

As you noted the various clever things that you could do to optimize storage usage (e.g. by clearing things) could actually break the application or make it harder to maintain.

(And your MemoryTest example is a good illustration of how clever optimizations may fail. There is stuff happening behind the scenes that is difficult to predict.)

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • hi @Stephen. Setting it to `null` within the method does nothing to help GC, since the reference to the Collection still alive in the caller. – Alex R Apr 03 '21 at 17:57
  • Actually, upon re-reading, I'm now wondering whether the `This should be OK` code actually behaves differently. I'm going to have to test this. – Alex R Apr 03 '21 at 18:05
  • @AlexR I am not entirely sure what example is suppose to prove. If I isolate that `caller1` in a main method, like : `public static void main(String[] args) { new MemoryTest().doSomething(Arrays.asList(new HeavyObject(), new HeavyObject(), new HeavyObject()));}`; and do `java -Xms6g -Xmx6g MemoryTest.java` it runs just fine. If I comment that `inputs = null`, it will fail. exactly as this answer says. – Eugene Apr 04 '21 at 17:52
  • @Eugene would you mind to post your actual test code? I'm not sure I follow. – Alex R Apr 04 '21 at 18:08
1

I am only posting this since it's too big for a comment:

import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class MemoryTest {

    public static void main(String[] args) {
        new MemoryTest().doSomething(List.of(new HeavyObject(), new HeavyObject(), new HeavyObject()));
    }


    static class HeavyObject {
        int[] oneGigabyte = IntStream.range(0, 256_000_000).toArray();
        public int[] getGig() {
            return oneGigabyte;
        }
    }

    private int[] skynet(int[] in) {
        // perform out-of-this-world artificial intelligence computation
        return Arrays.stream(in)
                .map(x -> x >> 1)
                .toArray();
    }


    void doSomething(Collection<HeavyObject> input) {
        Collection<int[]> doubleMemoryUsage = input.stream().map(HeavyObject::getGig).map(this::skynet).collect(Collectors
                .toList());

        input = null;

        Collection<int[]> tripleMemoryUsage = doubleMemoryUsage.stream().map(this::skynet).collect(Collectors.toList());

        double sum = tripleMemoryUsage.stream().flatMapToDouble(array -> Arrays.stream(array).asDoubleStream()).sum();
        System.out.println("sum = " + sum);
    }

}

And run this with java -Xms6g -Xmx6g MemoryTest.java. It will work.

Now comment that input = null; and run it: it will fail.

Running this on jdk-15 btw.


In theory even if you change the method to:

void doSomething() {
    Collection<HeavyObject> input = List.of(new HeavyObject(), new HeavyObject(), new HeavyObject());
    Collection<int[]> doubleMemoryUsage = input.stream().map(HeavyObject::getGig).map(this::skynet).collect(Collectors
            .toList());

    //input = null;

    Collection<int[]> tripleMemoryUsage = doubleMemoryUsage.stream().map(this::skynet).collect(Collectors.toList());

    double sum = tripleMemoryUsage.stream().flatMapToDouble(array -> Arrays.stream(array).asDoubleStream()).sum();
    System.out.println("sum = " + sum);
}

it should not fail either, but it does. Even with ZGC or Shenandoah. I don't have the knowledge to explain why, though.

Eugene
  • 117,005
  • 15
  • 201
  • 306
  • Ok, this I didn't expect: You are right `java -Xms6g -Xmx6g MemoryTest.java` works with jdk-15 from Oracle launched from the command-line. However, launching from within Eclipse -> Run As Java Program with the same -Xms6g -Xmx6g parameters, pointed at the same jdk-15, fails with OOM! – Alex R Apr 15 '21 at 01:18
  • The only difference that I can see is that Eclipse runs `javaw` instead of `java` – Alex R Apr 15 '21 at 01:20
  • @AlexR I haven't used Eclipse for years and can't test that, but the command line is the truth for me. always. Setting the arguments to `null` to help GC is still used in JDK code itself, look into `LinkedList` and `LinkedBlockingQueue`. – Eugene Apr 15 '21 at 01:32
0

I was hoping for something better but here's what I came up with:

Refactor doSomething() into a class.

class DoerOfSomething {
   public DoerOfSomething(Collection<HeavyObject> inputs) {
     ... do some stuff using INPUTS, deriving other objects ...
     // derived objects are set as members
     // inputs goes out of scope
   } 
   public void doSomething() {
     ... do some other stuff NOT using INPUTS, only derived objects ...
   }
}

Now the caller can perform their own analysis to see if targets.clear() is ok to call.

Alex R
  • 11,364
  • 15
  • 100
  • 180