I'm cleaning up some code that has started throwing java.lang.OutOfMemoryError in Production.
The problematic area has a couple of methods that process large collections, e.g.:
public void doSomething(Collection<HeavyObject> inputs) {
    ... do some stuff using INPUTS, deriving some different objects ...
    ... do some other stuff NOT using INPUTS, only derived objects ...
}

public void unsuspectingCaller() {
    Collection<HeavyObject> largeCollection;
    ... some stuff to populate the collection ...
    doSomething(largeCollection);
    ... other stuff ...
    // this following code may be added in the future
    kaboom(largeCollection); // walks into maintenance trap!
}
The code is blowing up and running out of memory in "... do some other stuff NOT using INPUTS ...".
I can reduce the memory consumption (allowing early GC) by adding an inputs.clear() call between the two blocks.
However, I do not want to set a trap for future maintainers who might not be aware that the input collection is cleared. Ideally, the inputs would have been immutable, to communicate the intent of the code more clearly.
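To make the trap concrete, here is a minimal sketch of the clear() workaround. The class name ClearTrapSketch and the use of int[] in place of HeavyObject are hypothetical stand-ins for illustration, not the production code:

```java
import java.util.ArrayList;
import java.util.List;

public class ClearTrapSketch {

    // Hypothetical stand-in for doSomething(): phase 1 reads the inputs,
    // then clear() drops the element references so they become eligible
    // for collection before phase 2 starts.
    static long doSomething(List<int[]> inputs) {
        List<Long> derived = new ArrayList<>();
        for (int[] heavy : inputs) {
            long sum = 0;
            for (int v : heavy) {
                sum += v;
            }
            derived.add(sum);
        }

        inputs.clear(); // side effect that the method signature does not reveal

        // Phase 2 works only on the derived objects.
        long total = 0;
        for (long d : derived) {
            total += d;
        }
        return total;
    }

    public static void main(String[] args) {
        List<int[]> largeCollection = new ArrayList<>();
        largeCollection.add(new int[] {1, 2, 3});
        largeCollection.add(new int[] {4, 5, 6});

        long total = doSomething(largeCollection);
        System.out.println("total = " + total);

        // A future maintainer reusing the collection here finds it empty.
        System.out.println("size after call = " + largeCollection.size());
    }
}
```

Nothing in the signature tells the caller that largeCollection comes back empty, which is exactly the maintenance hazard I want to avoid.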
Is there an idiomatic way to declare doSomething() to make it clear, or even compiler verifiable, that the caller of doSomething() is not supposed to continue using the collection after doSomething() has been called?
UPDATE
For additional clarity, I renamed the parameter to inputs, instead of targets. Just keep that in mind when reviewing the comments.
UPDATE2
Addressing the suggestion from @Stephen C, we can see clearly that the JVM does not release references held by the caller, even when the collection is passed directly as an argument without ever being assigned to a named variable. Execute with -Xmx8g (fail) and -Xmx9g (pass):
package com.stackoverflow.sandbox;

import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import org.junit.jupiter.api.Test;

public class MemoryTest {

    static class HeavyObject {
        int[] oneGigabyte = IntStream.range(0, 256_000_000).toArray();

        public int[] getGig() {
            return oneGigabyte;
        }
    }

    private int[] skynet(int[] in) {
        // perform out-of-this-world artificial intelligence computation
        return Arrays.stream(in).map(x -> x >> 1).toArray();
    }

    void doSomething(Collection<HeavyObject> input) {
        Collection<int[]> doubleMemoryUsage = input.stream().map(HeavyObject::getGig).map(this::skynet).collect(Collectors.toList());
        input = null;
        Collection<int[]> tripleMemoryUsage = doubleMemoryUsage.stream().map(this::skynet).collect(Collectors.toList());
        double sum = tripleMemoryUsage.stream().flatMapToDouble(array -> Arrays.stream(array).asDoubleStream()).sum();
        System.out.println("sum = " + sum);
    }

    @Test
    void caller1() {
        doSomething(List.of(new HeavyObject(), new HeavyObject(), new HeavyObject()));
        System.out.println("done1");
    }

    @Test
    void caller2() {
        Collection<HeavyObject> threeGigs = List.of(new HeavyObject(), new HeavyObject(), new HeavyObject());
        doSomething(threeGigs);
        System.out.println("done2");
    }
}
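One caveat about the callers above: they build the input with List.of(...), which returns an unmodifiable list, so the inputs.clear() workaround would throw UnsupportedOperationException for them instead of freeing memory. A minimal sketch of that behaviour follows; the class and method names are hypothetical, and small int[] values stand in for HeavyObject:

```java
import java.util.ArrayList;
import java.util.List;

public class ImmutableInputSketch {

    // Returns true when the collection refuses to be cleared.
    static boolean clearIsRejected(List<int[]> inputs) {
        try {
            inputs.clear();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Unmodifiable list, as produced by List.of(...) in the test above.
        List<int[]> immutable = List.of(new int[] {1}, new int[] {2});
        System.out.println("List.of rejected clear: " + clearIsRejected(immutable));

        // Mutable list, for comparison.
        List<int[]> mutable = new ArrayList<>(immutable);
        System.out.println("ArrayList rejected clear: " + clearIsRejected(mutable));
    }
}
```

So clearing the input only helps when the caller happens to pass a mutable collection, which is one more reason I would like the contract to be visible in the declaration.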
Another way to state the challenge: how can I reduce the memory usage in doSomething() from triple to double in an idiomatic way that enforces safe usage at compile time?