1

In the following code:-

Set<Object> allObjects = new HashSet<>(1000);
while(...){ //really big loop - millions
  Object s = getSomeObject();//returns a big object
  allObjects.add(s);

  if(allObject.size() == 1000){
    //If allObjects has got big enough, process and proceed further
    final Set<Object> allObjects_temp = new HashSet<>(allObjects); //--->a
    allObject.clear();

    //Process these objects in a separate tasks parallely
    executerService.submit(new Runnable() {
        public void run() {
            processData(partner, allObjects_temp);//---->b
        }
        }));
  }
}

Line-a will create new set with the elements copied from another set.

In line-b, these elements are being processed in another thread. However anonymous class is accessing the reference in outer class(Object s) through allObjects_temp, I believe Java GC will not be able to collect Objects created in the loop, even after the task has finished.

Will the Java GC in fact collect the Objects or not? If it will, please provide a brief explanation as to why it will. If it will be unable to collect the Objects, how can we fix this?

reference: When exactly is it leak safe to use (anonymous) inner classes?

Update: Wrote the following code to verify the memory leak. If no memory the code should run forever.

public class MemoryLeak {

    public static void main(String[] args) {
        ExecutorService executerService = new ThreadPoolExecutor(10, 10, 50, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<Runnable>(50, true), new ThreadPoolExecutor.CallerRunsPolicy());
        Set<Object> allObjects = new HashSet<>(1500);
        for (int i = 0; i < 100000;) { // really big loop - millions
            Object s = getSomeObject();// returns a big object
            allObjects.add(s);

            if (allObjects.size() == 1500) {
            // If allObjects has got big enough, process and proceed further
            final Set<Object> allObjects_temp = new HashSet<>(allObjects); // --->a
            allObjects.clear();

            // Process these objects in a separate tasks parallely
            executerService.submit(new Runnable() {
                public void run() {
                processData(allObjects_temp);// ---->b
                }
            });
            }
        }
        executerService.shutdown();
    }

    private static Object getSomeObject() {
        return "hello" + (Math.random() * 100000) + "" + (Math.random() * 100000);

    }

    protected static void processData(Set<Object> allObjects_temp) {
        try {
            Runtime rt = Runtime.getRuntime();
            System.err.println(String.format("Free: %d Mb, Total: %d Mb, Max: %d Mb", rt.freeMemory() / 1000000,
                rt.totalMemory() / 1000000, rt.maxMemory() / 1000000));
            Thread.sleep(10);
            System.out.println("done");
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

    }
}

Result: The code is running forever. Also I checked that the memory is getting freed regularly and Jmap showed me that Object s is indeed being garbage collected. I still don't completely understand how?

Richard Chambers
  • 16,643
  • 4
  • 81
  • 106
Mangat Rai Modi
  • 5,397
  • 8
  • 45
  • 75
  • 2
    Java GC will manage just fine. If last reference disappears, it will eventually collect the objects. Problem in your code above it more that you can run out of memory in case you schedule executions a lot faster than you can run them, so executor will keep references to runnables which in turn hold references to batches of 1000 of big objects. – Artur Biesiadowski Sep 07 '17 at 13:10
  • 1
    I agree with Artur. Also, by the way, `new HashSet<>(1000);` does not give the hashset a *capacity* of 1000. You typically need to pass at least 1500 to the constructor to give it initial capacity for 1000 elements. Read the Javadoc for more details. Finally, you can avoid unnecessary copying by doing `allObjects_temp = allObjects; allObjects = new HashSet<>();` – Klitos Kyriacou Sep 07 '17 at 13:18
  • @Klitos, I had no idea about it thanks – Mangat Rai Modi Sep 07 '17 at 13:20
  • @ArturBiesiadowski Could you please explain a bit more. Which reference are you talking about? If I keep everything in memory until my while loop runs It will go OOM. My executer service is created under ThreadPoolExecutor.CallerRunsPolicy so rate submission will be eventually controlled. – Mangat Rai Modi Sep 07 '17 at 13:27
  • This is just a tip, but that whole Runnable can be replaced with `executerService.submit(() -> processData(partner, allObjects_temp));` – Novaterata Sep 07 '17 at 13:32
  • not using Java 8 yet sadly :( – Mangat Rai Modi Sep 07 '17 at 13:35
  • 1
    why a close vote? I believe it was a genuine programming question with lots of research being done. – Mangat Rai Modi Sep 07 '17 at 14:27

1 Answers1

1

I think above code will not produce a memory leak.

In the hotspot JVM, the GC will check whether an object is valid by reachability analysis. And the static variable, stack variable can be regard as GC root.

In your code, the reference of object allObjects_temp variable will be stored on the JVM stack. When assigning the new object reference to the allObjects_temp, the JVM stack is not holding the reference of the old object allObjects_temp, and then when the task thread run is over, that means the task thread doesn't hold the reference of the old object. So the old object is not reachable, and the GC can collect it.

See the following code:

    Set<String>  set=new HashSet<>();
    while (true){
        String s=new String("test");
        set.add(s);
        if(set.size()==100){
            final Set<String> temp=new HashSet<>(set);   //line 1
            set.clear();;
        }
    }

And the byte code of that line 1 is :

    new           #2                  // class java/util/HashSet
    40: dup
    41: aload_1
    42: invokespecial #9                  // Method java/util/HashSet."<init>":(Ljava/util/Collection;)V
    45: astore_3                //line 2

The line2 means the final variable is stored on the jvm stack.

Cecilya
  • 519
  • 1
  • 5
  • 20
dabaicai
  • 999
  • 8
  • 9
  • but the allObjects_temp contains objects created above, will they also be garbage collected? – Mangat Rai Modi Sep 07 '17 at 13:36
  • @MangatRaiModi If the `allObjects_temp` is not reachable,it means the item inside the container is not access by `allObjects_temp`.In your code,`Object s = getSomeObject()`,the variable `s` is also store the jvm stack.So when assign something to it again,the old is not reachable.That means the `allObjects_temp` and the item of it also can be collected by gc. – dabaicai Sep 07 '17 at 13:42
  • Sorry if I am missing something, but isn't objects in JVM allocated from heap and not stack? – Mangat Rai Modi Sep 07 '17 at 13:46
  • @MangatRaiModi Yes,the object is store at the heap,but the reference (like a pointer point the object in heap) is store at stack if you create a object in method and assign above created object to a local variable. – dabaicai Sep 07 '17 at 13:50