I am trying to understand how a ThreadLocal can cause a class loader leak. To investigate, I wrote this code:

import java.net.URL;
import java.net.URLClassLoader;

public class Main {
    public static void main(String... args) throws Exception {
        loadClass();
        while (true) {
            System.gc();
            Thread.sleep(1000);
        }
    }

    private static void loadClass() throws Exception {
        URL url = Main.class.getProtectionDomain()
                .getCodeSource()
                .getLocation();
        MyCustomClassLoader cl = new MyCustomClassLoader(url);
        Class<?> clazz = cl.loadClass("com.test.Foo");
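        // Class.newInstance() is deprecated since Java 9; this is the equivalent call.
        clazz.getDeclaredConstructor().newInstance();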
        cl = null;
    }
}

class MyCustomClassLoader extends URLClassLoader {
    public MyCustomClassLoader(URL... urls) {
        super(urls, null);
    }

    @Override
    protected void finalize() {
        System.out.println("*** CustomClassLoader finalized!");
    }
}

Foo.java

public class Foo {
    private static final ThreadLocal<Bar> tl = new ThreadLocal<Bar>();

    public Foo() {
        Bar bar = new Bar();
        tl.set(bar);
        System.out.println("Test ClassLoader: " + this.getClass()
                .getClassLoader());
    }

    @Override
    protected void finalize() {
        System.out.println(this + " finalized!");
    }
}

Bar.java

public class Bar {
    public Bar() {
        System.out.println(this + " created");
        System.out.println("Bar ClassLoader: " + this.getClass()
                .getClassLoader());
    }

    @Override
    public void finalize() {
        System.out.println(this + " finalized");
    }
}

After running this code, the finalize methods of MyCustomClassLoader and Bar are never called; only Foo's finalize is called. But when I change the ThreadLocal's type to String, all the finalize methods are called.

public class Foo {
    private static final ThreadLocal<String> tl = new ThreadLocal<String>();

    public Foo() {
        Bar bar = new Bar();
        tl.set("some");
        System.out.println("Test ClassLoader: " + this.getClass()
                .getClassLoader());
    }

Can you please explain why there is a difference between using ThreadLocal<String> and ThreadLocal<Bar>?

Holger
Pakira
  • Why would it be finalized if you store it in the ThreadLocal? You've stored it somewhere that is still reachable (because `Foo` is still loaded). If you store a string in the ThreadLocal, you're not storing the Bar, so it is unreachable and can be gc'd. – Andy Turner Mar 03 '21 at 08:16
  • It’s not as simple as it looks at first glance. – Holger Mar 03 '21 at 13:35

1 Answer

When you set the thread-local variable to an instance of Bar, the value holds an implicit reference to its class and thereby to its defining class loader. That loader is also the defining loader of Foo, so it implicitly references Foo’s static variable tl, which holds the ThreadLocal.

In contrast, the String class is defined by the bootstrap loader and has no implicit reference to the Foo class.
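
The difference in defining loaders can be checked directly. The following is a minimal sketch; the class name LoaderCheck and the helper isBootstrap are made up for illustration. It relies on the fact that OpenJDK/HotSpot reports the bootstrap loader as null from Class.getClassLoader(); other implementations are allowed to behave differently.

```java
// Hypothetical demo class; the name and helper are assumptions for illustration.
class LoaderCheck {

    /** Returns true if the class reports null as its loader, i.e. the bootstrap loader on OpenJDK. */
    static boolean isBootstrap(Class<?> type) {
        return type.getClassLoader() == null;
    }

    public static void main(String[] args) {
        // String comes from the JDK itself, so no application loader is involved.
        System.out.println("String defined by bootstrap loader: " + isBootstrap(String.class));
        // An application class is defined by an application-level loader instead.
        System.out.println("LoaderCheck defined by bootstrap loader: " + isBootstrap(LoaderCheck.class));
        System.out.println("LoaderCheck loader: " + LoaderCheck.class.getClassLoader());
    }
}
```

A value whose class was defined by the bootstrap loader therefore cannot lead back to Foo's loader, whereas an instance of Bar can.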

Now, a reference cycle does not prevent garbage collection per se. If only one object holds a reference to a member of the cycle and that object becomes unreachable, the entire cycle becomes unreachable. The problem here is that the object still referencing the cycle is the Thread, which is still alive.

The specific value is associated with the combination of a ThreadLocal instance and a Thread instance and we’d wish that if either of them becomes unreachable, it would stop referencing the value. Unfortunately, no such feature exists. We can only associate a value with the reachability of one object, like with the key of a WeakHashMap, but not of two.

In the OpenJDK implementation, the Thread is the owner of this construct, which makes it immune to values back-referencing the Thread. For example:

ThreadLocal<Thread> local = new ThreadLocal<>();

ReferenceQueue<Thread> q = new ReferenceQueue<>();

Set<Reference<?>> refs = ConcurrentHashMap.newKeySet();

new Thread(() -> {
    Thread t = Thread.currentThread();
    local.set(t);
    refs.add(new WeakReference<>(t, q));
}).start();

Reference<?> r;
while((r = q.remove(2000)) == null) {
    System.gc();
}

if(refs.remove(r)) System.out.println("Collected");
else System.out.println("Something very suspicious is going on");

This will print Collected, indicating that the reference from the value to the Thread did not prevent the removal, unlike put(t, t) on a WeakHashMap.
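
For contrast, here is a minimal sketch of the WeakHashMap case. The class name WeakMapBackReference and the method sizeAfterGc are made up for illustration, and a plain Object stands in for the Thread. Because the map holds its values strongly, a value that references its own key keeps that key reachable, so the entry is never expunged.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical demo class; names are assumptions for illustration.
class WeakMapBackReference {

    /** Stores a key whose value is the key itself, drops the local reference, and returns the map size. */
    static int sizeAfterGc() {
        Map<Object, Object> map = new WeakHashMap<>();
        Object key = new Object();
        map.put(key, key); // the value strongly references the key
        key = null;        // no other strong reference remains outside the map

        for (int i = 0; i < 5; i++) {
            System.gc();   // only a hint, but the result below does not depend on it
        }
        // The map references the value strongly and the value is the key,
        // so the key stays strongly reachable and the entry remains.
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("Entries left after GC: " + sizeAfterGc());
    }
}
```

The entry count stays at 1 for as long as the map itself is reachable, which is the behavior the thread-owned design avoids for values that reference their Thread.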

The price is that this construct is not immune to back-references to the ThreadLocal instance.

ReferenceQueue<Object> q = new ReferenceQueue<>();

Set<Reference<?>> refs = ConcurrentHashMap.newKeySet();

createThreadLocal(refs, q);

Reference<?> r;
while((r = q.remove(2000)) == null) {
    System.gc();
}

if(refs.remove(r)) System.out.println("Collected");
else System.out.println("Something very suspicious is going on");

static void createThreadLocal(Set<Reference<?>> refs, ReferenceQueue<Object> q) {
    ThreadLocal<ThreadLocal<?>> local = new ThreadLocal<>();
    local.set(local);
    refs.add(new WeakReference<>(local, q));
}

This will hang forever, as the backreference from the ThreadLocal to itself prevents its garbage collection, as long as the associated thread is still alive.

Your case is just a special variant of it, as the backreference is through the Bar instance, its defining loader, to Foo’s static variable. But the principle is the same.

You only need to change the line

loadClass();

to

new Thread(new FutureTask<Void>(() -> { loadClass(); return null; })).start();

to stop the value from being associated with the main thread. Then, the class loader and all associated classes and instances get garbage collected.

Holger
  • Thanks for the details. Could you please explain "to stop the value from being associated with the main thread"? Is this because the new thread does not hold a reference to the main thread? – Pakira Mar 03 '21 at 14:51
  • You are setting a value on a *thread local* variable. The value will be associated with the thread executing the `tl.set(bar);` statement. When the main thread executes the statement, the value is associated with the main thread, which is still alive afterwards (as it is the thread calling `System.gc()` in a loop). When a different thread executes the statement, the value will be associated with that particular thread and can be removed when that thread terminates. – Holger Mar 03 '21 at 15:07
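
The per-thread association described in the last comment can be observed with a small sketch. The class name PerThreadValue and the method valueSeenByCaller are made up for illustration; only standard ThreadLocal and Thread calls are used.

```java
// Hypothetical demo class; names are assumptions for illustration.
class PerThreadValue {

    private static final ThreadLocal<String> LOCAL = new ThreadLocal<>();

    /** Sets the value on a worker thread, waits for it to end, and returns what the calling thread sees. */
    static String valueSeenByCaller() {
        Thread worker = new Thread(() -> LOCAL.set("set by worker"));
        worker.start();
        try {
            worker.join(); // the worker has terminated once join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // The calling thread never called set(), so it sees the initial value, which is null.
        return LOCAL.get();
    }

    public static void main(String[] args) {
        System.out.println("Value seen by main thread: " + valueSeenByCaller());
    }
}
```

The value was stored in the worker thread's own map, so nothing ties it to the calling thread, and it becomes eligible for collection once the worker has terminated.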