2

Situation:

  • Need a cache of an expensive-to-create and non-thread-safe external resource
  • The resource requires explicit clean up
  • The termination of each thread cannot be hooked, but that of the application can
  • The code also runs in a Servlet container, so caches that cause a strong reference from the system class loader (e.g. ThreadLocal) cannot be directly used (see edit below).

Thus to use a ThreadLocal, it can only hold WeakReferences to the resource and a separated collection of strong references has to be kept. The code quickly gets very complicated and creates a memory leak (as the strong reference is never removed after thread death).

ConcurrentHashMap seems to be a good suit, but it also suffers from the memory leak.

What other alternatives are there? A synchronised WeakHashMap??

(Hopefully the solution can also be automatically initialised using a given Supplier just like ThreadLocal.withInitial())


Edit:

Just to prove the class loader leak is a thing. Create a minimal WAR project with:

public class Test {
    public static ThreadLocal<Test> test = ThreadLocal.withInitial(Test::new);
}

index.jsp:

<%= Test.test.get() %>

Visit the page and shutdown the Tomcat and you get:

Aug 21, 2015 5:56:11 PM org.apache.catalina.loader.WebappClassLoaderBase checkThreadLocalMapForLeaks
SEVERE: The web application [test] created a ThreadLocal with key of type [java.lang.ThreadLocal.SuppliedThreadLocal] (value [java.lang.ThreadLocal$SuppliedThreadLocal@54e69987]) and a value of type [test.Test] (value [test.Test@2a98020a]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
billc.cn
  • 7,187
  • 3
  • 39
  • 79
  • Are the resources bound to threads or can they be used from different threads, just only one at a time? –  Aug 21 '15 at 13:51
  • They do no have to be bound to a particular thread :) – billc.cn Aug 21 '15 at 14:09
  • Who is holding the strong reference to the `ThreadLocal`? – Holger Aug 21 '15 at 15:38
  • The `ThreadLocal` is just in a field in a class in my WAR. A separated `List` is used to hold (strong references to) all the resource instances refered to by the `WeakReference`s (in the `ThreadLocal`). This is to prevent `ThreadLocalMap`, which is referred to by each `Thread`, from having a strong reference chain to the WAR's class loader (thus causing a leak). – billc.cn Aug 21 '15 at 15:53
  • Threads are *not* holding strong references to `ThreadLocal` instances. The internal `ThreadLocalMap` uses weak references. You are actually causing the problem you are trying to fix. – Holger Aug 21 '15 at 15:59
  • https://stackoverflow.com/questions/17968803/threadlocal-memory-leak – billc.cn Aug 21 '15 at 16:03
  • Well, look at the accepted answer of that question: *By definition, a reference to a ThreadLocal value is kept until the "owning" thread dies or if the ThreadLocal itself is no longer reachable*. So if the ThreadLocal itself is unreachable, it can be collected. There is no contradiction. If the ThreadLocal isn’t collected, it implies that someone (*other than the Thread class*) holds a strong reference to it. That’s also in line with the second upvoted answer to that question: *However, the references from the worker threads to the ThreadLocal are WeakReferences!* – Holger Aug 21 '15 at 16:10
  • Well, maybe I failed to make it obvious, the `ThreadLocal`, being used as a cache, is in a long-lived class that only really dies with the WAR's class loader. Using strong reference in it will prevent the WAR's class loader from being GC'd in the first place. Also, I'll need a direct reference to all the instances so I can clean them up during shut down as there's no ability to make each worker thread do the clean up. – billc.cn Aug 21 '15 at 16:21
  • I can’t follow you. If the class holding the `ThreadLocal` dies with the WAR's class loader, then there is no problem that the WAR's class loader will not die before the class holding the `ThreadLocal`. You are basically saying that they both have the same lifetime. Nevertheless, if they have different lifetimes, the problem you are describing is unsolvable. You want to hold strong references that don’t prevent garbage collection… – Holger Aug 21 '15 at 16:37
  • Really sorry to have confused you. I've updated the question to highlight the case I am thinking about. – billc.cn Aug 21 '15 at 17:00

4 Answers4

1

That seems to be the typical “weak key, strong value referencing the key” problem. If you make the value weak, it can be collected even if the key is reachable, if you make it strong, the key is strongly reachable as well. This can’t be solved without a direct support by the JVM.

Thankfully there is a class which offers that (though it’s not emphasized in its documentation):

java.lang.ClassValue:

Lazily associate a computed value with (potentially) every type. For example, if a dynamic language needs to construct a message dispatch table for each class encountered at a message send call site, it can use a ClassValue to cache information needed to perform the message send quickly, for each class encountered.

While this documentation doesn’t say that the values may refer to the Class key, it’s intended use case of holding dispatch tables for a class implies that it is typical to have values with back-references.

Let’s demonstrate it with a small test class:

public class ClassValueTest extends ClassValue<Method> {
    @Override
    protected Method computeValue(Class<?> type) {
        System.out.println("computeValue");
        return Arrays.stream(type.getDeclaredMethods())
            .filter(m->Modifier.isPublic(m.getModifiers()))
            .findFirst().orElse(null);
    }
    public static void main(String... arg) throws Throwable {
        // create a collectible class:
        MethodHandles.Lookup l=MethodHandles.lookup();
        MethodType noArg = MethodType.methodType(void.class);
        MethodHandle println = l.findVirtual(
            PrintStream.class, "println", MethodType.methodType(void.class, String.class));
        Runnable r=(Runnable)LambdaMetafactory.metafactory(l, "run",
            println.type().changeReturnType(Runnable.class), noArg, println, noArg)
           .getTarget().invokeExact(System.out, "hello world");
        r.run();
        WeakReference<Class<?>> ref=new WeakReference<>(r.getClass());
        ClassValueTest test=new ClassValueTest();
        // compute and get
        System.out.println(test.get(r.getClass()));
        // verify that the value is cached, should not compute
        System.out.println(test.get(r.getClass()));
        // allow freeing
        r=null;
        System.gc();
        if(ref.get()==null) System.out.println("collected");
        // ensure that it is not our cache instance that has been collected
        System.out.println(test.get(String.class));
    }
}

On my machine it printed:

hello world
computeValue
public void ClassValueTest$$Lambda$1/789451787.run()
public void ClassValueTest$$Lambda$1/789451787.run()
collected
computeValue
public boolean java.lang.String.equals(java.lang.Object)

To explain, this test creates an anonymous class, just like lambda expressions produce, which can be garbage collected. Then it uses the ClassValueTest instance to cache a Method object of that Class. Since Method instances have a reference to their declaring class, we have the situation of a value referring to its key here.

Still, after the class is not used anymore, it gets collected, which implies that the associated value has been collected too. So its immune to backreferences of the value to the key.

The last test using another class just ensures that we are not a victim of eager garbage collection as described here as we are still using the cache instance itself.


This class associates a single value with a class, not a value per thread, but it should be possible to combine ClassValue with ThreadLocal to get the desired result.

Community
  • 1
  • 1
Holger
  • 285,553
  • 42
  • 434
  • 765
0

I am not sure about the problem you are talking about. Please take a look at: https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem

Some Questions:

  • How is the resource referenced?
  • What is the interface to the resource?
  • What data should be cached at all?
  • What is a "non-thread safe resource"
  • How often is the resource retrieved?
  • How long is the access to one resource, what level of concurrency is there?
  • Is one thread using the resource many times and this is the reason for the intended caching?
  • Are many threads using the same resource (instance)?
  • Can there be many instances of the same resource type, since the actual instance is not thread safe?
  • How many resources you have?
  • Is it many resource instances of the same type or different types?

Maybe you can try to remove the words ThreadLocal, WeakReference, ConcurrentHashMap from your question?

Some (wild) guess:

From what I can read between the lines, it seems to me that it is a straight forward use case for a Java cache. E.g. you can use Google Guava cache and add a removal listener for the explicit cleanup.

Since the resource is not thread safe you need to implement a locking mechanism. This can be done by putting a lock object into the cached object.

If you need more concurrency, create more resources of the same type and augment the cache key with the hash of the thread modulo the level of concurrency you like to have.

Community
  • 1
  • 1
cruftex
  • 5,545
  • 2
  • 20
  • 36
0

I'd propose to get rid of ThreadLocal and WeakReference stuff altogether, because, as you say, resources are not bound to specific threads, they just cannot be accessed from several threads simultaneously.

Instead, have a global cache, Map <Key, Collection <Resource>>. Cache contains only resources that are free for use at the moment.

Threads would first request an available resource from the cache. If present (this, of course, should be synchronized, as the cache is global), arbitrary resource is removed from the collection for that key and given to the thread. Otherwise, a new one for that key is built and also given to the thread.

When a thread finishes using a resource, it should return it to the cache, i.e. add to the collection mapped to resource key. From there it can be used by the same thread again, or even by a different thread.

Advantages:

  • Cache is global, trivial to shut down all allocated resources when application quits.

  • Hardly any potential for memory leaks, code should be pretty concise.

  • Threads can share resources (provided they need the same resource at different time), potentially decreasing demand.

Disadvantages:

  • Requires synchronization (but likely cheap and not difficult to code).

  • Maybe some others, depending on what exactly you do.

0

While researching the weak concurrent map idea, I found that it's implemented in Guava's Cache.

I used the current thread as the weak key and an CacheLoader is supplied to automatically create the resource for each new thread.

A removal listener is also added, so that each thread's resource will be automatically cleaned up after the Thread object is GC'ed or when I call the invalidateAll() method during shut-down.

Most of the configuration above can also be done in a one liner (with lambdas).

billc.cn
  • 7,187
  • 3
  • 39
  • 79