28

I work on a project where we use a library that is not guaranteed thread-safe (and isn't) and single-threaded in a Java 8 streams scenario, which works as expected.

We would like to use parallel streams to get the low hanging scalability fruit.

Unfortunately this cause the library to fail - most likely because one instance interferes with variables shared with the other instance - hence we need isolation.

I was considering using a separate classloader for each instance (possibly thread local) which to my knowledge should mean that for all practical purposes that I get the isolation needed but I am unfamiliar with deliberately constructing classloaders for this purpose.

Is this the right approach? How shall I do this in order to have proper production quality?


Edit: I was asked for additional information about the situation triggering the question, in order to understand it better. The question is still about the general situation, not fixing the library.

I have full control over the object created by the library (which is https://github.com/veraPDF/) as pulled in by

<dependency>
    <groupId>org.verapdf</groupId>
    <artifactId>validation-model</artifactId>
    <version>1.1.6</version>
</dependency>

using the project maven repository for artifacts.

<repositories>
    <repository>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
        <id>vera-dev</id>
        <name>Vera development</name>
        <url>http://artifactory.openpreservation.org/artifactory/vera-dev</url>
    </repository>
</repositories>

For now it is unfeasible to harden the library.


EDIT: I was asked to show code. Our core adapter is roughly:

public class VeraPDFValidator implements Function<InputStream, byte[]> {
    private String flavorId;
    private Boolean prettyXml;

    public VeraPDFValidator(String flavorId, Boolean prettyXml) {
        this.flavorId = flavorId;
        this.prettyXml = prettyXml;
        VeraGreenfieldFoundryProvider.initialise();
    }

    @Override
    public byte[] apply(InputStream inputStream) {
        try {
            return apply0(inputStream);
        } catch (RuntimeException e) {
            throw e;
        } catch (ModelParsingException | ValidationException | JAXBException | EncryptedPdfException e) {
            throw new RuntimeException("invoking VeraPDF validation", e);
        }
    }

    private byte[] apply0(InputStream inputStream) throws ModelParsingException, ValidationException, JAXBException, EncryptedPdfException {
        PDFAFlavour flavour = PDFAFlavour.byFlavourId(flavorId);
        PDFAValidator validator = Foundries.defaultInstance().createValidator(flavour, false);
        PDFAParser loader = Foundries.defaultInstance().createParser(inputStream, flavour);
        ValidationResult result = validator.validate(loader);

        // do in-memory generation of XML byte array - as we need to pass it to Fedora we need it to fit in memory anyway.

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        XmlSerialiser.toXml(result, baos, prettyXml, false);
        final byte[] byteArray = baos.toByteArray();
        return byteArray;
    }
}

which is a function that maps from an InputStream (providing a PDF-file) to a byte array (representing the XML report output).

(Seeing the code, I've noticed that there is a call to the initializer in the constructor, which may be the culprit here in my particular case. I'd still like a solution to the generic problem.

Thorbjørn Ravn Andersen
  • 73,784
  • 33
  • 194
  • 347
  • 4
    I can't imagine how using different classloaders will solve synchronisation issues. – rkosegi Jan 30 '17 at 13:05
  • 3
    Usually we talk about threadsafe classes and/or methods. A threadsafe *library* is quite broad. Which library is it? I'd definitely try to go for alternatives before starting to consider classloader hacks. – Kayaman Jan 30 '17 at 13:05
  • That fruit doesn't sound so low-hanging if you need to design special class loaders for each thread, when thread creation is not in your own hands. – RealSkeptic Jan 30 '17 at 13:10
  • I would suggest encapsulating the unsafe library with a java monitor (synchronized methods, wait(), notify(), ...) – AhmadWabbi Jan 30 '17 at 13:12
  • @Kayaman there is no alternative for this library. It is very specific for our needs. – Thorbjørn Ravn Andersen Jan 30 '17 at 13:19
  • @AhmadWabbi the rest of this program essentially just feeds the library and saves the result. Your suggestion would behave like the current single-threaded solution. – Thorbjørn Ravn Andersen Jan 30 '17 at 13:21
  • I didn't mean alternative libraries. I meant alternative methods. – Kayaman Jan 30 '17 at 13:21
  • @Kayaman Feel free to suggest better ways. If possible I would prefer to stay in a single JVM on a single machine. – Thorbjørn Ravn Andersen Jan 30 '17 at 13:21
  • So you're using this to validate a PDF file? – Kayaman Jan 30 '17 at 13:27
  • It's hard to give advice without seeing any code, but it's hard to imagine separate instances of whatever you're using being thread-unsafe. Of course if you just hope to get magic happening by converting `stream()` to `parallelStream()`, then that's a different issue altogether. Of course if the instances share static state then that's quite a bad design oversight. – Kayaman Jan 30 '17 at 13:30
  • 2
    *If possible I would prefer to stay in a single JVM* Are you seriously thinking you can run multiple parallel instances of a library when the only thing you really know about its implementation is that it's non-threadsafe? *I was considering using a separate classloader for each instance ... Is this the right approach?* No. You're deliberately misusing the library. *How shall I do this in order to have proper production quality?* You **can't**. Even if you [kludge](https://en.wikipedia.org/wiki/Kludge) something up, you'll only ever be able to **hope** it keeps working. Is that **really** OK? – Andrew Henle Jan 30 '17 at 13:34
  • @Kayaman Not quite. We are using it to extract characteristics in order to do risk assesment with regards to digital preservation. And yes, I would really like to use parallel streams to scale easily, and no that doesn't work well. Regarding design, we are just users, and I do not know if this usage is a supported scenario. – Thorbjørn Ravn Andersen Jan 30 '17 at 13:36
  • @AndrewHenle If I cannot do this, next step would be looking at each instance needed by the parallel stream in its own separate JVM. I am aware of the perils, and I have full access to the source of the library. – Thorbjørn Ravn Andersen Jan 30 '17 at 13:39
  • 1
    Parallel streams are a somewhat sad attempt. They give you the idea that you can write multithreaded code without thinking of the consequences. If we remove this whole talk about streams, I'd bet that the library runs perfectly well in a threadpool for example. – Kayaman Jan 30 '17 at 13:40
  • Did you analyze, why is the library not thread-safe? I can only imagine usage of static fields to share state, is this the case? – lexicore Jan 30 '17 at 13:40
  • @Kayaman would you care to explain in more detail why you expect this to work properly in a threadpool as opposed to streams (which as far as I know are implemented through a hidden threadpool)? – Thorbjørn Ravn Andersen Jan 30 '17 at 13:42
  • 1
    @ThorbjørnRavnAndersen *I have full access to the source of the library.* That's important, as you know a lot more about the implementation than the question states. Any possible answer almost certainly depends on what's in that code, which I suspect you can't post. – Andrew Henle Jan 30 '17 at 13:43
  • That depends on how you've implemented your code. If you've done only a `stream() -> parallelStream()` change, then assumedly you're sharing an instance of the library instead of providing an instance per thread. As lexicore said, if the "whole library" is thread-unsafe, that would be possible through shared state through static variables. That sounds unlikely given the quality of the library. – Kayaman Jan 30 '17 at 13:44
  • @Kayaman naturally I do not expect a single instance to be shared between threads. An instance is created per thread. Even creating an instance per invocation did not work. – Thorbjørn Ravn Andersen Jan 30 '17 at 13:51
  • 1
    Then I'd go through the source code to see what's going on. It's either a huge design oversight on the library's part, or an oversight on your part. This is in no way normal, and it's certainly not normal to start working with classloaders to get around this issue. – Kayaman Jan 30 '17 at 13:54
  • @AndrewHenle I have access to the source but I do not have time to comprehend it fully and start essentially debugging it. The source is available at the repositories available at https://github.com/veraPDF/ I agree on the observation on unsafe usage of static variables, but the question is whether this is by design or nor. – Thorbjørn Ravn Andersen Jan 30 '17 at 14:02
  • Definitely, separate classloaders provide enough isolation in your case. A smilar case - http://stackoverflow.com/a/30227614/2158288 – ZhongYu Jan 30 '17 at 19:00
  • 1
    You have to guarantee that the library was not loaded before else your custom classloader would reuse the one loaded previously if it follows the recommended delegation model. You also have to ensure that none of the non-thread safe classes are loaded by one of the default classloaders. And last but not least you have to ensure that the classloader as well a the loaded classes will get eligible for garbage collection at some point in time else you will face a memory leak – Roman Vottner Feb 02 '17 at 14:48
  • 1
    Are you sharing the objects between multiple threads? Can you give us an example of the parallelism you are trying to execute? Even if it's pseudo code. – John Vint Feb 02 '17 at 14:58
  • The project doesn't look retired. Why don't you want to contribute the project if it is mission critical to you? – Gregory.K Feb 02 '17 at 17:26
  • If following this suggestion, OP would then have to rewrite the library to replace offending static data into threadlocal - have I understood correctly? – tucuxi Feb 02 '17 at 16:45
  • Is there a way you can isolate the non-thread safe code? Similar to what is done with JAXBContext. https://jaxb.java.net/guide/Performance_and_thread_safety.html – John Feb 02 '17 at 16:45
  • @Gregory.K That is indeed an option - unfortunately I do not currently have the time to do so. Also it is unclear if it is a bug or not that the library is not thread-safe. – Thorbjørn Ravn Andersen Feb 02 '17 at 21:16
  • I would create a plugin architecture so you can dynamically and atomically load the classes involved in the processing of the streams and I would use an url class loader. get the job done pass it along and destroy the class loader as soon as they are not needed. – bichito Feb 02 '17 at 21:23
  • @efekctive Sounds great. Is this similar to any of the existing answers or do you have additional tricks to share? – Thorbjørn Ravn Andersen Feb 02 '17 at 21:25
  • You can place the parallel processing inside the "plugin" to simplify lambda processing. I would make it even a different project/jar so there is no chance of class paths interfering with each other – bichito Feb 02 '17 at 21:28
  • Let me think a little bit more about the lambda with plugins. Some other people have suggested url CL. I think they are the better option – bichito Feb 02 '17 at 21:36
  • I am assuming that you would be using some lambda. I would use an anonymous method to do the url class loading and execution of unsafe library. No need to mess with thread local. If this makes any sense – bichito Feb 02 '17 at 21:52
  • @efekctive I am not sure you understand the problem. Lambdas does not protect against multiple threads accessing the same static variables. – Thorbjørn Ravn Andersen Feb 02 '17 at 22:40
  • Are launching parallel stream processing without lambdas? The point that I wanted to make was that placing the url class loader in each of the parallel threads should take away the need of using thread local. I assumed that the cost of launching separate url CL was not an issue – bichito Feb 02 '17 at 22:42
  • So what I had in mind was basically the post that suggests the url class loader as a plugin to provide isolation and ease of change but without ThreadLocal because a method reference should suffice. But I may be wrong – bichito Feb 03 '17 at 01:59
  • Beside my answer with the example library class I was curious and wanted to download veraPDF of which I only know the GUI version from a previous project. But the development repo delivers broken binaries: `Could not transfer artifact org.verapdf:validation-model:jar:1.1.6 from/to vera-dev (http://artifactory.openpreservation.org/artifactory/vera-dev): GET request of: org/verapdf/validation-model/1.1.6/validation-model-1.1.6.jar from vera-dev failed: Premature end of Content-Length delimited message body (expected: 269563; received: 45840` I wonder how you managed to build anything... – kriegaex Feb 05 '17 at 02:01
  • @kriegaex I've seen the same. I am hoping it is a temporary thing. At the time the question was asked, it worked fine. – Thorbjørn Ravn Andersen Feb 05 '17 at 07:18
  • I sent them a message via their contact form. If it helps - no idea. In the meantime you can try my solution with your local binaries. Maybe you can also zip and upload all `org/verapdf` files from your local Maven repo for me to somewhere I can access them. Then I can play with your concrete problem some more if you like. – kriegaex Feb 05 '17 at 11:18
  • The Maven repo is working correctly again after I had created a [ticket](https://github.com/veraPDF/veraPDF-library/issues/703) and they resolved it. So I can download the veraPDF libs now. But beware, in addition to _org.verapdf:validation-model:1.1.6_ I also needed _org.verapdf:core:1.1.5_ in order to compile your code successfully. Now I could reproduce your problems with a couple of sample PDFs and fix them using my `ThreadSafeClassLoader`. Just follow the link to the GitHub repo in my answer and see for yourself. I will also update my answer again. – kriegaex Feb 07 '17 at 01:57
  • On a closing note, I would like to thank everybody who contributed. The 500 point bounty made the difference between having just a few comments, and several well-written answers and actual bug-fixes. – Thorbjørn Ravn Andersen Feb 09 '17 at 13:44
  • @NicolasFilotto Your understanding of the flow over time here is incorrect. Also there is a vast difference between bounties and accepted answers. – Thorbjørn Ravn Andersen Feb 10 '17 at 07:49
  • @NicolasFilotto No. There is not yet an accepted answer. I have chosen to award the bounty to the answer providing a _complete_ solution to the asked problem which you BTW didn't. I have _additionally_ chosen to provide an additional bounty to recognize that very useful work had been done _which had not been asked for_. Is the problem here that you want a bounty too? – Thorbjørn Ravn Andersen Feb 10 '17 at 09:12
  • @NicolasFilotto Two different persons. I started the bounty to attract attention to this question - which worked very well. All the answers so far has been written after I offered it. I do still not see why you insist that this is a question of running verapdf - it is a more general question _prompted_ by this particular library and that is what I am looking for answers to. Regarding your answer you are free to do whatever SO allows you to. You may want to consider undeleting it as it may be an excellent resource to future readers who want a broader view. – Thorbjørn Ravn Andersen Feb 10 '17 at 09:27

7 Answers7

13

We have faced similar challenges. Issues usually came from from static properties which became unwillingly "shared" between the various threads.

Using different classloaders worked for us as long as we could guarantee that the static properties were actually set on classes loaded by our class loader. Java may have a few classes which provide properties or methods which are not isolated among threads or are not thread-safe ('System.setProperties() and Security.addProvider() are OK - any canonical documentation on this matter is welcomed btw).

A potentially workable and fast solution - that at least can give you a chance to test this theory for your library - is to use a servlet engine such as Jetty or Tomcat.

Build a few wars that contain your library and start processes in parallel (1 per war).

When running code inside a servlet thread, the WebappClassLoaders of these engines attempt to load a classes from the parent class loader first (the same as the engine) and if it does not find the class, attempts to load it from the jars/classes packaged with the war.

With jetty you can programmatically hot deploy wars to the context of your choice and then theoretically scale the number of processors (wars) as required.

We have implemented our own class loader by extending URLClassLoader and have taken inspiration from the Jetty Webapp ClassLoader. It is not as hard a job as as it seems.

Our classloader does the exact opposite: it attempts to load a class from the jars local to the 'package' first , then tries to get them from the parent class loader. This guarantees that a library accidentally loaded by the parent classloader is never considered (first). Our 'package' is actually a jar that contains other jars/libraries with a customized manifest file.

Posting this class loader code "as is" would not make a lot of sense (and create a few copyright issues). If you want to explore that route further, I can try coming up with a skeleton.

Source of the Jetty WebappClassLoader

Bruno Grieder
  • 28,128
  • 8
  • 69
  • 101
8

The answer actually depends on what your library relies on:

  1. If your library relies on at least one native library, using ClassLoaders to isolate your library's code won't help because according to the JNI Specification, it is not allowed to load the same JNI native library into more than one class loader such that you would end up with a UnsatisfiedLinkError.
  2. If you library relies on at least one external resource that is not meant to be shared like for example a file and that is modified by your library, you could end up with complex bugs and/or the corruption of the resource.

Assuming that you are not in the cases listed above, generally speaking if a class is known as non thread safe and doesn't modify any static fields, using a dedicated instance of this class per call or per thread is good enough as the class instance is then no more shared.

Here as your library obviously relies and modifies some static fields that are not meant to be shared, you indeed need to isolate the classes of your library in a dedicated ClassLoader and of course make sure that your threads don't share the same ClassLoader.

For this you could simply create an URLClassLoader to which you would provide the location of your library as URL (using URLClassLoader.newInstance(URL[] urls, ClassLoader parent)), then by reflection you would retrieve the class of your library corresponding to the entry point and invoke your target method. To avoid building a new URLClassLoader at each call, you could consider relying on a ThreadLocal to store the URLClassLoader or the Class or the Method instance to be used for a given thread.


So here is how you could proceed:

Let's say that the entry point of my library is the class Foo that looks like this:

package com.company;

public class Foo {

    // A static field in which we store the name of the current thread
    public static String threadName;

    public void execute() {
        // We print the value of the field before setting a value
        System.out.printf(
            "%s: The value before %s%n", Thread.currentThread().getName(), threadName
        );
        // We set a new value
        threadName = Thread.currentThread().getName();
        // We print the value of the field after setting a value
        System.out.printf(
            "%s: The value after %s%n", Thread.currentThread().getName(), threadName
        );
    }
}

This class is clearly not thread safe and the method execute modifies the value of a static field that is not meant to be modified by concurrent threads just like your use case.

Assuming that to launch my library I simply need to create an instance of Foo and invoke the method execute. I could store the corresponding Method in a ThreadLocal to retrieve it by reflection only once per thread using ThreadLocal.withInitial(Supplier<? extends S> supplier) as next:

private static final ThreadLocal<Method> TL = ThreadLocal.withInitial(
    () -> {
        try {
            // Create the instance of URLClassLoader using the context 
            // CL as parent CL to be able to retrieve the potential 
            // dependencies of your library assuming that they are
            // thread safe otherwise you will need to provide their 
            // URL to isolate them too
            URLClassLoader cl = URLClassLoader.newInstance(
                new URL[]{/* Here the URL of my library*/},
                Thread.currentThread().getContextClassLoader()
            );
            // Get by reflection the class Foo
            Class<?> myClass = cl.loadClass("com.company.Foo");
            // Get by reflection the method execute
            return myClass.getMethod("execute");
        } catch (Exception e) {
            // Here deal with the exceptions
            throw new IllegalStateException(e);
        }
    }
);

And finally let's simulate a concurrent execution of my library:

// Launch 50 times concurrently my library
IntStream.rangeClosed(1, 50).parallel().forEach(
    i -> {
        try {
            // Get the method instance from the ThreadLocal
            Method myMethod = TL.get();
            // Create an instance of my class using the default constructor
            Object myInstance = myMethod.getDeclaringClass().newInstance();
            // Invoke the method
            myMethod.invoke(myInstance);
        } catch (Exception e) {
            // Here deal with the exceptions
            throw new IllegalStateException(e);
        }
    }
);

You will get an output of the next type that shows that we have no conflicts between threads and the threads properly reuse its corresponding class/field's value from one call of execute to another:

ForkJoinPool.commonPool-worker-7: The value before null
ForkJoinPool.commonPool-worker-7: The value after ForkJoinPool.commonPool-worker-7
ForkJoinPool.commonPool-worker-7: The value before ForkJoinPool.commonPool-worker-7
ForkJoinPool.commonPool-worker-7: The value after ForkJoinPool.commonPool-worker-7
main: The value before null
main: The value after main
main: The value before main
main: The value after main
...

Since this approach will create one ClassLoader per thread, make sure to apply this approach using a thread pool with a fixed number of threads and the number of threads should be chosen wisely to prevent running out of memory because a ClassLoader is not free in term of memory footprint so you need to limit the total amount of instances according to your heap size.

Once you are done with your library, you should cleanup the ThreadLocal for each thread of your thread pool to prevent memory leaks and to do so here is how you could proceed:

// The size of your the thread pool
// Here as I used for my example the common pool, its size by default is
// Runtime.getRuntime().availableProcessors()
int poolSize = Runtime.getRuntime().availableProcessors();
// The cyclic barrier used to make sure that all the threads of the pool
// will execute the code that will cleanup the ThreadLocal
CyclicBarrier barrier = new CyclicBarrier(poolSize);
// Launch one cleanup task per thread in the pool
IntStream.rangeClosed(1, poolSize).parallel().forEach(
    i -> {
        try {
            // Wait for all other threads of the pool
            // This is needed to fill up the thread pool in order to make sure 
            // that all threads will execute the cleanup code
            barrier.await();
            // Close the URLClassLoader to prevent memory leaks
            ((URLClassLoader) TL.get().getDeclaringClass().getClassLoader()).close();
        } catch (Exception e) {
            // Here deal with the exceptions
            throw new IllegalStateException(e);
        } finally {
            // Remove the URLClassLoader instance for this thread
            TL.remove();
        }
    }
);
Nicolas Filotto
  • 43,537
  • 11
  • 94
  • 122
  • There should be further investigation about the used non-threadsafe library. Even if your answer seems to be correct, the library could rely on a https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#loadLibrary(java.lang.String) As far as I know, a loaded native library would still be unsafe, because it is loaded once into the JVM, not by the classloader. – Christian Kuetbach Feb 02 '17 at 20:40
  • @ChristianKuetbach to the best of my knowledge there is no native code involved. – Thorbjørn Ravn Andersen Feb 02 '17 at 20:55
  • The github project says Java as language. I just wanted to mention native dlls. Do you build a webserive or something? In this case you could deploy the app with different names onto one server. The most webserver I know separate the classloader for you. You could try to validate huge PDFs at the same time, without rewrite you code. – Christian Kuetbach Feb 02 '17 at 21:04
  • How could you suggest that the synchronization issue is only with shared static fields? What if the logic itself is not usable in a multithreaded environment? It is a far more complex problem and is dependent on the library involved. – Siddharth Tyagi Feb 03 '17 at 00:55
7

I found the question interesing and created a little tool for you:

https://github.com/kriegaex/ThreadSafeClassLoader

Currently it is not available as an official release on Maven Central yet, but you can get a snapshot like this:

<dependency>
  <groupId>de.scrum-master</groupId>
  <artifactId>threadsafe-classloader</artifactId>
  <version>1.0-SNAPSHOT</version>
</dependency>

<!-- (...) -->

<repositories>
  <repository>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
    <id>ossrh</id>
    <name>Sonatype OSS Snapshots</name>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
  </repository>
</repositories>

Class ThreadSafeClassLoader:

It uses JCL (Jar Class Loader) under the hood because it already offers class-loading, object instantiation and proxy generation features discussed in other parts of this thread. (Why re-invent the wheel?) What I added on top is a nice interface for exactly what we need here:

package de.scrum_master.thread_safe;

import org.xeustechnologies.jcl.JarClassLoader;
import org.xeustechnologies.jcl.JclObjectFactory;
import org.xeustechnologies.jcl.JclUtils;
import org.xeustechnologies.jcl.proxy.CglibProxyProvider;
import org.xeustechnologies.jcl.proxy.ProxyProviderFactory;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ThreadSafeClassLoader extends JarClassLoader {
  private static final JclObjectFactory OBJECT_FACTORY = JclObjectFactory.getInstance();

  static {
    ProxyProviderFactory.setDefaultProxyProvider(new CglibProxyProvider());
  }

  private final List<Class> classes = new ArrayList<>();

  public static ThreadLocal<ThreadSafeClassLoader> create(Class... classes) {
    return ThreadLocal.withInitial(
      () -> new ThreadSafeClassLoader(classes)
    );
  }

  private ThreadSafeClassLoader(Class... classes) {
    super();
    this.classes.addAll(Arrays.asList(classes));
    for (Class clazz : classes)
      add(clazz.getProtectionDomain().getCodeSource().getLocation());
  }

  public <T> T newObject(ObjectConstructionRules rules) {
    rules.validate(classes);
    Class<T> castTo = rules.targetType;
    return JclUtils.cast(createObject(rules), castTo, castTo.getClassLoader());
  }

  private Object createObject(ObjectConstructionRules rules) {
    String className = rules.implementingType.getName();
    String factoryMethod = rules.factoryMethod;
    Object[] arguments = rules.arguments;
    Class[] argumentTypes = rules.argumentTypes;
    if (factoryMethod == null) {
      if (argumentTypes == null)
        return OBJECT_FACTORY.create(this, className, arguments);
      else
        return OBJECT_FACTORY.create(this, className, arguments, argumentTypes);
    } else {
      if (argumentTypes == null)
        return OBJECT_FACTORY.create(this, className, factoryMethod, arguments);
      else
        return OBJECT_FACTORY.create(this, className, factoryMethod, arguments, argumentTypes);
    }
  }

  public static class ObjectConstructionRules {
    private Class targetType;
    private Class implementingType;
    private String factoryMethod;
    private Object[] arguments;
    private Class[] argumentTypes;

    private ObjectConstructionRules(Class targetType) {
      this.targetType = targetType;
    }

    public static ObjectConstructionRules forTargetType(Class targetType) {
      return new ObjectConstructionRules(targetType);
    }

    public ObjectConstructionRules implementingType(Class implementingType) {
      this.implementingType = implementingType;
      return this;
    }

    public ObjectConstructionRules factoryMethod(String factoryMethod) {
      this.factoryMethod = factoryMethod;
      return this;
    }

    public ObjectConstructionRules arguments(Object... arguments) {
      this.arguments = arguments;
      return this;
    }

    public ObjectConstructionRules argumentTypes(Class... argumentTypes) {
      this.argumentTypes = argumentTypes;
      return this;
    }

    private void validate(List<Class> classes) {
      if (implementingType == null)
        implementingType = targetType;
      if (!classes.contains(implementingType))
        throw new IllegalArgumentException(
          "Class " + implementingType.getName() + " is not protected by this thread-safe classloader"
        );
    }
  }
}

I tested my concept with several unit and integration tests, among them one showing how to reproduce and solve the veraPDF problem.

Now this is what your code looks like when using my special classloader:

Class VeraPDFValidator:

We are just adding a static ThreadLocal<ThreadSafeClassLoader> member to our class, telling it which classes/libraries to put into the new classloader (mentioning one class per library is enough, subsequently my tool identifies the library automatically).

Then via threadSafeClassLoader.get().newObject(forTargetType(VeraPDFValidatorHelper.class)) we instantiate our helper class inside the thread-safe classloader and create a proxy object for it so we can call it from outside.

BTW, static boolean threadSafeMode only exists to switch between the old (unsafe) and new (thread-safe) usage of veraPDF so as to make the original problem reproducible for the negative integration test case.

package de.scrum_master.app;

import de.scrum_master.thread_safe.ThreadSafeClassLoader;
import org.verapdf.core.*;
import org.verapdf.pdfa.*;

import javax.xml.bind.JAXBException;
import java.io.InputStream;
import java.lang.reflect.InvocationTargetException;
import java.util.function.Function;

import static de.scrum_master.thread_safe.ThreadSafeClassLoader.ObjectConstructionRules.forTargetType;

public class VeraPDFValidator implements Function<InputStream, byte[]> {
  public static boolean threadSafeMode = true;

  private static ThreadLocal<ThreadSafeClassLoader> threadSafeClassLoader =
    ThreadSafeClassLoader.create(           // Add one class per artifact for thread-safe classloader:
      VeraPDFValidatorHelper.class,         //   - our own helper class
      PDFAParser.class,                     //   - veraPDF core
      VeraGreenfieldFoundryProvider.class   //   - veraPDF validation-model
    );

  private String flavorId;
  private Boolean prettyXml;

  public VeraPDFValidator(String flavorId, Boolean prettyXml)
    throws ClassNotFoundException, NoSuchMethodException, InvocationTargetException, IllegalAccessException {
    this.flavorId = flavorId;
    this.prettyXml = prettyXml;
  }

  @Override
  public byte[] apply(InputStream inputStream) {
    try {
      VeraPDFValidatorHelper validatorHelper = threadSafeMode
        ? threadSafeClassLoader.get().newObject(forTargetType(VeraPDFValidatorHelper.class))
        : new VeraPDFValidatorHelper();
      return validatorHelper.validatePDF(inputStream, flavorId, prettyXml);
    } catch (ModelParsingException | ValidationException | JAXBException | EncryptedPdfException e) {
      throw new RuntimeException("invoking veraPDF validation", e);
    }
  }
}

Class VeraPDFValidatorHelper:

In this class we isolate all access to the broken library. Nothing special here, just code copied from the OP's question. Everything done here happens inside the thread-safe classloader.

package de.scrum_master.app;

import org.verapdf.core.*;
import org.verapdf.pdfa.*;
import org.verapdf.pdfa.flavours.PDFAFlavour;
import org.verapdf.pdfa.results.ValidationResult;

import javax.xml.bind.JAXBException;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class VeraPDFValidatorHelper {
  public byte[] validatePDF(InputStream inputStream, String flavorId, Boolean prettyXml)
    throws ModelParsingException, ValidationException, JAXBException, EncryptedPdfException
  {
    VeraGreenfieldFoundryProvider.initialise();
    PDFAFlavour flavour = PDFAFlavour.byFlavourId(flavorId);
    PDFAValidator validator = Foundries.defaultInstance().createValidator(flavour, false);
    PDFAParser loader = Foundries.defaultInstance().createParser(inputStream, flavour);
    ValidationResult result = validator.validate(loader);

    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    XmlSerialiser.toXml(result, baos, prettyXml, false);
    return baos.toByteArray();
  }
}
kriegaex
  • 63,017
  • 15
  • 111
  • 202
  • A snapshot release is now available on the Sonatype OSS snapshot repository as a preparation for a later release on Maven Central. See updated answer. – kriegaex Feb 08 '17 at 13:08
3

By isolating a library on a class loader per thread, you can guarantee any classes concurrency properties as you suggest. The only exception are libraries that explicitly interact with the bootstrap class loader or the system class loader. It is possible to inject classes into these class loaders by either reflection or the Instrumentation API. One example for such functionality would be Mockito's inline mock maker that does however not suffer a concurrency constraint as of my knowledge.

Implementing a class loader with this behavior is not all too difficult. The easiest solution would be to explicitly include the required jars in your project, e.g. as a resource. This way, you could use a URLClassLoader for loading your classes:

URL url = getClass().getClassLoader().getResource("validation-model-1.1.6.jar");
ClassLoader classLoader = new URLClassLoader(new URL[] {url}, null);

By referencing null as the super class loader of the URLClassLoader (second argument), you guarantee that there are no shared classes outside of the bootstrap classes. Note that you cannot use any classes of this created class loader from outside of it. However, if you add a second jar containing a class that triggers your logic, you can offer an entry point that becomes accessible without reflection:

class MyEntryPoint implements Callable<File> {
  @Override public File call() {
    // use library code.
  }
}

Simply add this class to its own jar and supply it as a second element to the above URL array. Note that you cannot reference a library type as a return value as this type will not be available to the consumer that lives outside the class loader that makes use of the entry point.

By wrapping the class loader creation into a ThreadLocal, you can guarantee the class loaders uniqunes:

class Unique extends ThreadLocal<ClassLoader> implements Closable {
  @Override protected ClassLoader initialValue() {
    URL validation = Unique.class.getClassLoader()
                          .getResource("validation-model-1.1.6.jar");
    URL entry = Unique.class.getClassLoader()
                          .getResource("my-entry.jar");
    return new URLClassLoader(new URL[] {validation, entry}, null);
  }

  @Override public void close() throws IOException {
    get().close(); // If Java 7+, avoid handle leaks.
    set(null); // Make class loader eligable for GC.
  }

  public File doSomethingLibrary() throws Exception {
    Class<?> type = Class.forName("pkg.MyEntryPoint", false, get());
    return ((Callable<File>) type.newInstance()).call();
  }
}

Note that class loaders are expensive objects and should be dereferenced when you do no longer need them even if a thread continues to live. Also, to avoid file leaks, you should close a URLClassLoader previously to dereferencing.

Finally, in order to continue using Maven's dependency resolution and in order to simplify your code, you can create a seperate Maven module where you define your entry point code and declare your Maven library dependencies. Upon packaging, use the Maven shade plugin to create an Uber jar that includes everything you need. This way, you only need to provide a single jar to your URLClassLoader and do not need to ensure all (transitive) dependencies manually.

Rafael Winterhalter
  • 42,759
  • 13
  • 108
  • 192
3

This answer is based on my original "plugin" comment. And it starts with a class loader that inherits from boot and extensions class loaders only.

package safeLoaderPackage;

import java.net.URL;
import java.net.URLClassLoader;

public final class SafeClassLoader extends URLClassLoader{
    public SafeClassLoader(URL[] paths){
        super(paths, ClassLoader.getSystemClassLoader().getParent());
    }   
}

This is the only class that needs to be included in the user's class path. This url class loader inherits from the parent of ClassLoader.getSystemClassLoader(). It just includes the boot and the extensions class loader. It has no notion of the class path used by the user.

Next

package safeLoaderClasses;

import java.net.URL;
import java.util.ArrayList;
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SecureClassLoaderPlugin <R> {

    private URL[] paths;
    private Class[] args;
    private String method;
    private String unsafe;

    public void setMethodData(final String u, final URL[] p, String m, Class[] a){
        method = m;
        args = a;
        paths = p;
        unsafe = u;
    }

    public Collection<R> processUnsafe(Object[][] p){
        int i;
        BlockingQueue<Runnable> q;
        ArrayList<R> results = new ArrayList<R>();
        try{
            i = p.length;
            q = new ArrayBlockingQueue<Runnable>(i);
            ThreadPoolExecutor tpe = new ThreadPoolExecutor(i, i, 0, TimeUnit.NANOSECONDS, q);
            for(Object[] params : p)
                tpe.execute(new SafeRunnable<R>(unsafe, paths, method, args, params, results));
            while(tpe.getActiveCount() != 0){
                Thread.sleep(10);
            }
            for(R r: results){
                System.out.println(r);
            }
            tpe.shutdown();
        }
        catch(Throwable t){

        }
        finally{

        }
        return results;
    }
}

and

package safeLoaderClasses;

import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.util.ArrayList;

import safeLoaderInterface.SafeClassLoader;

class SafeRunnable <R> implements Runnable{
    final URL[] paths;
    final private String unsafe;
    final private String method;
    final private Class[] args;
    final private Object[] processUs;
    final ArrayList<R> result;

    SafeRunnable(String u, URL[] p, String m, Class[] a, Object[] params, ArrayList<R> r){
        unsafe = u;
        paths = p;
        method = m;
        args = a;
        processUs = params;
        result = r;
    }

    public void run() {
        Class clazz;
        Object instance;
        Method m;
        SafeClassLoader sl = null;

        try{
            sl = new SafeClassLoader(paths);
            System.out.println(sl);

            clazz = sl.loadClass(unsafe);
            m = clazz.getMethod(method, args);
            instance = clazz.newInstance();
            synchronized(result){
                result.add((R) m.invoke(instance, processUs));
            }
        }
        catch(Throwable t){
            t.printStackTrace();
        }
        finally{
            try {
                sl.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

are the plugin jar. No lambdas. Just a thread pool executor. Each thread just adds to a result list after execution.

The generics need polishing but I have tested these against this class (resides in a different jar)

package stackoverflow4;

public final class CrazyClass {

    static int i = 0;

    public int returnInt(){
        System.out.println(i);
        return 8/++i;
    }
}

This would be the way to connect from one's code. The path to the class loader needs to be included because it is lost with the getParent() call

private void process(final String plugin, final String unsafe, final URL[] paths) throws Exception{
        Object[][] passUs = new Object[][] {{},{}, {},{}, {},{},{},{},{},{}};
        URL[] pathLoader = new URL[]{new File(new String(".../safeLoader.jar")).toURI().toURL(), 
                new File(new String(".../safeLoaderClasses.jar")).toURI().toURL()};
        //instantiate the loader
        SafeClassLoader sl = new SafeClassLoader(pathLoader); 
        System.out.println(sl);
        Class clazz = sl.loadClass("safeLoaderClasses.SecureClassLoaderPlugin");
        //Instance of the class that loads the unsafe jar and launches the thread pool executor
        Object o = clazz.newInstance(); 
        //Look up the method that set ups the unsafe library
        Method m = clazz.getMethod("setMethodData", 
                new Class[]{unsafe.getClass(), paths.getClass(), String.class, new Class[]{}.getClass()});
        //invoke it
        m.invoke(o, new Object[]{unsafe,paths,"returnInt", new Class[]{}});
        //Look up the method that invokes the library
        m = clazz.getMethod("processUnsafe", new Class[]{ passUs.getClass()});
        //invoke it
        o = m.invoke(o, passUs);
        //Close the loader
        sl.close();
    }

with up to 30+ threads and it seems to work. The plugin uses a separate class loader and each of the threads use their own class loader. After leaving the method everything is gc'ed.

bichito
  • 1,406
  • 2
  • 19
  • 23
1

I believe you should try to fix the problem before seeking for workaround.

You can always run you code in two threads, classloaders, processes, containers, VM, or machines. But they are none of the ideal.

I saw two defaultInstance() from the code. Do the instance threadsafe? If not, can we have two instance? Is it a factory or a singleton?

Second, where the conflicts happen? If it was about initialization/cache problem, a pre warming should fix.

Last but not least, if the library was open-source, fork it fix it and pull request.

Dennis C
  • 24,511
  • 12
  • 71
  • 99
  • Thank you for your answer. I do not currently have the time for forking, fixing and testing. The basic problem is unchanged - how do I do this for a given, reasonably well-behaved but unsafe library. – Thorbjørn Ravn Andersen Feb 04 '17 at 14:20
1

"It is unfeasible to harden the library" but it is feasible to introduce such a bloody workaround like a custom class loader?

OK. I am the first who dislikes the replies which are not a reply to the original question. But I honestly believe that patching the library is much easier to do and to mantain than introducing a custom class loader.

The blocker is class org.verapdf.gf.model.impl.containers.StaticContainers which static fields can be easily changed to work per thread as shown below. This impacts six other classes

org.verapdf.gf.model.GFModelParser
org.verapdf.gf.model.factory.colors.ColorSpaceFactory
org.verapdf.gf.model.impl.cos.GFCosFileSpecification
org.verapdf.gf.model.impl.external.GFEmbeddedFile
org.verapdf.gf.model.impl.pd.colors.GFPDSeparation
org.verapdf.gf.model.tools.FileSpecificationKeysHelper

You can still have only one PDFAParser per thread. But the fork takes ten minutes to do and worked for me in a basic multithread smoke test. I'd test this and contact the original author of the library. Maybe he is happy to merge and you can just keep a Maven reference to the updated and mantained library.

package org.verapdf.gf.model.impl.containers;

import org.verapdf.as.ASAtom;
import org.verapdf.cos.COSKey;
import org.verapdf.gf.model.impl.pd.colors.GFPDSeparation;
import org.verapdf.gf.model.impl.pd.util.TaggedPDFRoleMapHelper;
import org.verapdf.model.pdlayer.PDColorSpace;
import org.verapdf.pd.PDDocument;
import org.verapdf.pdfa.flavours.PDFAFlavour;

import java.util.*;

public class StaticContainers {

    private static ThreadLocal<PDDocument> document;
    private static ThreadLocal<PDFAFlavour> flavour;

    // TaggedPDF
    public static ThreadLocal<TaggedPDFRoleMapHelper> roleMapHelper;

    //PBoxPDSeparation
    public static ThreadLocal<Map<String, List<GFPDSeparation>>> separations;
    public static ThreadLocal<List<String>> inconsistentSeparations;

    //ColorSpaceFactory
    public static ThreadLocal<Map<String, PDColorSpace>> cachedColorSpaces;

    public static ThreadLocal<Set<COSKey>> fileSpecificationKeys;

    public static void clearAllContainers() {
        document = new ThreadLocal<PDDocument>();
        flavour = new ThreadLocal<PDFAFlavour>();
        roleMapHelper = new ThreadLocal<TaggedPDFRoleMapHelper>();
        separations = new ThreadLocal<Map<String, List<GFPDSeparation>>>();
        separations.set(new HashMap<String,List<GFPDSeparation>>());
        inconsistentSeparations = new ThreadLocal<List<String>>();
        inconsistentSeparations.set(new ArrayList<String>());
        cachedColorSpaces = new ThreadLocal<Map<String, PDColorSpace>>();
        cachedColorSpaces.set(new HashMap<String,PDColorSpace>());
        fileSpecificationKeys = new ThreadLocal<Set<COSKey>>();
        fileSpecificationKeys.set(new HashSet<COSKey>());
    }

    public static PDDocument getDocument() {
        return document.get();
    }

    public static void setDocument(PDDocument document) {
        StaticContainers.document.set(document);
    }

    public static PDFAFlavour getFlavour() {
        return flavour.get();
    }

    public static void setFlavour(PDFAFlavour flavour) {
        StaticContainers.flavour.set(flavour);
        if (roleMapHelper.get() != null) {
            roleMapHelper.get().setFlavour(flavour);
        }
    }

    public static TaggedPDFRoleMapHelper getRoleMapHelper() {
        return roleMapHelper.get();
    }

    public static void setRoleMapHelper(Map<ASAtom, ASAtom> roleMap) {
        StaticContainers.roleMapHelper.set(new TaggedPDFRoleMapHelper(roleMap, StaticContainers.flavour.get()));
    }
}
Serg M Ten
  • 5,568
  • 4
  • 25
  • 48
  • 1
    It is just not true that we like the workaround better than hardening the library. The latter is always better, if possible. But the OP explicitly asked if isolation via classloader is possible in principal. This is why we tried to answer the question. And BTW, not every library is open source and not every upstream maintainer accepts pull requests or even complaints. So the question as well as the suggested solutions (or workarounds, "bloody" or not) are valid. – kriegaex Feb 09 '17 at 13:02
  • 1
    Thank you for doing the work needed - I am sure the development team will be happy for your patch. I would like to mention that your choice of words is unfortunate - even though the actual problem prompting the question is fixable, the idea was to clarify if this was a way to work around the problem in general (for situations not as easily fixable). – Thorbjørn Ravn Andersen Feb 09 '17 at 13:39
  • 1
    @Serg, I just found this one again and was wondering why you never actually created a pull request from your fix. It looks like some of the thread-local stuff is now in the upstream project, but it still differs from your own changes. Maybe you want to rebase your own fix on the latest master (or whatever is the current development branch there) and check if you can add more value to to upstream product. I have not tested this since the question was asked, but maybe there are still open issues in veraPDF. It would be sad for your good work to be lost. – kriegaex Mar 03 '21 at 04:35
  • I shall put that on my to-do list. I do not know when I might do that though. It's been quite a while since I reviewed that code. Maybe it's sorted in another way. I do not know. – Serg M Ten Mar 05 '21 at 11:04