
I've been trying to set up a custom classloader that intercepts classes to print out which classes are being loaded into the application. The classloader looks like this:

public class MyClassLoader extends ClassLoader {
    @Override
    public Class<?> loadClass(String name) throws ClassNotFoundException {
        System.out.println("Loading: " + name);
        return super.loadClass(name);
    }
}     

It just spits out the names of all the classes it loads. However, when I try to run some code,

import org.python.util.PythonInterpreter;
public class Scripts {
    public String main(){

        PythonInterpreter p = new PythonInterpreter();
        p.exec("print 'Python ' + open('.gitignore').read()");

        return "Success! Nothing broke";
    }
}

via

MyClassLoader bcl = new MyClassLoader();
Class c = bcl.loadClass("Scripts");

Method m = c.getMethod("main");
String result = (String) m.invoke(c.getConstructor().newInstance());

it prints out

Loading: Scripts
Loading: java.lang.Object
Loading: java.lang.String
Loading: org.python.util.PythonInterpreter
Python build/
.idea/*
*.iml
RESULT: Success! Nothing broke

This seems rather odd. org.python.util.PythonInterpreter is not a simple class: it depends on a whole bunch of other classes in the org.python.util package. Those classes are clearly being loaded, since the exec'd Python code is able to do stuff and read my file. For some reason, though, those classes are not being loaded by the classloader that loaded PythonInterpreter.

Why is that? I was under the impression that the classloader used to load a class C would be used to load all the other classes needed by C, but that's clearly not happening here. Is that assumption mistaken? If it is, how do I set it up so that all the transitive dependencies of C are loaded by my classloader?

EDIT:

Here are some experiments with URLClassLoader, which was suggested in an answer. I modified the delegation in loadClass():

try{
    byte[] output = IOUtils.toByteArray(this.getResourceAsStream(name));
    return instrument(defineClass(name, output, 0, output.length));
}catch(Exception e){
    return instrument(super.loadClass(name));
}

I also made MyClassLoader subclass URLClassLoader rather than plain ClassLoader, grabbing the URLs via:

super(((URLClassLoader)ClassLoader.getSystemClassLoader()).getURLs());

But it doesn't seem to be the right thing. In particular, getResourceAsStream() returns null for every class I request, even non-system classes like the ones in that Jython lib.

Li Haoyi

5 Answers


Basics of Class Loading

There are two main places to extend a class loader to change the way classes are loaded:

  • findClass(String name) - Override this method when you want to find a class with the usual parent-first delegation.
  • loadClass(String name, boolean resolve) - Override this method when you want to change the way that class loading delegation is done.
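The difference between the two hooks can be seen with a loader that overrides only findClass. In this minimal sketch (all class names are hypothetical), ClassLoader.loadClass asks the parent first, so findClass is reached only for names the parent cannot resolve. That is why findClass on its own cannot observe every class:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: a parent-first loader that only overrides findClass.
public class FindClassDemo {

    static class RecordingLoader extends ClassLoader {
        // Names that reached findClass, i.e. names the parent could not resolve.
        final List<String> reachedFindClass = new ArrayList<>();

        RecordingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            reachedFindClass.add(name);
            // A real loader would locate the class bytes here and call defineClass(...).
            throw new ClassNotFoundException(name);
        }
    }

    static List<String> run() {
        RecordingLoader loader = new RecordingLoader(FindClassDemo.class.getClassLoader());
        try {
            loader.loadClass("java.lang.String");         // resolved by the parent chain
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        try {
            loader.loadClass("com.example.DoesNotExist"); // parent fails, so findClass runs
        } catch (ClassNotFoundException expected) {
            // thrown by our findClass
        }
        return loader.reachedFindClass;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [com.example.DoesNotExist]
    }
}
```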

However, classes can only come from the final defineClass(...) methods provided by java.lang.ClassLoader. Since you would like to capture all of the classes that are loaded, we will need to override loadClass(String, boolean) and call defineClass(...) somewhere in it.

NOTE: Inside of the defineClass(...) methods, there is a JNI binding to the native side of the JVM. Inside of that code, there is a check for classes in the java.* packages. It will only let those classes be loaded by the system class loader. This prevents you from messing with the internals of Java itself.
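The java.* restriction is easy to observe. At least in OpenJDK 8 and later, the check runs inside ClassLoader.defineClass before the class bytes are parsed, so this sketch (hypothetical class names, deliberately invalid bytes) is refused with a SecurityException rather than a ClassFormatError:

```java
// Hypothetical demo: a user-defined loader tries to define a class in java.lang.
public class ProhibitedPackageDemo {

    static class SneakyLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Returns true if defineClass refused the java.* name with a SecurityException. */
    static boolean rejectsJavaPackage() {
        try {
            // The bytes are not a valid class file: the package-name check fires first.
            new SneakyLoader().define("java.lang.Fake", new byte[] {0});
            return false;
        } catch (SecurityException e) {
            System.out.println("Refused: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsJavaPackage());
    }
}
```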

An Example Child-First ClassLoader

This is a very simple implementation of the ClassLoader that you are trying to create. It assumes that all of the classes you need are available to the parent class loader, so it just uses the parent as a source for class bytes. This implementation uses Apache Commons IO for brevity, but that dependency could easily be removed.

import java.io.IOException;
import java.io.InputStream;

import static org.apache.commons.io.IOUtils.toByteArray;
import static org.apache.commons.io.IOUtils.closeQuietly;
...
public class MyClassLoader
  extends ClassLoader {
  MyClassLoaderListener listener;

  MyClassLoader(ClassLoader parent, MyClassLoaderListener listener) {
    super(parent);
    this.listener = listener;
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve)
    throws ClassNotFoundException {
    // respect the java.* packages.
    if( name.startsWith("java.")) {
      return super.loadClass(name, resolve);
    }
    else {
      // see if we have already loaded the class.
      Class<?> c = findLoadedClass(name);
      if( c != null ) return c;

      // the class is not loaded yet.  Since the parent class loader has all of the
      // definitions that we need, we can use it as our source for classes.
      InputStream in = null;
      try {
        // get the input stream, throwing ClassNotFound if there is no resource.
        in = getParent().getResourceAsStream(name.replaceAll("\\.", "/")+".class");
        if( in == null ) throw new ClassNotFoundException("Could not find "+name);

        // read all of the bytes and define the class.
        byte[] cBytes = toByteArray(in);
        c = defineClass(name, cBytes, 0, cBytes.length);
        if( resolve ) resolveClass(c);
        if( listener != null ) listener.classLoaded(c);
        return c;
      } catch (IOException e) {
        throw new ClassNotFoundException("Could not load "+name, e);
      }
      finally {
        closeQuietly(in);
      }
    }
  }
}

And this is a simple listener interface for watching classes load.

public interface MyClassLoaderListener {
  public void classLoaded( Class<?> c );
}

You can then create a new instance of MyClassLoader, with the current class loader as the parent, and monitor classes as they are loaded.

MyClassLoader classLoader = new MyClassLoader(this.getClass().getClassLoader(), new MyClassLoaderListener() {
  public void classLoaded(Class<?> c) {
    System.out.println(c.getName());
  }
});
classLoader.loadClass(...);

This will work in the most general case and will allow you to get notified when classes are loaded. However, if any of those classes create their own child first class loaders, then they could bypass the notification code added here.

More Advanced Class Loading

To really trap classes being loaded, even when a child class loader overrides loadClass(String, boolean), you have to insert code between the classes you are loading and any of the calls that they may make to ClassLoader.defineClass(...). To do this, you have to start getting into byte code rewriting with a tool like ASM. I have a project called Chlorine on GitHub that uses this method to rewrite java.net.URL constructor calls. If you are curious about messing with classes at load time, I would check that project out.
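A different, JVM-wide hook than the class loaders discussed in this thread is a java.lang.instrument agent: once its ClassFileTransformer is registered, the JVM hands the bytes of classes it loads afterwards to transform(), whichever loader defines them, and bytecode tools such as ASM are usually invoked from there. This is a swapped-in technique, not the Chlorine approach. The sketch only records names; LoadLoggingAgent is a hypothetical name, and building the agent jar (with a Premain-Class manifest entry) is assumed rather than shown:

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical agent: records each class the JVM offers to it, whichever loader defines it.
public class LoadLoggingAgent implements ClassFileTransformer {

    // Thread-safe list, because classes can be loaded on many threads at once.
    static final List<String> SEEN = new CopyOnWriteArrayList<>();

    // Called before main() when the JVM is started with -javaagent:<jar>;
    // the jar's manifest must name this class in its Premain-Class attribute.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoadLoggingAgent());
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (className != null) { // defensive null check
            // Names arrive in internal form, e.g. "org/python/util/PythonInterpreter".
            SEEN.add(className.replace('/', '.'));
        }
        // A rewriting tool (ASM etc.) would return modified bytes here;
        // returning null tells the JVM to keep the original class file unchanged.
        return null;
    }

    public static void main(String[] args) {
        // Without -javaagent nothing calls transform, so simulate one callback.
        new LoadLoggingAgent().transform(null, "org/example/Foo", null, null, new byte[0]);
        System.out.println(SEEN); // prints [org.example.Foo]
    }
}
```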

Christian Trimble
  • Do you have a source/citation for "there is a JNI binding to the native side of the JVM [i]nside of that code, there is a check for classes in the java.* packages"? – Pacerier Aug 25 '14 at 14:04
  • @Pacerier this information comes from trying to do "bad" things to the class loader, like reengineering the URL class. This information is, however, several years old and may be outdated. – Christian Trimble Oct 30 '14 at 20:16
  • I think that just to be (Thread) safe, the body of the `loadClass` implementation in MyClassLoader should be wrapped in a `synchronized (getClassLoadingLock(name)) {}` block. At least that's what the base class is doing... – Bogdan Dec 29 '14 at 17:54
  • @Bogdan Thanks for pointing that out. I will dig up my test cases for this answer and make sure this change works as expected. – Christian Trimble Jan 06 '15 at 17:13
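Folding Bogdan's suggestion into the answer's loader could look like the following sketch. It is not the author's tested version: it replaces Commons IO with InputStream.readAllBytes() (Java 9+), uses a java.util.function.Consumer in place of MyClassLoaderListener, registers the loader as parallel capable so the lock is per class name, and, like the original, only exempts the java.* packages. LockedChildFirstLoader is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Consumer;

// Child-first loader in the style of MyClassLoader, with per-name locking added.
public class LockedChildFirstLoader extends ClassLoader {

    static {
        // Opt in to per-class-name locks; otherwise getClassLoadingLock returns 'this'.
        registerAsParallelCapable();
    }

    private final Consumer<Class<?>> listener; // plays the role of MyClassLoaderListener

    public LockedChildFirstLoader(ClassLoader parent, Consumer<Class<?>> listener) {
        super(parent); // assumed non-null: the parent is our only source of class bytes
        this.listener = listener;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (name.startsWith("java.")) {
            return super.loadClass(name, resolve); // java.* must come from the JVM's own loaders
        }
        // Two threads asking for the same name must not both call defineClass:
        // defining the same name twice in one loader fails with a LinkageError.
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                String resource = name.replace('.', '/') + ".class";
                try (InputStream in = getParent().getResourceAsStream(resource)) {
                    if (in == null) {
                        throw new ClassNotFoundException("Could not find " + name);
                    }
                    byte[] bytes = in.readAllBytes();
                    c = defineClass(name, bytes, 0, bytes.length);
                } catch (IOException e) {
                    throw new ClassNotFoundException("Could not load " + name, e);
                }
                if (listener != null) {
                    listener.accept(c);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader loader = new LockedChildFirstLoader(
                LockedChildFirstLoader.class.getClassLoader(),
                c -> System.out.println("defined " + c.getName()));
        // java.* requests are delegated, so this is the JVM's own String class.
        System.out.println(loader.loadClass("java.lang.String") == String.class); // prints true
    }
}
```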

If you want to print the classes as they are loaded, how about switching on the JVM's -verbose:class option?

java -verbose:class your.class.name.here
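If restarting with the flag is inconvenient, the same switch is also exposed at runtime through the standard java.lang.management.ClassLoadingMXBean. Its Javadoc leaves the output format and destination implementation dependent (typically one line per loaded class), so treat this as a sketch; the class and method names are hypothetical:

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

public class VerboseClassToggle {

    /** Switches the JVM's class-loading trace on or off and returns the new setting. */
    static boolean setClassLoadTracing(boolean on) {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        bean.setVerbose(on);
        return bean.isVerbose();
    }

    public static void main(String[] args) {
        setClassLoadTracing(true);
        // Classes loaded while tracing is on are reported by the JVM itself.
        new java.util.concurrent.ConcurrentSkipListMap<String, String>();
        setClassLoadTracing(false);

        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        System.out.println("currently loaded: " + bean.getLoadedClassCount()
                + ", loaded since start: " + bean.getTotalLoadedClassCount());
    }
}
```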

To answer your direct questions:

Why is that? I was under the impression that the classloader used to load a class C would be used to load all the other classes needed by C, but that's clearly not happening here. Is that assumption mistaken? If it is, how do I set it up so that all the transitive dependencies of C are loaded by my classloader?

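When searching for a class that is already loaded, the search goes from the leaf ClassLoader up to the root; when Java works out that a new class has to be loaded, loading is attempted from the root of the ClassLoader tree back down to the leaf that initiated the class resolution. The first loader that can supply the class defines it, and the JVM then resolves that class's own dependencies through this defining loader, not through the loader you originally asked.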

Why? Consider if your custom class wanted to load something from the Java standard libraries. The correct answer is that it should be loaded by the System ClassLoader so that the class can be maximally shared, especially when you consider that the class being loaded would then potentially load a whole lot more classes.

This also solves the problem that potentially you could end up with multiple system Classes instances being loaded in different ClassLoaders - each with the same fully qualified name. EDIT: Such classes would be resolved correctly within their own ClassLoader. However, there are two problems:

  1. Let's say we have two String instances, a and b. a.getClass().isInstance(b) and a.getClass() == b.getClass() would not be true if a and b were instantiated in different ClassLoaders. This would cause horrific problems.
  2. Singletons would not be singletons: you could have one per ClassLoader.

END EDIT
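The root-first rule can be observed by following ClassLoader.getParent() until it returns null, which is how the bootstrap (root) loader is represented. In this small sketch (LoaderChain is a hypothetical name), application classes sit several loaders below the root, while String is defined by the root itself and is therefore the same class for every loader. The intermediate loader names differ between Java versions, so only the String result is shown:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: walks from a class's defining loader up to the root of the tree.
public class LoaderChain {

    static List<String> chain(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (ClassLoader l = type.getClassLoader(); l != null; l = l.getParent()) {
            names.add(l.getClass().getName());
        }
        names.add("<bootstrap>"); // the root loader is represented by null
        return names;
    }

    public static void main(String[] args) {
        // Application classes: one or more loaders between them and the root.
        System.out.println(chain(LoaderChain.class));
        // Core library classes are defined by the root itself.
        System.out.println(chain(String.class)); // prints [<bootstrap>]
    }
}
```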

One other observation: just as you have set up a ClassLoader to load classes from, interpreters often create their own ClassLoader instances into which they load the interpreting environment and the script. That way, if the script changes, the ClassLoader can be dropped (and the script with it) and the script reloaded in a new ClassLoader. EJBs and Servlets also use this trick.

Andrew Alcock
  • `This also solves the problem that potentially you could end up with multiple system Classes instances being loaded in different ClassLoaders - each with the same fully qualified name.` Isn't it kind of the point of classloaders that a class is defined by both its fully qualified name *and* its ClassLoader? In which case this isn't really a problem? – Li Haoyi Nov 15 '12 at 17:02
  • Your example of having two string instances that are not equal is not possible in a properly written JVM. Only the system class loader can load classes in the java.* packages. The JVM implementations that I have read enforce this on the C side of things, by preventing classes in that package from being defined by anything but the system class loader. – Christian Trimble Nov 20 '12 at 08:49
  • @C.Trimble: I completely concur that only the system classloader can load String - my statement is a counter-example showing what would happen if the Java classloading logic was not followed. In theory the hardcoding you mention is not required if the official classloading logic is followed, but the hardcoding you mention is a welcome additional security measure (but this was possible back in JVM 1.0 days). – Andrew Alcock Nov 20 '12 at 12:28

If you do

    System.out.println( p.getClass().getClassLoader() );

you'll see that p's classloader isn't your MyClassLoader bcl. It was actually loaded by bcl's parent, the system class loader.

When PythonInterpreter loads its dependent classes, it'll use its actual class loader, the system classloader, not your bcl, so your interception isn't reached.
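The effect can be reproduced without Jython. In the sketch below (all names hypothetical), Target stands in for PythonInterpreter and Dependency for one of its imports; LoggingLoader is the question's MyClassLoader moved onto loadClass(String, boolean). Target is requested through the logging loader, but because the parent defines it, the Dependency that Target's static initializer needs is resolved through the parent and never shows up in the log:

```java
import java.util.ArrayList;
import java.util.List;

public class DefiningLoaderDemo {

    /** Stands in for one of PythonInterpreter's dependencies. */
    public static class Dependency { }

    /** Stands in for PythonInterpreter: initializing it needs Dependency. */
    public static class Target {
        static final Dependency DEP = new Dependency();
    }

    /** Like the question's MyClassLoader: log the request, then delegate as usual. */
    static class LoggingLoader extends ClassLoader {
        final List<String> requested = new ArrayList<>();

        LoggingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            requested.add(name);
            return super.loadClass(name, resolve);
        }
    }

    public static void main(String[] args) throws Exception {
        LoggingLoader logger = new LoggingLoader(DefiningLoaderDemo.class.getClassLoader());
        // Load *and initialize* Target through the logging loader.
        Class<?> target = Class.forName(Target.class.getName(), true, logger);

        System.out.println(logger.requested);                  // only Target's name
        System.out.println(target.getClassLoader() == logger); // prints false
    }
}
```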

To solve the problem, your classloader can't delegate to its parent; it has to actually load the classes by itself.

For that you can subclass URLClassLoader (steal the URLs from the system classloader).

irreputable
  • I see `null` when I do that print statement. Do you have any links I could follow to see how to grab the URLs from the system classloader? My Google-fu is weak tonight – Li Haoyi Nov 10 '12 at 00:31
  • Oracle JDK's system classloader is a subclass of `URLClassLoader`, so you can downcast it and call `URLClassLoader.getURLs()` – irreputable Nov 10 '12 at 00:47
  • I added a snippet to the question which is what I have right now; I haven't managed to get the custom classloader to work, after I got the URLs out of the system classloader. – Li Haoyi Nov 10 '12 at 01:13
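Two details in the question's snippet likely explain the nulls. getResourceAsStream expects a resource path such as org/python/util/PythonInterpreter.class, not a binary class name like org.python.util.PythonInterpreter (MyClassLoader in the first answer does this conversion). Also, on Java 9 and later the system class loader is no longer a URLClassLoader, so the cast throws a ClassCastException. The sketch below sidesteps both, assuming the classes to trace are on java.class.path: it builds the URLs from that property and lets URLClassLoader.findClass do the name-to-path conversion inside a child-first loadClass. ChildFirstUrlLoader is a hypothetical name.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ChildFirstUrlLoader extends URLClassLoader {

    public ChildFirstUrlLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    /** Class-path entries as URLs; works whether or not the system loader is a URLClassLoader. */
    static URL[] classPathUrls() throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        for (String entry : System.getProperty("java.class.path").split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                urls.add(Paths.get(entry).toUri().toURL());
            }
        }
        return urls.toArray(new URL[0]);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !name.startsWith("java.")) {
                try {
                    // findClass maps "a.b.C" to "a/b/C.class" on our URLs and defines it here,
                    // so the classes it references are requested from this loader as well.
                    c = findClass(name);
                    System.out.println("Loading: " + name);
                } catch (ClassNotFoundException notOnOurUrls) {
                    // Not on our URLs (e.g. JDK classes): use normal delegation below.
                }
            }
            if (c == null) {
                return super.loadClass(name, resolve);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // The question's entry point would be passed as args[0], e.g. "Scripts".
        String name = args.length > 0 ? args[0] : ChildFirstUrlLoader.class.getName();
        try (ChildFirstUrlLoader loader =
                 new ChildFirstUrlLoader(classPathUrls(), ChildFirstUrlLoader.class.getClassLoader())) {
            Class<?> c = loader.loadClass(name);
            System.out.println(c.getName() + " defined by " + c.getClassLoader());
        }
    }
}
```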

What if you override the other loadClass() method?

protected Class<?> loadClass(String name, boolean resolve)
irreputable
  • No dice. It's catching everything twice now, I'm guessing because one of the overloads delegates to the other, but it's still only catching the top level `PythonInterpreter` and not its dependencies. – Li Haoyi Nov 09 '12 at 20:36
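That guess is right: ClassLoader.loadClass(String) simply calls loadClass(name, false), so logging in both overloads records each request twice, while overriding only the two-argument version records it once. Either way, overriding changes where requests are logged, not which loader resolves the dependencies. A small sketch contrasting the two (class names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

public class OverloadDemo {

    /** Logs in both overloads, like the experiment described in the comment. */
    static class BothOverloads extends ClassLoader {
        final List<String> seen = new ArrayList<>();

        BothOverloads(ClassLoader parent) {
            super(parent);
        }

        @Override
        public Class<?> loadClass(String name) throws ClassNotFoundException {
            seen.add(name);
            return super.loadClass(name); // ClassLoader's version calls loadClass(name, false)...
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            seen.add(name);               // ...which lands here and logs the same name again
            return super.loadClass(name, resolve);
        }
    }

    /** Logs only in the two-argument overload, which every request to this loader reaches. */
    static class TwoArgOnly extends ClassLoader {
        final List<String> seen = new ArrayList<>();

        TwoArgOnly(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            seen.add(name);
            return super.loadClass(name, resolve);
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader parent = OverloadDemo.class.getClassLoader();
        BothOverloads both = new BothOverloads(parent);
        TwoArgOnly single = new TwoArgOnly(parent);
        both.loadClass("java.util.ArrayList");
        single.loadClass("java.util.ArrayList");
        System.out.println(both.seen);   // prints [java.util.ArrayList, java.util.ArrayList]
        System.out.println(single.seen); // prints [java.util.ArrayList]
    }
}
```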

You can use the PySystemState object to specify a custom class loader before you instantiate the PythonInterpreter.

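PySystemState state = new PySystemState();          // org.python.core.PySystemState
state.setClassLoader(classLoader);                   // classLoader: e.g. your MyClassLoader instance
PythonInterpreter interp = new PythonInterpreter(table, state); // table: the PyObject namespace dict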

http://wiki.python.org/jython/LearningJython

jimbo
  • I suppose my question is more directed to the Java side than the Jython side. Looking in `src/org/python/util/PythonInterpreter` I see a list of 19 imports at the top of the file. Presumably these classes are being used throughout the class, so why aren't they being loaded by the same classloader that loaded the main class? Not really Jython-specific – Li Haoyi Nov 09 '12 at 20:51
  • Not to say that I don't want to figure out the intricacies of Jython's own classloading system, but one step at a time... – Li Haoyi Nov 09 '12 at 20:52