1

Here is my scenario: I have a class loader and a class it loaded, and now I need the bytecode for that class. Here is what I have tried so far:

    Field f = ClassLoader.class.getDeclaredField("classes");
    f.setAccessible(true);

    ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
    Vector<Class> classes =  (Vector<Class>) f.get(classLoader);

    for(Class loadedClass : classes)
    {
        String className = loadedClass.getName();
        String classFileResourcePath = "/" + className.replace(".", "/") + ".class";
        InputStream inputStream = classLoader.getResourceAsStream(classFileResourcePath);
        System.out.println(">>>> " + className + " => " + classFileResourcePath + " => " + inputStream);
    }

This code prints null for each class file. When I change the call to classLoader.getClass().getResourceAsStream(classFileResourcePath), it works if run from a standalone Main class in an IDE, but in the actual context where this is needed it returns null as well, presumably because there are "special" things happening with the jars and the classes behind the scenes. I can't discuss those details; suffice it to say that what I have is a class and the class loader that loaded it, and now I need the byte code. How do I do this? If this is not possible in the Java layer, I may be able to fetch the original jar itself and read it as a zip file, but that would be a last resort.

David Williams

4 Answers

3

There are actually several issues with your code sample:

First, you access the "classes" field of the java.lang.ClassLoader class to determine which classes are already loaded. This is a private field and if you let your code run in an environment where specialized class loaders are used (subclasses of java.lang.ClassLoader), you have more or less no idea what is contained in that field.

When calling ClassLoader.getResourceAsStream, you prefix the path with a "/", which is not correct. ClassLoader.getResourceAsStream always treats the path as absolute, so the path must start with the name of its first segment: use ClassLoader.getResourceAsStream("java/lang/ClassLoader.class") instead of ClassLoader.getResourceAsStream("/java/lang/ClassLoader.class").

Using Class.getResourceAsStream, you can either provide an absolute path starting with "/", or provide a path relative to the relevant class, not starting with "/". E.g. ClassLoader.class.getResourceAsStream("ClassLoader.class") or ClassLoader.class.getResourceAsStream("/java/lang/ClassLoader.class") will normally both give you access to the class' byte code.
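To make the difference between the two lookups concrete, here is a minimal sketch. The class name ResourcePathDemo and its method names are made up for illustration, and InputStream.readAllBytes assumes Java 9 or later. Both methods return null when the resource cannot be found, which is exactly what happens with non-standard class loaders.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of any library.
final class ResourcePathDemo {

    // ClassLoader lookup: the path is always absolute and has NO leading "/".
    static byte[] viaClassLoader(Class<?> c) throws IOException {
        String path = c.getName().replace('.', '/') + ".class";
        ClassLoader loader = c.getClassLoader();
        // A null loader means the bootstrap loader; fall back to the system lookup.
        try (InputStream in = loader != null
                ? loader.getResourceAsStream(path)
                : ClassLoader.getSystemResourceAsStream(path)) {
            return in == null ? null : in.readAllBytes();
        }
    }

    // Class lookup: a leading "/" makes the path absolute;
    // without it the path is resolved relative to the class's package.
    static byte[] viaClass(Class<?> c) throws IOException {
        String path = "/" + c.getName().replace('.', '/') + ".class";
        try (InputStream in = c.getResourceAsStream(path)) {
            return in == null ? null : in.readAllBytes();
        }
    }
}
```

With the standard class loaders, both methods should return the same bytes for a given class, for example ResourcePathDemo.viaClass(String.class).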

Both approaches do however require that the class files are available as resources on the class path using the standard naming conventions for Java runtime environments. There is no requirement that a Java runtime environment must operate this way. Java classes may be generated dynamically, causing them to be known by the class loader, but not backed by persistent byte code. Proprietary class loaders are also not required to use the same mapping between class names and resource paths as the standard class loaders.

Java class loaders also do not offer a public API to access a class' byte code. If you divide the VM into a "native code part" and a "Java code part", it is also fairly obvious that the "Java code part" usually has no need for a reference to the raw byte code.

Relying on the conventions used by the standard class loaders, you can use your approach and it will mostly work in standalone applications. But as you've found out yourself, it may fail if you run the code in a different environment, e.g. when deployed to an application server or when using packaging frameworks like OSGi.

jarnbjo
  • This is a great answer. tl;dr: remove the "/" prefix, and beware that this approach only works for classes that aren't dynamically generated and for class loaders that use a standard class name -> filename mapping. – Brett Kail Oct 30 '14 at 13:48
  • Yep, I was totally aware that the above is the wrong way to do it. It turns out in the environment I am programming in a special class loader was needed. – David Williams Nov 03 '14 at 22:58
1

The preferred method is Class.getResource or Class.getResourceAsStream. This will automatically use the correct ClassLoader (or use ClassLoader.getSystemResource() if the ClassLoader is null). It will also resolve the resource within the package of the class unless you prepend the resource name with a '/'.

So for a Class object not representing a nested class, you can request the associated resource using theClass.getResourceAsStream(theClass.getSimpleName()+".class")

If you need correct handling of nested classes, get the binary name via Class.getName() and transform it using either '/'+name.replace('.', '/')+".class" (absolute) or name.substring(name.lastIndexOf('.')+1)+".class" (relative to the class's package).
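As a rough sketch of the two name transformations, assuming Java 9 or later for readAllBytes; the class NestedClassBytes and its methods are hypothetical names, and array and primitive types are not handled:

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper illustrating the two resource-name transformations.
final class NestedClassBytes {

    // Name relative to the class's package, e.g. "Map$Entry.class".
    // getSimpleName() would give "Entry" here, which is not the file name.
    static String relativeResourceName(Class<?> c) {
        String name = c.getName();
        return name.substring(name.lastIndexOf('.') + 1) + ".class";
    }

    // Absolute name, e.g. "/java/util/Map$Entry.class".
    static String absoluteResourceName(Class<?> c) {
        return '/' + c.getName().replace('.', '/') + ".class";
    }

    // Returns null if the class loader cannot supply the resource.
    static byte[] read(Class<?> c) throws IOException {
        try (InputStream in = c.getResourceAsStream(relativeResourceName(c))) {
            return in == null ? null : in.readAllBytes();
        }
    }
}
```

Either name can be passed to theClass.getResourceAsStream; the relative form avoids rebuilding the package path.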

If this fails, the ClassLoader does not support getting the class bytecode or the class has been generated on-the-fly and added without recording the byte code in a way the ClassLoader could use.


If you want to be able to retrieve the byte code even for such classes, you need a JVM supporting Instrumentation. A ClassFileTransformer receives the byte code as input and hence may store it somewhere without actually transforming it, if that’s the intent.
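A minimal sketch of such a recording transformer follows. RecordingTransformer, bytesOf, install and capture are illustrative names, not an existing API. The sketch assumes the Instrumentation instance is handed over by a Java agent (for example from its premain method) and that the agent is allowed to retransform classes.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.security.ProtectionDomain;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical transformer that records class bytes and never changes them.
final class RecordingTransformer implements ClassFileTransformer {

    // Keyed by internal name ("java/util/List"). A real implementation would
    // also key by class loader, since two loaders may define the same name.
    private final Map<String, byte[]> recorded = new ConcurrentHashMap<>();

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (className != null) {          // the name may be null for some classes
            recorded.put(className, classfileBuffer.clone());
        }
        return null;                      // null tells the JVM "no transformation"
    }

    byte[] bytesOf(String internalName) {
        return recorded.get(internalName);
    }

    // Assumption: called by the agent with the Instrumentation it received.
    void install(Instrumentation inst) {
        inst.addTransformer(this, true);  // true = also see retransformations
    }

    // For an already loaded class: ask the JVM to pass it through the
    // transformer chain again so that its bytes get recorded.
    void capture(Instrumentation inst, Class<?> c) throws UnmodifiableClassException {
        inst.retransformClasses(c);
    }
}
```

Returning null from transform leaves the class untouched, so the transformer only observes.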

See also Instrumentation.getInitiatedClasses(java.lang.ClassLoader) for a reliable way to get the classes of a particular ClassLoader.

However, you should be aware that this is not necessarily the byte code as passed to defineClass as the JVM might strip information irrelevant for the execution and also store the data in an optimized form creating an equivalent but not exactly matching byte code when transforming it back for passing it to the transformer.

The other caveat is that if other transformers are registered within the JVM, e.g. if you’re using an instrumenting profiler at the same time, you don’t have precise control over the order of the transformers. That is, the first transformer will see byte code equivalent to the code stored on disk, while the last in the chain will see code equivalent to what is finally executed by the JVM, and an in-between transformer sees something which might match neither of them.

Note that even with getResourceAsStream the byte code doesn’t need to match, e.g. if the underlying resource has been modified since defineClass was called. And in principle, ClassLoaders are not required to implement loadClass/findClass and getResourceAsStream in a consistent way.

Holger
  • Note: the Instrumentation API is available to Java Agents only. [This answer](http://stackoverflow.com/a/19912148/2711488) shows at its end how an application can access this API by starting itself as a Java Agent if the JVM supports attaching Agents at runtime. See also http://stackoverflow.com/a/19496211/2711488 and http://stackoverflow.com/q/18322117/2711488 – Holger Oct 30 '14 at 11:28
  • Using Instrumentation is clever, but there's no way to guarantee that your ClassFileTransformer will run "first" (if you actually want the "raw" class bytes) or "last" (if you want the class bytes as actually used by the JVM). – Brett Kail Oct 30 '14 at 13:50
  • @bkail: right, but that’s an issue only in the case there *are* other transformers which is not the common case. The questioner did not give us enough information to deduce whether it’s a problem here. – Holger Oct 30 '14 at 14:28
  • I agree, but I think it's important to mention the caveat since we don't know what OP is doing. If you add that to your answer, I would upvote since this is good information. – Brett Kail Oct 30 '14 at 15:01
1

As mentioned by @jarnbjo, this approach does not work in general. I was looking for a generic approach and found two promising ones, only one of which actually works:

a. Instrumentation API. This works. I have decided not to use it because of difficulties when trying to modify some classes. The instrumentation agent runs in the same JVM and when it tries to instrument the classes it depends on, some very weird exceptions may occur. (I've learned some new exception types. Ehm, java.lang.ClassCircularityError...)

But this is likely to be OK for you if you can accept adding an instrumentation agent (via JVM args) when the JVM starts. You seem to need only to read the bytecode, so you should never run into such trouble.

b. JDI, the Java Debug Interface. This looked very promising. I started writing a script that reconstructs the bytecode from the JDI API. It offered almost everything I needed, except the exception table, so it is not very useful. If you have all the instructions but not the ExceptionTable attribute, you can't do any flow analysis, decompile to source and so on. Without the ExceptionTable, some exception handlers will look like dead code. You can see the current position in the bytecode, but some important information is missing.

v6ak
-1

You should have a look at the ASM library.

With the library you can access the bytecode like this:

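    import org.objectweb.asm.ClassReader;
    import org.objectweb.asm.tree.ClassNode;

    ClassReader cr = new ClassReader("java.lang.Runnable"); // may throw IOException
    ClassNode cn = new ClassNode();
    cr.accept(cn, 0); // fills cn with the parsed class structure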

Then you can access the information in object form through the public fields of the ClassNode. An event-based analysis using visitors is also possible.

Note that you can instantiate the ClassReader with an input stream or a byte array instead of the class name as well.

nrainer
  • Cool thanks, I can use ASM. Question: how do you get the raw bytes from the node? – David Williams Oct 29 '14 at 17:49
  • What information exactly do you want to retrieve? Probably you could access ``ClassNode.methods``, loop over the entries and cast them to ``MethodNode``. Each ``MethodNode`` holds the list of its instructions. – nrainer Oct 29 '14 at 17:53
  • Lol, no, I really need the bytes, don't ask. This seemed to work to get the bytes: `ClassWriter classWriter = new ClassWriter(ClassWriter.COMPUTE_MAXS); classNode.accept(classWriter); byte[] bytes = classWriter.toByteArray();` I'm working on trying the result in the real env. – David Williams Oct 29 '14 at 18:03
  • Hm, I am getting a `java.io.IOException: Class not found` from the `Reader`, I wonder, do I need to point ASM to the right class loader? – David Williams Oct 29 '14 at 18:09
  • You could try to create the ``ClassReader`` with an input stream as constructor argument... – nrainer Oct 29 '14 at 18:11
  • Downvoting since ClassReader with a string parameter is only capable of reading classes from the system class loader (not the thread context class loader as suggested by the original post), and ultimately it's just using ClassLoader.getSystemResourceAsStream, so using ClassReader/ClassWriter to call getResourceAsStream but then parse/reserialize the class is a huge waste of CPU/memory. – Brett Kail Oct 30 '14 at 13:46