
Consider the program:

import java.util.Arrays;

public class Test {

    public static void main(String[] args) {
        if (Arrays.asList(args).contains("--withFoo")) {
            use(new Foo());
        }
    }

    static void use(Foo foo) {
        // do something with foo
    }
}

Is Foo required in the runtime classpath if the program is launched without arguments?

Research

The Java Language Specification is rather vague about when linkage errors are reported:

This specification allows an implementation flexibility as to when linking activities (and, because of recursion, loading) take place, provided that the semantics of the Java programming language are respected, that a class or interface is completely verified and prepared before it is initialized, and that errors detected during linkage are thrown at a point in the program where some action is taken by the program that might require linkage to the class or interface involved in the error.

My tests indicate that LinkageErrors are only thrown when I actually use Foo:

$ rm Foo.class

$ java Test

$ java Test --withFoo

Exception in thread "main" java.lang.NoClassDefFoundError: Foo
        at Test.main(Test.java:11)
Caused by: java.lang.ClassNotFoundException: Foo
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 1 more

Can this behaviour be relied upon? Or is there any mainstream JVM that links unused code? If so, how can I isolate unused code such that it is only linked if needed?

meriton
  • You need `Foo` at compile time, right? After that, the classloader won't check whether it exists unless some code actually uses it and triggers the classloader to search for and load it. – Eric Jan 31 '17 at 19:07
  • Maybe you can move all the `Foo` references in your 'Main' class to a different class which is instantiated/referenced based on the condition. This will improve the isolation. – Jos Jan 31 '17 at 19:10
  • @EricWang: Yes, `Foo` is available during compilation, but may not be available in the runtime classpath. – meriton Jan 31 '17 at 19:15
  • No, as long as a class is not initialized, there won't be any problems. Java allows eager loading, but the behavior must be transparent, i.e. the program behavior is the same as under lazy loading. – ZhongYu Jan 31 '17 at 19:20
  • What if you call `use(null)` in `main()`? It should not cause any problem either. – ZhongYu Jan 31 '17 at 19:29
  • By the way, I am wondering in what circumstances you would want to remove classes by hand. In Java 8, the `Metaspace` is cleared by the GC automatically when it reaches the `MaxMetaspaceSize` limit, so maybe you don't need to worry about dead classes yourself. – Eric Feb 01 '17 at 03:43
  • @Eric Wang: I don’t think that the Metaspace is a concern here. Classes don’t consume space if they aren’t loaded, and not being loaded is a prerequisite if you want to remove their files. I think this is about optional features that might require an optional library that doesn’t need to be present if you don’t need/use the feature. – Holger Feb 01 '17 at 08:21
  • @Holger There might be a case where the class is loaded first, then cleared from Metaspace, and then you can also remove the class file from the classpath. That said, I still can't think of a real-world requirement not to put a class file on the classpath just because the feature is not used. – Eric Feb 01 '17 at 12:20
  • @Eric Wang: classes can only be unloaded if their class loader becomes unreachable, but each loaded class has a reference to its loader. So, since ordinary, non-reflective references from one class to another are resolved through the loader of the class containing the reference, it is impossible for such classes to become unloaded, unless the class containing the reference becomes unreachable too. It’s correct that the Metaspace will be cleared by gc, but that’s only relevant to classes loaded or generated through a custom class loader, which can get out of scope. – Holger Feb 01 '17 at 12:37
  • @Holger I am not sure about the part in the above comment, `classes can only be unloaded if their class loader becomes unreachable`. Can you point to a document or resource that explains this? Thanks. – Eric Feb 01 '17 at 13:27
  • @Eric Wang: See [JLS §12.7](https://docs.oracle.com/javase/specs/jls/se8/html/jls-12.html#jls-12.7): “*Since we can never guarantee that unloading a class or interface whose loader is potentially reachable will not cause reloading, and reloading is never transparent, but unloading must be transparent, it follows that one must not unload a class or interface while its loader is potentially reachable. A similar line of reasoning can be used to deduce that classes and interfaces loaded by the bootstrap loader can never be unloaded.*” – Holger Feb 01 '17 at 14:55

3 Answers


You need only small changes to your test code to answer that question.

Change the type hierarchy to

class Bar {}
class Foo extends Bar {}

and the program to

import java.util.Arrays;

public class Test {
    public static void main(String[] args) {
        if (Arrays.asList(args).contains("--withFoo")) {
            use(new Foo());
        }
    }
    static void use(Bar foo) {
        // don't need actual code
    }
}

Now, if Foo is absent, the program fails with an error even before entering the main method (with HotSpot). The reason is that the verifier needs the definition of Foo to check whether passing it to a method expecting Bar is valid.

HotSpot takes a short-cut, not loading the type, if the types are an exact match or if the target type is java.lang.Object, where the assignment is always valid. That's why your original code does not throw early when Foo is absent.

The bottom line is that the exact point in time when an error is thrown is implementation dependent, e.g. it might depend on the actual verifier implementation. All that is guaranteed is, as you already cited, that an attempt to perform an action that requires linkage will throw previously detected linkage errors. But it is perfectly possible that your program never gets far enough to make that attempt.

Holger
  • Most instructive example, thanks! What would you recommend to isolate unused code more reliably? Can I rely on bytecode (i.e. method implementations) being linked only if the containing class is initialized? – meriton Feb 01 '17 at 16:10
  • If you want to be really independent of the JVM implementation, you should follow Chris Parker’s answer. For Oracle’s JVM (HotSpot or OpenJDK), the verification is performed on a per-method basis, so references in methods which are never invoked have no effect. But that’s this particular JVM. – Holger Feb 01 '17 at 16:42
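
A minimal sketch of the isolation idea from the comments (move every reference to Foo into a class that is only touched when the feature is requested). It leans on the HotSpot behaviour described above and is not something the specification promises. The names IsolationSketch, FooFeature and initialized are hypothetical, and Foo is nested here only to keep the example self-contained; in practice it would live in the optional library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IsolationSketch {

    // Records, in order, which classes have run their static initializer.
    public static final List<String> initialized = new ArrayList<>();

    // Stand-in for the optional class (assumption: normally shipped separately).
    static class Foo {
        static { initialized.add("Foo"); }
    }

    // Hypothetical holder: the only class whose bytecode mentions Foo.
    static class FooFeature {
        static { initialized.add("FooFeature"); }

        static String run() {
            use(new Foo());
            return "foo used";
        }

        static void use(Foo foo) {
            // do something with foo
        }
    }

    // The entry class refers to FooFeature only, never to Foo itself.
    public static String run(String[] args) {
        if (Arrays.asList(args).contains("--withFoo")) {
            return FooFeature.run();
        }
        return "foo skipped";
    }

    public static void main(String[] args) {
        System.out.println(run(args));
    }
}
```

Without --withFoo, neither FooFeature nor Foo is initialized. Whether a given JVM still tries to load Foo early is exactly the implementation detail discussed above, so treat this as a HotSpot-oriented sketch only.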

I guess something like this is undefined (sort of, see at the bottom). We know how it works for the Oracle VM, but it's an implementation detail of the VM. A VM could also choose to load all classes right away.

You can find this in the VM spec (emphasis mine):

Linking a class or interface involves verifying and preparing that class or interface, its direct superclass, its direct superinterfaces, and its element type (if it is an array type), if necessary. Resolution of symbolic references in the class or interface is an optional part of linking.

This specification allows an implementation flexibility as to when linking activities (and, because of recursion, loading) take place...

And further down:

The Java Virtual Machine instructions anewarray, checkcast, getfield, getstatic, instanceof, invokedynamic, invokeinterface, invokespecial, invokestatic, invokevirtual, ldc, ldc_w, multianewarray, new, putfield, and putstatic make symbolic references to the run-time constant pool. Execution of any of these instructions requires resolution of its symbolic reference.

Resolution is the process of dynamically determining concrete values from symbolic references in the run-time constant pool.

The line use(new Foo()); compiles to:

14: new           #5                  // class Foo
17: dup
18: invokespecial #6                  // Method Foo."<init>":()V
21: invokestatic  #7                  // Method use:(LFoo;)V

So these would require the resolution of Foo, but nothing else in the program will.


However, it also states (appended to an example, which is why I missed it at first):

Whichever strategy is followed, any error detected during resolution must be thrown at a point in the program that (directly or indirectly) uses a symbolic reference to the class or interface.

So while an error may be found with resolution when the Test class is loaded, the error will only be thrown when the faulty symbolic reference is actually used.

Jorn Vernee
  • It can't be undefined, or unpredictable. Many applications are running without all static dependencies on the classpath. – ZhongYu Jan 31 '17 at 19:42
  • @ZhongYu Then those applications are relying on the VM implementation, that's nothing new (think of `Unsafe`, or the `useLegacyMergeSort` flag). – Jorn Vernee Jan 31 '17 at 19:44
  • If a JVM throws error on a class that's never been reached in a program, that is a violation of Java semantics. – ZhongYu Jan 31 '17 at 19:52
  • @ZhongYu Ok. I'd say, find a spec reference, and post it as an answer. – Jorn Vernee Jan 31 '17 at 19:53
  • OP's quote of JLS - *errors detected during linkage are thrown at a point in the program where some action is taken by the program that might require linkage to the class or interface involved in the error* – ZhongYu Jan 31 '17 at 19:58
  • @ZhongYu Right, but when is _linkage required_? That is apparently something that is decided by the VM. For instance at the point when the `Test` class is loaded, as the VM spec allows for it. – Jorn Vernee Jan 31 '17 at 20:25
  • My interpretation is that "require" is a strong word here; linkage is only "required" before class initialization. – ZhongYu Jan 31 '17 at 21:32
  • @ZhongYu I guess you were right after all. I found a spec reference that supports your claim. – Jorn Vernee Jan 31 '17 at 21:53
  • @JornVernee Linkage is required to have happened when a class is referenced, that's all that is known for certain. – biziclop Jan 31 '17 at 23:50
  • @Zhong Yu: the JVMS version of that statement contains the clarifying phrase “(directly or indirectly)”. Even loading the class `Test` is an action that indirectly refers to class `Foo`. – Holger Feb 01 '17 at 09:01

I have to say that in your circumstances, I'd be sorely tempted to bypass the issue entirely: create an interface that is always present and use reflection to instantiate its optional implementation. Something along the lines of:

// This may or may not be present
package path.to.foo;

import path.to.my.code.IFoo;

public class Foo implements IFoo {
    public void doFooStuff() {
        ...
    }
}

// This is always present
package path.to.my.code;
public interface IFoo {
    public void doFooStuff();
}

// Foo may or may not be present at runtime, but this always compiles
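package path.to.my.code;

import java.util.Arrays;

public class Test {

    public static void main(String[] args) throws ReflectiveOperationException {
        if (Arrays.asList(args).contains("--withFoo")) {
            // Look Foo up by name so that Test's bytecode holds no symbolic reference to it
            Class<?> fc = Class.forName("path.to.foo.Foo");
            IFoo foo = (IFoo) fc.getDeclaredConstructor().newInstance();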
            use(foo);
        }
    }

    static void use(IFoo foo) {
        // do something with foo
    }
}

[EDIT] I know this doesn't directly answer the question, but this seems like a better solution than the direction you are heading.
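
If the optional class may legitimately be missing, the reflective lookup could also report that instead of throwing. Here is a minimal sketch, assuming a hypothetical helper named loadOptional and using Runnable as a stand-in for the always-present interface so that the example compiles on its own:

```java
import java.util.Optional;

public class OptionalFeatureLoader {

    /**
     * Tries to load className and instantiate it as a T through its no-arg constructor.
     * Returns an empty Optional if the class is missing, is not a T, or cannot be created.
     */
    public static <T> Optional<T> loadOptional(String className, Class<T> type) {
        try {
            // initialize = false: locate the class without running its static initializers yet
            Class<?> raw = Class.forName(className, false,
                    OptionalFeatureLoader.class.getClassLoader());
            // Throws ClassCastException if the loaded class does not implement/extend T
            Class<? extends T> impl = raw.asSubclass(type);
            return Optional.of(impl.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // "path.to.foo.Foo" is the optional class from the answer; Runnable is only a placeholder type
        Optional<Runnable> feature = loadOptional("path.to.foo.Foo", Runnable.class);
        System.out.println(feature.isPresent() ? "Foo available" : "Foo missing, feature disabled");
    }
}
```

Swallowing the exception keeps the sketch short; real code would probably want to log why the feature was disabled.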

Chris Parker