
TL;DR: Given bytecode, how can I find out what classes and what methods get used in a given method?


In my code, I'd like to programmatically find all classes and methods having too generous access modifiers. This should be done based on an analysis of inheritance, static usage and also hints I provide (e.g., using some home-brew annotation like @KeepPublic). As a special case, unused classes and methods would be found, too.

I just did something similar, though much simpler: adding the final keyword to all classes where it makes sense (i.e., where it's allowed and the class won't get proxied, e.g., by Hibernate). I did it in the form of a test, which knows about classes to be ignored (e.g., entities) and complains about all needlessly non-final classes.
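A test of that kind might look roughly like the following sketch. It is not the asker's actual test: it assumes the classes to check are already available as a collection (enumerating them, e.g. by classpath scanning, is left out), uses plain reflection, and treats a class as needlessly non-final when it is concrete, not final, not extended by any other class in that collection and not on an ignore list. `NonFinalClassCheck`, `needlesslyNonFinal` and the nested sample classes are hypothetical names.

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class NonFinalClassCheck {

    // Hypothetical sample classes standing in for "all classes of mine".
    static class Leaf {}
    static class Base {}
    static class Sub extends Base {}
    static final class AlreadyFinal {}
    static class Entity {} // e.g. a Hibernate entity that must stay proxyable

    /**
     * Returns the concrete, non-final classes from {@code classes} that are not extended
     * by any other class in {@code classes} and are not listed in {@code ignored}.
     */
    static List<Class<?>> needlesslyNonFinal(Collection<Class<?>> classes, Set<Class<?>> ignored) {
        // Collect every class that serves as a superclass within the analyzed set.
        Set<Class<?>> extended = new HashSet<>();
        for (Class<?> c : classes) {
            if (c.getSuperclass() != null) extended.add(c.getSuperclass());
        }
        List<Class<?>> result = new ArrayList<>();
        for (Class<?> c : classes) {
            int mod = c.getModifiers();
            // Interfaces (incl. annotations), enums and anonymous classes cannot be declared final.
            if (c.isInterface() || c.isEnum() || c.isAnonymousClass()) continue;
            if (Modifier.isFinal(mod) || Modifier.isAbstract(mod)) continue;
            if (extended.contains(c) || ignored.contains(c)) continue;
            result.add(c);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Class<?>> all = Arrays.<Class<?>>asList(
                Leaf.class, Base.class, Sub.class, AlreadyFinal.class, Entity.class);
        Set<Class<?>> ignored = Collections.<Class<?>>singleton(Entity.class);
        for (Class<?> c : needlesslyNonFinal(all, ignored)) {
            System.out.println("could be final: " + c.getName());
        }
    }
}
```

Classes that are extended only from outside the analyzed collection, or by runtime proxies, are what the ignore list is for.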

For all classes of mine, I want to find all methods and classes they use. Concerning classes, there's this answer using ASM's Remapper. Concerning methods, I've found an answer proposing instrumentation, which isn't what I want just now. I'm also not looking for a tool like ucdetector, which works with the Eclipse AST. How can I inspect method bodies based on bytecode? I'd like to do it myself so that I can programmatically eliminate unwanted warnings (which are plentiful with ucdetector when using Lombok).

maaartinus
  • Sounds like you're looking for a bytecode library, but [**questions asking us to recommend or find a** book, tool, **software library**, tutorial or other off-site resource **are off-topic for Stack Overflow** as they tend to attract opinionated answers and spam](http://stackoverflow.com/help/on-topic). – Andreas May 01 '17 at 22:00
  • @AndyThomas My last sentence is actually the question. I'm going to make it clear now. – maaartinus May 01 '17 at 22:07
  • common libraries that support reading of bytecode are asm (http://asm.ow2.org/) or bcel (https://commons.apache.org/proper/commons-bcel/) – k5_ May 01 '17 at 22:19
  • @maaartinus You're looking for a library that will help you read bytecode and/or analyze it. That means you're asking us to recommend a library. Did the [quoted text](http://stackoverflow.com/questions/43727089/reduce-visibility-of-classes-and-methods#comment74498037_43727089) from the [StackOverflow Help Center](http://stackoverflow.com/help) confuse you as to its meaning of an *off-topic* question? – Andreas May 01 '17 at 22:19
  • @Andreas No, I know the text. My question is just "**Given bytecode, how can I find out what classes and what methods get used in a given method?**". There's no library mentioned anywhere. On the contrary: I wrote that I'm not looking for a tool like ucdetector. +++ We both know I'll need one, but that's not my problem. **I'm not asking for a library.** – maaartinus May 01 '17 at 22:24
  • So, you want to know how to find method calls in bytecode. Then, look at the bytecode (using a bytecode library, but you already have that?) and search for the method call instructions, since you did say *"I'd like to do it myself"*. Or **use a library**, e.g. as described in [this answer](http://stackoverflow.com/a/18233343/5221149). Oops, I inadvertently recommended a library, because that is what you're asking for. – Andreas May 01 '17 at 22:27
  • *FYI:* To me, the phrase *"Given bytecode"*, just reads as *"Given I have a `.class` file"*, because that is what a class file is: bytecode. It doesn't necessarily mean *"Given that I already use a library to read bytecode"*, because if you already had *programmatic* access to the bytecode, you already have the means to *programmatically* look at the bytecode and find the information you're looking for, making this question really confusing as to what your question is all about. – Andreas May 01 '17 at 22:37
  • @Andreas This was actually helpful... I hope `visitMethodInsn` gives me *all* method calls, but what about classes? They may appear in many places and I can't see whether I can get them all (Can I inspect the constant pool? Would it help?). +++ Agreed, in my question there's no difference between "bytecode" and "classfile". I chose the former without any specific reason. I just wanted to say that I want to analyze neither the source code nor the AST. – maaartinus May 01 '17 at 22:41

1 Answer

Looking at the usage on a per-method basis, i.e. by analyzing all instructions, has some pitfalls. Besides method invocations, there might be method references, which are encoded using an invokedynamic instruction that has a handle to the target method among its bootstrap method (bsm) arguments. If the bytecode hasn’t been generated from ordinary Java code (or stems from a future version), you have to be prepared to encounter ldc instructions pointing to a handle, which would yield a MethodHandle at runtime.
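The handle-in-bootstrap-arguments encoding can be observed at runtime without any bytecode library: for a serializable method reference, the lambda runtime generates a `writeReplace` method returning a `java.lang.invoke.SerializedLambda`, which reports the implementation method taken from those bootstrap arguments. The sketch below is only a runtime illustration of what an instruction-level analysis would have to extract from an invokedynamic instruction, not a way to analyze class files. It relies on the generated, non-public `writeReplace` method, and `MethodRefTarget`, `SerFunction` and `serialized` are hypothetical names.

```java
import java.io.Serializable;
import java.lang.invoke.MethodHandleInfo;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.function.Function;

public final class MethodRefTarget {

    /** A serializable Function, so the lambda runtime adds a writeReplace method to the generated class. */
    interface SerFunction<T, R> extends Function<T, R>, Serializable {}

    /** Returns the SerializedLambda describing a serializable lambda or method reference. */
    static SerializedLambda serialized(Serializable lambda) {
        try {
            Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
            writeReplace.setAccessible(true);
            return (SerializedLambda) writeReplace.invoke(lambda);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("not a serializable lambda: " + lambda, e);
        }
    }

    public static void main(String[] args) {
        // javac compiles this method reference to an invokedynamic instruction;
        // String.trim() appears only as a method handle among the bootstrap arguments.
        SerFunction<String, String> ref = String::trim;
        SerializedLambda info = serialized(ref);
        System.out.println(info.getImplClass() + "." + info.getImplMethodName()
                + info.getImplMethodSignature() + " via "
                + MethodHandleInfo.referenceKindToString(info.getImplMethodKind()));
    }
}
```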

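Since you already mentioned “analysis of inheritance”, I just want to point out the corner cases, e.g. for

```java
// foo/A.java: the package-private classes A and B may share one source file
package foo;

class A {
    public void method() {}
}
class B extends A implements bar.If {
}
```

```java
// bar/If.java
package bar;

public interface If {
    void method();
}
```

it’s easy to overlook that A.method() has to stay public: B inherits it as its implementation of the interface method If.method(), which can only be implemented by a public method, even though A itself has nothing to do with If.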
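Such cases can at least be flagged automatically. The sketch below swaps in plain reflection on loaded classes (not the bytecode analysis discussed here) and reports methods that a class inherits from a superclass to implement one of its directly declared interfaces, although that superclass does not implement the interface itself. Since interface methods can only be implemented by public methods, everything it reports has to stay public. `InheritedImplFinder`, `inheritedInterfaceImpls` and the nested If, A and B (with B extends A implements If) are hypothetical names; generic bridge methods and loading side effects are not handled.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.LinkedHashSet;
import java.util.Set;

public final class InheritedImplFinder {

    // Hypothetical nested stand-ins: B extends A implements If, A does not implement If.
    public interface If { void method(); }
    static class A { public void method() {} }
    static class B extends A implements If {}

    /**
     * Returns the methods that {@code type} inherits from a superclass to implement an abstract
     * method of one of its directly declared interfaces, where the declaring class itself
     * does not implement that interface.
     */
    static Set<Method> inheritedInterfaceImpls(Class<?> type) {
        Set<Method> result = new LinkedHashSet<>();
        for (Class<?> itf : type.getInterfaces()) {
            for (Method abstractMethod : itf.getMethods()) {
                if (!Modifier.isAbstract(abstractMethod.getModifiers())) continue; // skip default/static
                Method impl = findConcrete(type, abstractMethod);
                if (impl != null && !itf.isAssignableFrom(impl.getDeclaringClass())) {
                    result.add(impl);
                }
            }
        }
        return result;
    }

    /** Walks up the superclass chain to the first declaration with the same name and parameter types. */
    private static Method findConcrete(Class<?> start, Method template) {
        for (Class<?> c = start; c != null; c = c.getSuperclass()) {
            try {
                Method m = c.getDeclaredMethod(template.getName(), template.getParameterTypes());
                return Modifier.isAbstract(m.getModifiers()) ? null : m;
            } catch (NoSuchMethodException notDeclaredHere) {
                // keep walking up the hierarchy
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (Method m : inheritedInterfaceImpls(B.class)) {
            System.out.println("must stay public: " + m);
        }
    }
}
```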

If you stay conservative, i.e. if you assume that B instances may end up as targets of If.method() invocations elsewhere in your application whenever you can’t prove otherwise, you won’t find much to optimize. I think that you need at least inlining of bridge methods and of the synthetic inner/outer class accessors to identify unused members across inheritance relationships.

When it comes to class references, there are even more possibilities that make a per-instruction analysis error-prone. They may occur not only as the owner of member access instructions, but also in new, checkcast, instanceof and array-specific instructions, in annotations and exception handlers and, even worse, within signatures, which may occur in member references, annotations, local variable debugging hints, etc. The ldc instruction may refer to classes, producing a Class instance; this is actually used by ordinary Java code, e.g. for class literals. But as said, there’s also the theoretical possibility of producing MethodHandles, which may refer to an owner class but also have a signature bearing parameter types and a return type, or of producing a MethodType representing a signature.
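Descriptors are one place where class references hide inside plain strings. A minimal sketch for extracting the class names from a field or method descriptor (JVMS §4.3) could look like this; it assumes well-formed input and deliberately ignores generic signatures, whose grammar adds type arguments and type variables. `DescriptorClasses` and `classesInDescriptor` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public final class DescriptorClasses {

    /**
     * Collects the class names referenced by a field or method descriptor,
     * e.g. "(Ljava/lang/String;[ILfoo/Bar;)V", converting internal names to binary names.
     */
    static Set<String> classesInDescriptor(String desc) {
        Set<String> result = new LinkedHashSet<>();
        int i = 0;
        while (i < desc.length()) {
            if (desc.charAt(i) == 'L') {
                // Object type: 'L' <internal name> ';'
                int end = desc.indexOf(';', i);
                if (end < 0) throw new IllegalArgumentException("malformed descriptor: " + desc);
                result.add(desc.substring(i + 1, end).replace('/', '.'));
                i = end + 1;
            } else {
                // '(', ')', '[' or a primitive/void type letter such as I, J, Z or V
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classesInDescriptor("(Ljava/lang/String;[ILfoo/Bar;)V"));
        System.out.println(classesInDescriptor("[[Ljava/util/Map;"));
    }
}
```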

You are better off analyzing the constant pool; however, ASM doesn’t offer that. To be precise, a ClassReader has methods to access the pool, but they are not intended to be used by client code (as their documentation states). Even there, you have to be aware of pitfalls. Basically, a CONSTANT_Utf8_info entry bears a class or signature reference if a CONSTANT_Class_info, or the descriptor index of a CONSTANT_NameAndType_info or a CONSTANT_MethodType_info, points to it. However, declared members of a class have direct references to CONSTANT_Utf8_info pool entries to describe their signatures; see the JVMS sections on Methods and Fields. Likewise, annotations don’t follow the pattern: they have direct references to CONSTANT_Utf8_info entries of the pool, assigning a type or signature semantic to them; see enum_const_value and class_info_index.
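Reading the pool does not even require ASM. Here is a minimal sketch that parses only the constant pool of a class file with plain `java.io` and collects class names from CONSTANT_Class_info entries (including array classes, which are named by descriptors) and from the descriptor indices of CONSTANT_NameAndType_info and CONSTANT_MethodType_info. As explained above, this is incomplete: descriptors of declared fields and methods, annotation types and Signature attributes point directly to CONSTANT_Utf8_info entries and are not examined, because the sketch stops reading after the pool. Tag values follow the JVMS constant pool table; `ConstantPoolScanner` and `referencedClasses` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class ConstantPoolScanner {

    // In a field or method descriptor, every object type has the form L<internal name>;
    private static final Pattern OBJECT_TYPE = Pattern.compile("L([^;]+);");

    /** Reads only the constant pool of the given class file and returns the class names it references. */
    static Set<String> referencedClasses(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
            in.readUnsignedShort(); // minor_version
            in.readUnsignedShort(); // major_version
            int count = in.readUnsignedShort(); // valid pool indices are 1 .. count-1
            String[] utf8 = new String[count];
            List<Integer> classNames = new ArrayList<>();
            List<Integer> descriptors = new ArrayList<>();
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;                   // Utf8: u2 length + modified UTF-8
                    case 7: classNames.add(in.readUnsignedShort()); break;   // Class -> name_index
                    case 12: in.readUnsignedShort();                          // NameAndType: skip name_index,
                             descriptors.add(in.readUnsignedShort()); break;  // keep descriptor_index
                    case 16: descriptors.add(in.readUnsignedShort()); break; // MethodType -> descriptor_index
                    case 3: case 4: in.readInt(); break;                      // Integer, Float
                    case 5: case 6: in.readLong(); i++; break;                // Long, Double occupy two slots
                    case 8: case 19: case 20: in.readUnsignedShort(); break;  // String, Module, Package
                    case 9: case 10: case 11: case 17: case 18: in.readInt(); break; // two u2 indices
                    case 15: in.readUnsignedByte(); in.readUnsignedShort(); break;   // MethodHandle
                    default: throw new IOException("unknown constant pool tag " + tag + " at #" + i);
                }
            }
            Set<String> result = new TreeSet<>();
            for (int idx : classNames) {
                String name = utf8[idx];
                // Array classes are named by their descriptor, e.g. "[Ljava/lang/String;"
                if (name.startsWith("[")) addObjectTypes(name, result);
                else result.add(name.replace('/', '.'));
            }
            for (int idx : descriptors) addObjectTypes(utf8[idx], result);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void addObjectTypes(String descriptor, Set<String> out) {
        Matcher m = OBJECT_TYPE.matcher(descriptor);
        while (m.find()) out.add(m.group(1).replace('/', '.'));
    }

    public static void main(String[] args) throws IOException {
        // usage: java ConstantPoolScanner path/to/Some.class
        for (String name : referencedClasses(Files.readAllBytes(Paths.get(args[0])))) {
            System.out.println(name);
        }
    }
}
```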

Holger
  • holy moly, that's detailed...! – Eugene May 02 '17 at 16:31
  • Indeed a wonderful answer. Am I right in assuming `asm` is the only tool for this job? It's damn low level. +++ Anyway, I'm not scared of missing a reference, as this will be a tool for myself. When it goes wrong, I'll get a compile-time error, fix the code (git rulez) and fix the tool. – maaartinus May 03 '17 at 18:34
  • I didn’t evaluate other libraries. Afaik, Javassist offers a representation of the constant pool in its public API, but I don’t know how well it will serve the other purposes. If you think ASM is “damn low level”, note that I usually don’t even use ASM, as it is “too big” for some of my tasks (Instrumentation has to be faaaaast). See for example [printing the pool](http://stackoverflow.com/a/32278587/2711488) or [finding class dependencies in the pool](http://stackoverflow.com/a/19928470/2711488) or [iterating an instruction sequence](http://stackoverflow.com/a/38058930/2711488) without ASM… – Holger May 03 '17 at 18:49
  • But there might be higher-level libraries built on a lower-level library like ASM, Javassist or BCEL, offering something in the direction of your actual task. But that’s more of a search-the-net task than a programming task. – Holger May 03 '17 at 18:54