I'm looking for a bytecode optimizing algorithm that computes unnecessary calculations and removes dead local variables. Imagine this code:
iconst_2
iconst_3
iadd
invokestatic someMethod (I)V
It can easily be optimized to:
iconst_5
invokestatic someMethod (I)V
My first idea to implement this was to use a custom ASM interpreter that captures the instructions (like SourceInterpeter
) and automatically computes stack values. Then replace the last calculation with known outcome (in this case iadd
) with the real value, and iterate backwards to remove all captured instructions. This would work on many cases, but not on every case.
Here the optimization would be a bit harder:
iconst_2
iconst_3
dup
getstatic someUnknownInt I
iadd
invokestatic someOtherMethod (I)V
iadd
invokestatic someMethod (I)V
If iadd
is replaced by iconst_5
, and iconst_2
and iconst_3
get removed, then we would have invalid bytecode. My second thought was to do multiple passes. First, replace all dup
instructions with their real value as new push instruction (if it isn't a method / field reference), then remove all pop instructions (but don't remove methods), and finally compute calculations. This also doesn't work every time.
iconst_2
invokestatic someMethod ()I
dup2
invokestatic someOtherMethod (I)V
swap
invokestatic someOtherMethod (I)V
iadd //this can be computed -> iconst_4
invokestatic someOtherMethod (I)V
The last thing i tried was to use a SourceInterpreter
and rewrite the instructions. Only obligatory instructions (like method calls, monitorenter
, returns, etc.) get added, with the stack they take as rewritten code before them. This leads to invalid bytecode, as pop operations are not kept (after methods) and duplication of method calls (if the next obligatory instruction takes the previous method as argument). What is the best way to do this?