7

I have an unusual requirement: My application generates Java code automatically from a very long script (written in a dynamically-typed language). The script is so long that I hit the maximum method size of 65k of the JVM.

The script consists only of simple instructions on primitive types (no call to other functions besides mathematical ones). It may look like:

...
a = b * c + sin(d)
...
if a>10 then
   e = a * 2
else
   e = a * abs(b)
end
...

... which is transformed as:

...
double a = b * c + Math.sin(d);
...
double e;
if(a>10){
   e = a * 2;
}else{
   e = a * Math.abs(b);
}
...


My first idea to overcome the method size limitation was the following:
  • Turn all local variables into fields
  • Split the code every 100 lines (or longer if needed in case of a if/else block) in separate methods.

Something like:

class AutoGenerated {

   double a,b,c,d,e,....;

   void run1(){
      ...
      a = b * c + sin(d);
      ...
      run2();
   }

   void run2(){
      ...
      if(a>10){
          e = a * 2;
      }else{
          e = a * Math.abs(b);
      }
      ...
      run3();
   }

   ...
}

Do you know of any other way that would be more efficient? Note that I need the code to run as fast as possible as it will be executed in long loops. I cannot resort to compiling to C, as interoperability is also an issue...

I would also appreciate pointers to libraries that might help me.

Community
  • 1
  • 1
Eric Leibenguth
  • 4,167
  • 3
  • 24
  • 51
  • If you are worried about efficiency, you should note that methods over 8 KB in size are not compiled by default. – Peter Lawrey Aug 25 '15 at 10:12
  • I would consider what inlining can do for you. Are there any repeating sequences of code which after inlining would produce the same code? – Peter Lawrey Aug 25 '15 at 10:13
  • @PeterLawrey, What happens then over 8KB? Is the code interpreted, costing a lot of efficiency? What makes the compiler decide what it should compile or not? Regarding *inlining*, how would that work exactly? Should I look for "patterns" in the code and create dedicated methods to handle them? – Eric Leibenguth Aug 25 '15 at 10:25
  • If the method is not compiled it will be interpreted. By inlining I suggest writing some small methods for common functionality you use in the generated code. – Peter Lawrey Aug 25 '15 at 12:48
  • If I were you, I would seriously consider @OldCurmudgeon answer. You say it needs to run as fast as possible, but I would always question such an assumption, because it strongly depends of whatever else is happening in that long loop. If this code, even when interpreted, only accounts for 20%, say, of the total, then compiling it cannot save more than that, and it may well be that you have bigger "fish to fry" elsewhere. – Mike Dunlavey Aug 25 '15 at 13:06

4 Answers4

2

We are using the similar approach in one of the projects despite its disadvantages mentioned by other people. We call the multiple generated methods from single launcher method like @Marco13 suggests. We actually calculate (pretty precisely) the size of the generated bytecode and start a new method only when limit is reached. Our math formulas which we translate to Java code are available as AstTree and we have a special visitor which counts the bytecode length per each expression. For such simple programs it's quite stable across Java versions and different compilers. So we don't create methods more than necessary. In our case it's quite hard to emit the bytecode directly, but you can try to do it for your language using ASM or similar library (this way, of course, ASM will calculate the bytecode length for you).

We usually store the data variables in single double[] array (we don't need other types) and pass it as parameter. This way you don't need the enormous number of fields (sometimes we have thousands of variables). On the other hand local array access may take more bytecode bytes compared to field access for index higher than 127.

Another concern is the constant pool size. We usually have many double constants in the autogenerated code. If you declare many fields and/or methods, their names also take the constant pool entries. So it's possible to hit the class constant pool limit. Sometimes we hit it and generate nested classes to overcome this problem.

Other people suggest to tweak JVM options as well. Use these advices carefully as they will affect not only this autogenerated class, but every other class as well (I assume that other code is also executed in the same JVM in your case).

Tagir Valeev
  • 97,161
  • 19
  • 222
  • 334
1

Converting the local variables into fields may actually have a negative impact on the performance as long as the code is not optimized by the JIT (see this question and related ones for further infos). But I see that, depending on the variables this involves, there may hardly be other feasible options.


There may be additional limits for the compilation and method sizes. Peter Lawrey mentioned in the comments that "...methods over 8 KB in size are not compiled by default" - I wasn't aware of this one, but he usually knows what he's talking about, so you should dig a bit deeper here. Additionally, wou might want to have a look at the HotSpot VM options to see which further limits and settings might be relevant for you. I primarily thought that

-XX:MaxInlineSize=35 : Maximum bytecode size of a method to be inlined.

might be something to keep in mind.

(In fact, calling so many methods with a size of MaxInlineSize that inlining all these calls will exceed the size of 65k bytes for the containing method may be a neatly-nasty test case for the robustness and edge case tests of the inlining process...)


You sketched a "telescoping" call scheme for the methods:

void run1(){
   ...
   run2();
}

void run2(){
   ...
   run3();
}

This may lead into problems as well: Considering that you have >650 of these methods (in the best case), this will at least lead to a very deep stack, and might in fact cause a StackOverflowError - again, depending on some of the Memory Options. You might have to increase the Stack Size by setting the -Xss parameter accordingly.


The actual problem description was a bit vague, and without further information about the code that is to be generated (also regarding questions about e.g. how many local variables you need, that might have to be turned into instance variables etc), I'd propose the following:

  • Create many small methods if possible (considering the MaxInlineSize)
  • Try to reuse these small methods (if such reusability can be detected from the input with reasonable effort)
  • Call these methods sequentially, as in

    void run()
    {
        run0();
        run1();
        ...
        run2000();
    }
    

    to avoid problems with the stack size.


However, if you added further examples or details, one could probably give more focussed advice. This could even be a "complete" example - not necessarily involving thousands of lines of codes, but showing the actual patterns that appear there.

Community
  • 1
  • 1
Marco13
  • 53,703
  • 9
  • 80
  • 159
  • Thanks for the answer, definitely helps! It is hard to provide a "representative sample" of the code, because it is potentially quite diverse. There are *some* recurring patterns at the instruction level (various lines looking alike), but not so much at the program level (blocks of lines looking alike). – Eric Leibenguth Aug 25 '15 at 11:20
0

I would be tempted to write an interpreter or perhaps an in-line compiler. You may even get some speed gains because most of the resultant much smaller code base will cache more easily.

OldCurmudgeon
  • 64,482
  • 16
  • 119
  • 213
  • You mean to code an interpreter (in Java) interpreting the original language? How does an in-line compiler work and how do you write one? – Eric Leibenguth Aug 25 '15 at 10:29
  • @EricLeibenguth - An in-line compiler would read the language and would build a data struture that can then be executed directly - something like a state machine perhaps. This may be faster than an interpteter. – OldCurmudgeon Aug 25 '15 at 10:42
  • OK, I see what you mean, but that seems a bit complex to build myself. Perhaps you know of a library that might help? – Eric Leibenguth Aug 25 '15 at 10:55
  • @EricLeibenguth - You may find an [Expression Parser](https://www.google.ro/search?q=java+expression+parsin) a good start. [JEP](http://www.cse.msu.edu/SENS/Software/jep-2.23/doc/website/doc/doc_usage.htm) looks interesting. – OldCurmudgeon Aug 25 '15 at 12:55
0
  • Turn all local variables into fields

That won't have the slightest effect. Method size == code size. Nothing to do with local variables, which only affect the invocation frame size.

  • Split the code every 100 lines (or longer if needed in case of a if/else block) in separate methods.

This is your only choice, other than a completely different implementation strategy.

The problem with code generators is that they generate code.

user207421
  • 305,947
  • 44
  • 307
  • 483
  • 1
    Turning local variables into fields was an enabler for splitting the method (I don't have to figure out what method1 should pass to method2, because all the variables are available as fields) – Eric Leibenguth Aug 25 '15 at 10:32