I know that one of the criteria that Java HotSpot uses to decide whether a method is worth inlining is how large the method is. On one hand, this seems sensible: if the method is large, inlining leads to code bloat, and the method would take so long to execute that the call overhead is trivial. The trouble with this logic is that AFTER you decide to inline, it might become clear that for this particular call site, most of the method is dead code. For instance, the method may be a giant switch statement, but most call sites call it with a compile-time constant, so that inlining is actually both cheap (the whole method body isn't needed, so code bloat is minimal) and effective (method call overhead dominates the actual work done).

Does HotSpot have any mechanism to take advantage of such situations and inline the method anyway, or is there a limit beyond which it refuses to even consider inlining a method, even though it would have a minimal code bloat effect?
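To make the scenario concrete, here is a minimal sketch of the kind of method I have in mind. The class `DateFields`, its field constants, and the `get` method are all hypothetical names invented for illustration; a real case would have many more cases in the switch.

```java
// Hypothetical example: a switch-based accessor that is usually called
// with a compile-time constant selector.
class DateFields {
    static final int YEAR = 0, MONTH = 1, DAY = 2, HOUR = 3, MINUTE = 4, SECOND = 5;

    private final int year, month, day, hour, minute, second;

    DateFields(int year, int month, int day, int hour, int minute, int second) {
        this.year = year;
        this.month = month;
        this.day = day;
        this.hour = hour;
        this.minute = minute;
        this.second = second;
    }

    // Imagine dozens of cases here, so the bytecode size is large.
    int get(int field) {
        switch (field) {
            case YEAR:   return year;
            case MONTH:  return month;
            case DAY:    return day;
            case HOUR:   return hour;
            case MINUTE: return minute;
            case SECOND: return second;
            default: throw new IllegalArgumentException("unknown field: " + field);
        }
    }

    public static void main(String[] args) {
        DateFields d = new DateFields(2018, 5, 2, 19, 4, 0);
        // The selector is a constant, so if get() were inlined here,
        // every branch except MONTH would be dead code at this call site.
        System.out.println(d.get(MONTH));
    }
}
```

At the call site in `main`, the only useful work is a single field read, which is why the call overhead would dominate if the method were not inlined.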

Mark VY
  • Why do you consider it at all? For what purpose? – lexicore May 02 '18 at 19:04
  • The overhead of a function call is very small and not worth thinking about, especially in Java where everything is passed by reference. – ventsyv May 02 '18 at 19:05
  • lexicore: I originally had a bunch of "motivational" text, but removed it because it was taking me forever to write, and because I worried that if the question were too long people would not read it. But consider something like java.util.Calendar.get. That method is implemented using an array, but one could imagine cases where an array is too much overhead and so one could use a switch statement instead. – Mark VY May 02 '18 at 19:08
  • ventsyv: I must disagree here. Inlining is probably among the most important optimizations that compilers and JITs perform, partly for its own sake and partly as the enabler of other optimizations. Java would easily run ten times slower if you disabled inlining completely. (To see why, imagine changing an array into an ArrayList: the reason you can afford to do this without paying a big penalty is that most of the overhead is inlined away.) – Mark VY May 02 '18 at 19:13
  • What do you mean by "too much overhead"? If I see it correctly, array `fields` has size `17`, so memory shouldn't be an issue. And with regards to access time, there is hardly anything faster than an array. If we also consider caching, it is highly likely that a single read to one of the array fields (given that arrays are memory-aligned) will most likely cache the whole array, making successive accesses even faster. – Turing85 May 02 '18 at 19:17
  • Imagine if instead of Calendar I wanted a Point with an X and Y coordinate. I could implement it as `class Point{int x,y;}`, or as `class Point{int[] coordinates = new int[2];}`. The memory overhead of the second solution is relatively high. – Mark VY May 02 '18 at 19:19
  • Yeah. That is why we normally do not do this and rather have named attributes like `x` and `y`. Your point? – Turing85 May 02 '18 at 19:21
  • Right. So I have my fields x and y, but still want to support the syntax `myPoint.get(X)`. This can be super-handy if I want to loop over the coordinates, for instance. So I would use a switch statement. Would the method containing the switch statement get in-lined? In this case probably yes, since there are only two fields so the method should be pretty small. But what about in general? – Mark VY May 02 '18 at 19:24
  • My previous comment had a typo: I meant "loop over the coordinates", not "loop over the points"; fixed now. – Mark VY May 02 '18 at 19:28
  • "*[...] but still want to support the syntax `myPoint.get(X)` [...]*" - I presume that `X` is a `String` or an `Enum`. This would mean you must at least do a `String` comparison or are limited to the possible options. I fail to see where `myPoint.get(X)` is "more handy" than `myPoint.getX()`. If you want to loop over something, implement the `Iterable` interface. – Turing85 May 02 '18 at 19:30
  • I meant like this: `for(int k=0; k < Point.NUM_COORDS; k++) doSomethingWith(myPoint.get(k));` – Mark VY May 02 '18 at 19:35
  • And when your `class Point implements Iterable`, you can just write `Point point = new Point(...); for (int coord : point) { ... }` – Turing85 May 02 '18 at 19:40
  • Sure, you could do that. But this just pushes the question around. How do you implement next()? You will probably want to use a switch statement! (Or an if-else chain; whatever.) – Mark VY May 02 '18 at 20:13
  • @MarkVY Please use `@username` in comments to get people notified of your replies. – lexicore May 02 '18 at 22:10
  • @lexicore: noted – Mark VY May 02 '18 at 22:55
  • @ventsyv: see my earlier comment; sorry for wrong syntax earlier – Mark VY May 02 '18 at 22:55

1 Answer

HotSpot JIT inlining policy is rather complicated. It involves many heuristics like caller method size, callee method size, IR node count, inlining depth, invocation count, call site count etc.

There are some hard limits that prevent a large method from inlining, including:

  • -XX:FreqInlineSize=325 - the maximum size in bytecodes of the callee to be inlined;
  • -XX:InlineSmallCode=2000 - do not inline the callee if it already has compiled code of at least this size in bytes;
  • -XX:NodeCountInliningCutoff=18000 - stop inlining if parser generates this number of IR nodes;
  • -XX:DesiredMethodLimit=8000 - the maximum size in bytecodes of aggregate method after inlining. This parameter is not tunable in product builds of HotSpot, but the limit can be switched off with -XX:-ClipInlining.

There are also other limits, but as you already see, a large method does not have much chance to be inlined, even though -XX:+IncrementalInline is enabled by default.
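To see what the JIT actually decides for a given call site, one option is to run a small probe with the diagnostic flags `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`. The sketch below is only an illustration: the class `InlineProbe` and its methods `pick` and `run` are hypothetical names, and the callee here is deliberately tiny, so it would need to be padded with many more cases to get near the size limits listed above.

```java
// Hypothetical probe. One way to run it:
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining InlineProbe
// The exact format and wording of the inlining log vary between JVM versions.
class InlineProbe {
    // Stand-in for a large switch-based method; add cases to grow its bytecode size.
    static int pick(int selector, int a, int b, int c) {
        switch (selector) {
            case 0: return a;
            case 1: return b;
            case 2: return c;
            default: throw new IllegalArgumentException("bad selector: " + selector);
        }
    }

    // Hot loop that always passes a constant selector, so that the call
    // site becomes a candidate for JIT compilation and inlining.
    static long run(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += pick(1, i, i + 1, i + 2);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));
    }
}
```

The log lists each call site considered by the compiler together with the decision taken, which makes it possible to check whether a callee was rejected because of its size.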

apangin
  • DesiredMethodLimit sounds very interesting. Does it take dead code into account? Because if so, that sounds like exactly the kind of crystal ball I was hoping existed. – Mark VY May 02 '18 at 22:57
  • @MarkVY Most compiler optimizations, including dead code elimination, work on the IR graph. Since `DesiredMethodLimit` counts bytecodes, it can't take IR-level optimizations into account. BTW, this parameter is a compile-time constant. I've updated the answer. – apangin May 02 '18 at 23:47