
A simple question, asked mostly out of curiosity about what Java compilers are smart enough to do. I know not all compilers are built equally, so I'm wondering whether it's reasonable to expect this optimization from most compilers I'm likely to run against, not whether it works on one specific version or on all versions.

So let's say that I have some tree structure and I want to collect all the descendants of a node. There are two easy ways to do this recursively.

The more natural method, for me, to do this would be something like this:

public Set<Node> getDescendants(){

   Set<Node> descendants = new HashSet<Node>();
   descendants.addAll(getChildren());

   for(Node child : getChildren()){
      descendants.addAll(child.getDescendants());
   }

   return descendants;
}

However, assuming no compiler optimizations and a decent-sized tree, this could get rather expensive. On each recursive call I create and fully populate a set, only to return that set up the stack so the calling method can add the contents of my returned set to its version of the descendants set, discarding the version that was just built and populated in the recursive call.

So now I'm creating many sets just to have them discarded as soon as I return their contents. Not only do I pay a minor initialization cost for building the sets, but I also pay the more substantial cost of moving all the contents of one set into the larger set. In large trees, most of my time is spent moving Nodes around in memory from set A to set B. Since each node gets copied once for every ancestor it has, the total copying work is proportional to the sum of node depths: roughly O(n log(n)) for a balanced tree, degrading toward O(n^2) for a very unbalanced one, rather than the O(n) the algorithm should be.

I could instead have a simple getDescendants method that calls a helper method that looks like this:

public Set<Node> getDescendants(){
    Set<Node> descendants = new HashSet<Node>();
    getDescendantsHelper(descendants);

    return descendants;
}

public Set<Node> getDescendantsHelper(Set<Node> descendants){

   descendants.addAll(getChildren());

   for(Node child : getChildren()){
      child.getDescendantsHelper(descendants);
   }

   return descendants;
}

This ensures that I only ever create one set, and I don't have to waste time copying from one set to another. However, it requires writing two methods instead of one and generally feels a little more cumbersome.
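For what it's worth, an iterative traversal also keeps everything in a single set without needing a separate helper method. This is only a sketch of an alternative, assuming the same Node/getChildren() API as above (the ArrayDeque-based walk is my own addition, not part of the original question):

import java.util.ArrayDeque;
import java.util.Deque;

public Set<Node> getDescendants(){
   Set<Node> descendants = new HashSet<Node>();
   Deque<Node> pending = new ArrayDeque<Node>();
   pending.push(this); // start the walk at the current node

   while(!pending.isEmpty()){
      Node current = pending.pop();
      for(Node child : current.getChildren()){
         if(descendants.add(child)){ // add() returns false for duplicates
            pending.push(child);     // only walk nodes we haven't seen yet
         }
      }
   }

   return descendants;
}

The duplicate check also guards against infinite loops if the structure ever contains a cycle; for a proper tree it just costs one extra boolean test per node.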

The question is: do I need to do option two if I'm worried about optimizing this sort of method? Or can I reasonably expect the Java compiler, or the JIT, to recognize that I am only creating temporary sets for the convenience of returning them to the calling method, and to avoid the wasteful copying between sets?

edit: cleaned up a bad copy-paste job which led to my sample method adding everything twice. You know something is bad when your 'optimized' code is slower than your regular code.

dsollen
    I would do it the second way and not worry about it seeming cumbersome. From my experience, getting a compiler to figure out how to optimize the first one is _way_ too much to ask; it seems to require near-human reasoning ability. Maybe the technology has advanced enough to do something like this, but I doubt it. – ajb Apr 17 '14 at 23:36
  • I've never seen a compiler that would do that, and the Java compiler does very little in the way of bytecode optimization. Code it yourself. – user207421 Apr 18 '14 at 00:15
  • Related read: http://stackoverflow.com/questions/3616483/why-does-the-jvm-still-not-support-tail-call-optimization (BTW, Scala can do tail recursion - by compiling recursive methods into loops sometimes; and it runs in the JVM.) – vanza Apr 18 '14 at 02:31

1 Answer


The question is, do I need to do option two if I'm worried about optimizing this sort of method?

Definitely yes. If performance is a concern (and most of the time it is not!), then you need it.

The compiler optimizes a lot, but on a very different scale. Basically, it works with one method at a time and optimizes the most commonly used paths therein. Due to heavy inlining, it can sort of optimize across method calls, but nothing like the above.

It can also optimize away needless allocations, but only in very simple cases. Maybe something like

int sum(int... a) {
    int result = 0;
    for (int x : a) result += x;
    return result;
}

Calling sum(1, 2, 3) means allocating an int[3] for the varargs arguments, and this allocation can be eliminated (whether the compiler really does it is a different question). It could even find out that the result is a constant (which I doubt it really does). If the result doesn't get used, it can perform dead code elimination (this happens rather often).
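To make that concrete, here is a small, hypothetical harness (the class name and loop bound are made up; this is not from the original answer). Running it once with default flags and once with -XX:-DoEscapeAnalysis plus -verbose:gc can hint at whether HotSpot scalar-replaces the int[3], though the outcome varies by JVM version:

public class VarargsEscapeDemo {

    static int sum(int... a) {
        int result = 0;
        for (int x : a) result += x;
        return result;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 100_000_000; i++) {
            // Each call allocates an int[3] for the varargs, unless the JIT
            // inlines sum() and scalar-replaces the array via escape analysis.
            total += sum(i, i + 1, i + 2);
        }
        // Print the result so the loop can't be dead-code-eliminated.
        System.out.println(total);
    }
}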

Your example involves allocating a whole HashSet (which is itself backed by a HashMap) and all its entries, and is several orders of magnitude more complicated. The compiler has no idea how a HashSet works, and it can't find out, e.g., that after m.addAll(m1) the set m contains all members of m1. No way.

This is an algorithmic optimization rather than a low-level one. That's what humans are still needed for.

For things the compiler could do (but currently fails to), see e.g. these questions of mine concerning associativity and bounds checks.

maaartinus
  • Why do you feel the compiler couldn't figure out addAll? True, it'd be rather complicated, but there's no technical reason to state it's impossible. From a practical point of view, the most likely optimization would involve inlining the recursive call several times and then doing escape analysis on the intermediate hash sets. Only a constant-factor improvement, I think, but still – Voo Apr 20 '14 at 23:36
  • @Voo: There are a lot of things happening; even a single insertion involves a `Map.Entry` allocation, finding the proper slot, possibly appending it to the chain, and so on and so on. The compiler sees no "big picture" as humans do, so it can't tell itself "it's just a temporary set and the insertion order is irrelevant, so let's eliminate it all". – maaartinus Apr 21 '14 at 12:55