16

There is a wealth of information and tutorials online regarding streams in Java 8. Most of what I have found does a good job of explaining the how the various elements of a stream work on a conceptual level. However, I have not encountered much material which describes how streams are actually implemented and executed by the JVM under the hood.

Consider comparing an operation on a Collection between using a stream and doing it the old school pre-Java 8 way. Would the underlying Bytecodes look the same between the two approaches? Would the performance be the same?

To make this concrete, consider the following example, where I have a need to find all fish whose name contains the word "fish", and then capitalize the first letter of each matching fish. (Yes, I know that Hagfish isn't really a fish, but I ran out of matching fish names.)

List<String> fishList = Arrays.asList("catfish", "hagfish", "salmon", "tuna", "blowfish");

// Pre Java-8 solution
List<String> hasFishList = new ArrayList<String>();

for (String fish : fishList) {
    if (fish.contains("fish")) {
        String fishCap = fish.substring(0, 1).toUpperCase() + fish.substring(1); 
        hasFishList.add(fishCap);
    }
}

// Java-8 solution using streams
List<String> hasFishList = fishList.stream()
    .filter(f -> f.contains("fish"))
    .map(f -> f.substring(0, 1).toUpperCase() + f.substring(1))
    .collect(Collectors.toList());

Any insight you might have into how these two approaches might differ under the hood, at the Bytecode level, would be great. And some actual byte code would be even better.

henry
  • 5,923
  • 29
  • 46
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • 8
    Streams don't need any special bytecode treatment. They're just a pure Java API: classes, methods, fields, as any other API. Lambdas are where special handlings in the compiler and in the VM were needed. Why don't you use javap to find out what the bytecode is? – JB Nizet Aug 23 '15 at 11:38
  • User @TigerValeev told me that the bytecodes were very, very different. – Tim Biegeleisen Aug 23 '15 at 11:46
  • 2
    Lambdas can be understood as a simple `new FunctionalInterface() { public X method(...) {...your lambda code...}}`. The real thing is more complicated, but not essentially so -- it's just about giving more flexibility to the actual implementation. What Tagir told you pertains to that detail and has nothing to do with the Streams API in particular. – Marko Topolnik Aug 23 '15 at 11:46
  • 2
    The bytecode is different because one uses loops and ifs, whereas the other ones uses an API taking predicates, functions and collectors, and using a pipeline. The two don't work at the same abstraction level. But streams don't come with a new set of byte-code instructions. They're just regular classes. – JB Nizet Aug 23 '15 at 11:56
  • If you think that my question insinuated that there are new bytecode instructions for streams, then either I asked wrongly, or you misinterpreted what I wrote. – Tim Biegeleisen Aug 23 '15 at 12:04
  • 1
    I was wondering whether the JDK first converts a lambda expression into something which is _not_ functional, and then generates bytecode based on that, or whether the JDK actually can "think" functionally. Based on the comments above, it appears the latter is the case. – Tim Biegeleisen Aug 23 '15 at 12:05
  • 3
    So, you're not asking about streams, but about lambda expressions. Basically, the body of the lambda expression is compiled to a private method, and an invokedynamic instruction is used to dynamically create a class that implements the functional interface and invoke the private method. invoke dynamic has been chosen in order to be able to implement something more efficient later without changing the bytecode generated by the compiler. See http://cr.openjdk.java.net/~briangoetz/lambda/lambda-translation.html for example. Google for invokedynamic, lambda, not for streams. – JB Nizet Aug 23 '15 at 12:21
  • 1
    `JDK first converts a lambda expression into something which is not functional`-- the basic semantics of lambda as a shorthand implementation of a functional interface is all you need to understand at this level of description. – Marko Topolnik Aug 23 '15 at 12:28
  • @MarkoTopolnik Could you post an answer with the source for your quote, along with an example perhaps? – Tim Biegeleisen Aug 23 '15 at 12:30
  • 3
    The source for my quote is right here--those are your words :-) But I'd be glad to help you with an answer when I'm at a real computer. – Marko Topolnik Aug 23 '15 at 12:34
  • So you want to know [How will Java lambda functions be compiled](http://stackoverflow.com/q/16827262/2711488)? Decide by yourself whether this is a duplicate… – Holger Aug 24 '15 at 09:33
  • Since you seem to be interested in the lambdas and bytecode, I expanded my answer to cover this problem too – Mifeet Aug 24 '15 at 22:19

1 Answers1

27

The answer has grown quite a bit over time so I will start with a summary:

Observations

  • Trace of what streams API really executes looks scary at first sight. Many calls and object creations. Notice, however, that the only part repeated for all elements in the collection is the body of the do-while loop. So apart from some constant overhead, the per-element overhead are ~6 virtual method calls (invokeinterface instructions - our 2 lambdas and 4 accept() calls on sinks).
  • The lambdas given to streams API calls are translated to a static method containing implementation and an invokedynamic instruction. Rather than creating a new object, it gives a prescription how to create the lambda at runtime. There is nothing special about calling the lambda methods on the created lambda objects afterwards (invokeinterface instruction).
  • You can observe how the stream is evaluated lazily. filter() and map() wrap their operations in anonymous subclasses of StatelessOp which in turn extend ReferencePipeline, AbstractPipeline and ultimately BaseStream. The actual evaluation is done when executing collect().
  • You can see how streams really use Spliterator rather then Iterator. Notice the many forks checking isParallel() - the parallel branches would leverage Spliterator's methods.
  • There are quite a few new objects created, at least 13. If you call such code in a loop, you may run into garbage collection problems. For a single execution it should be fine.
  • I would like to see a benchmark comparison of the two version. Streams version will probably be slower with difference from "Java 7 version" decreasing with increasing number of fish. See also a related SO question.

Trace Through Execution of Streams Usage in the Example

The pseudocode below captures a trace through the execution of the version using streams. See the bottom of this post for explanation how to read the trace.

Stream stream1 = fishList.stream();
    // Collection#stream():
    Spliterator spliterator = fishList.spliterator();
        return Spliterators.spliterator(fishList.a, 0);
            return new ArraySpliterator(fishList, 0);
    return StreamSupport.stream(spliterator, false)
        return new ReferencePipeline.Head(spliterator, StreamOpFlag.fromCharacteristics(spliterator), false)
Predicate fishPredicate = /* new lambda f -> f.contains("fish") */
Stream stream2 = stream1.filter(fishPredicate);
    return new StatelessOp(this, StreamShape.REFERENCE, StreamOpFlag.NOT_SIZED) { /* ... */ }
Function fishFunction = /* new lambda f.substring(0, 1).toUpperCase() + f.substring(1) */
Stream stream3 = stream2.map(fishFunction);
    return new StatelessOp(this, StreamShape.REFERENCE, StreamOpFlag.NOT_SORTED | StreamOpFlag.NOT_DISTINCT) { /* ... */ }
Collector collector = Collectors.toList();
    Supplier supplier = /* new lambda */
    BiConsumer accumulator = /* new lambda */
    BinaryOperator combiner = /* new lambda */
    return new CollectorImpl<>(supplier, accumulator, combiner, CH_ID);
List hasFishList = stream3.collect(collector)
    // ReferencePipeline#StatelessOp#collect(Collector):
    List container;
    if (stream3.isParallel() && /* not executed */) { /* not executed */ }
    else {
    /*>*/TerminalOp terminalOp = ReduceOps.makeRef(collector)
            Supplier supplier = Objects.requireNonNull(collector).supplier();
            BiConsumer accumulator = collector.accumulator();
            BinaryOperator combiner = collector.combiner();
            return new ReduceOp(StreamShape.REFERENCE) { /* ... */ }
    /*>*/container = stream3.evaluate(terminalOp);
            // AbstractPipeline#evaluate(TerminalOp):
            if (linkedOrConsumed) { /* not executed */ }
            linkedOrConsumed = true;
            if (isParallel()) { /* not executed */ }
            else {
            /*>*/Spliterator spliterator2 = sourceSpliterator(terminalOp.getOpFlags())
                    // AbstractPipeline#sourceSpliterator(int):
                    if (sourceStage.sourceSpliterator != null) { /* not executed */ }
                    /* ... */
                    if (isParallel()) { /* not executed */ }
                    return spliterator;
            /*>*/terminalOp.evaluateSequential(stream3, spliterator2);
                    // ReduceOps#ReduceOp#evaluateSequential(PipelineHelper, Spliterator):
                    ReducingSink sink = terminalOp.makeSink()
                        return new ReducingSink()
                    Sink sink = terminalOp.wrapAndCopyInto(sink, spliterator)
                        Sink wrappedSink = wrapSink(sink)
                            // AbstractPipeline#wrapSink(Sink)
                            for (/* executed twice */) { p.opWrapSink(p.previousStage.combinedFlags, sink) }
                                return new Sink.ChainedReference(sink)
                        terminalOp.copyInto(wrappedSink, spliterator);
                            // AbstractPipeline#copyInto()
                            if (!StreamOpFlag.SHORT_CIRCUIT.isKnown(getStreamAndOpFlags())) {
                            /*>*/wrappedSink.begin(spliterator.getExactSizeIfKnown());
                            /*>*/ /* not important */
                            /*>*/supplier.get() // initializes ArrayList
                            /*>*/spliterator.forEachRemaining(wrappedSink)
                                    // Spliterators#ArraySpliterator#foreachRemaining(Consumer):
                                    // ... unimportant code
!!                                  do {
                                    /*>*/action.accept((String)a[i])
                                    } while (++i < hi) // for each fish :)
                            /*>*/wrappedSink.end() // no-op
                            } else { /* not executed */}
                        return sink;
                    return sink.get()
            }
    /*>*/if (collector.characteristics().contains(Collector.Characteristics.IDENTITY_FINISH)) { return container; }
    /*>*/else { /* not executed */ }

The exclamation marks point to the actual workhorse: a do-while loop in fishList's Spliterator. Here is more detailed trace of the do-while loop:

do {
/*>*/action.accept((String)a[i])
    if (predicate.test(u)) { downstream.accept(u); }  // predicate is our fishPredicate
        downstream.accept(mapper.apply(u)); // mapper is our fishFunction
            accumulator.accept(u)
                // calls add(u) on resulting ArrayList
} while (++i < hi) // for each fish :)

Streams API with Lambdas on Bytecode Level

Let's look at how the relevant parts of the executed code look like in the bytecode. The interesting part is how

fishList.stream().filter(f -> f.contains("fish")).map(f -> f.substring(0, 1).toUpperCase() + f.ubstring(1)).collect(Collectors.toList());

is translated. You can find the full version on pastebin. I will focus only on
filter(f -> f.contains("fish")) here:

invokedynamic #26,  0         // InvokeDynamic #0:test:()Ljava/util/function/Predicate; [
    java/lang/invoke/LambdaMetafactory.metafactory(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite;
    (Ljava/lang/Object;)Z, 
    FishTest.lambda$fish8$0(Ljava/lang/String;)Z, 
    (Ljava/lang/String;)Z
  ]
invokeinterface #27,  2       // InterfaceMethod java/util/stream/Stream.filter:(Ljava/util/function/Predicate;)Ljava/util/stream/Stream;
  

There is nothing specific to streams API there, but the new invokedynamic instruction is used to create lambdas. The Java 7 equivalent of lambdas would be creation of anonymous inner class implementing Predicate. This would be translated to bytecode as:

new FishTest$1                        // create new instance of Predicate
dup
invokespecial FishTest$1.<init>()V    // call constructor

Creating a lambda in Java 8 is translated as a single invokedynamic instruction instead, without creation of a new object. The purpose of invokedynamic instruction is to defer creation of lambda to runtime (as opposed to compile time). This enables features like caching lambda instances:

The use of invokedynamic lets us defer the selection of a translation strategy until run time. The runtime implementation is free to select a strategy dynamically to evaluate the lambda expression. ... The invokedynamic mechanics allow this to be done without the performance costs that this late binding approach might otherwise impose. ... For example, ... we generate the class the first time a given lambda factory site is called. Thereafter, future calls to that lambda factory site will re-use the class generated on the first call.

Arguments of invokedynamic give a "recipe" for constructing an instance of the respective functional interface. They represent the metafactory for runtime instance creation, reference to the method it implements (i.e. Predicate.test()) and implementation of the method.
In our case, the implementation is invocation of static method boolean lambda$fish8$0(String) which the compiler sneaks in to our class. It contains the actual bytecode for f.contains("fish"). If you used lambda capturing method references (e.g. list::add), captured variables from the outer scope, etc., things would get more complex - look for occurences of "indy" in this document for more info.

The other parts of bytecode are less interesting. The do-while loop, apart from the obvious looping, contains an invokeinterface instruction calling accept() on the respective Consumer. accept() call propagates down the sinks, calling our lambdas along the way. Nothing special here, both lambda calls, and propagation through sinks are simple invokeinterface instructions.


How to read the pseudocode

Indentation is used to show the unfolded body of call above the indented code. Code beggining with /*>*/ represents continuation of the current call (when needed for better readability). Therefore call

Objects.requireNonNull(new Object());

would be written in the trace pseudocode as:

Object o = new Object(); // extracted variable to improve visibility of new instance creation
Objects.requireNonNull(o);
    // this is the body of Objects.requireNonNull():
    if (o == null) {
    /*>*/throw new NullPointerException(); // this line is still part of  requireNonNull() body
    }
    return o;

I also skipped some unimportant calls like null checks, omitted generic parameters, extracted inlined expressions to variables where appropriate, etc., to improve readability.

Community
  • 1
  • 1
Mifeet
  • 12,949
  • 5
  • 60
  • 108
  • Nice work... BTW getting started with JMH is far less work than this. – Marko Topolnik Aug 24 '15 at 06:46
  • I thought JMH was a benchmarking tool, can it create execution traces too? – Mifeet Aug 24 '15 at 08:17
  • You wondered about the difference in performance. Compared to the work you've done here, measuring performance would be far less work. – Marko Topolnik Aug 24 '15 at 08:33
  • Yep, that's right. The main objective of the answer was to analyze what's going on under the hood. The remarks about performance are just my side notes ;) – Mifeet Aug 24 '15 at 09:24
  • @Mifeet: A small remark about this: *"This enables features like caching lambda instances:"* This text seems to discuss lambda *class* caching, not lambda *instance* caching. – Lii Apr 21 '16 at 11:29