Why the performance for these java codes has so big difference?

Question

See this code:

private final Node head = new Node("");
private Node tail = head;

private Appendable append0(CharSequence csq) {
    charCount += csq.length();
    Node newNode = new Node(csq);
    tail.next = newNode;// 1!
    tail = newNode;// 2!
    return this;
}

In performance test, this code cost: 3055 ms

If I comment 1 and 2:

    Node newNode = new Node(csq);
    //tail.next = newNode;// 1!
    //tail = newNode;// 2!

It cost: 8 ms

If I comment 1:

    Node newNode = new Node(csq);
    //tail.next = newNode;// 1!
    tail = newNode;// 2!

It cost: 138 ms

If I comment 2:

    Node newNode = new Node(csq);
    tail.next = newNode;// 1!
    //tail = newNode;// 2!

It cost: 143 ms

If I swap 1 and 2:

    Node newNode = new Node(csq);
    tail = newNode;// 2!
    tail.next = newNode;// 1!

It cost: 136 ms

3055 vs 140 ?????

Why the performances so big different????

Here is the full code:

public class CharsAppender implements Appendable {

private int charCount = 0;
private final Node head = new Node("");
private Node tail = head;

@Override
public Appendable append(CharSequence csq) {
    return append0(csq);
}

@Override
public Appendable append(CharSequence csq, int start, int end) {
    CharSequence chars = csq == null ? BDefault.nullString() : csq;
    return append0(BString.subRef(chars, start, end));
}

@Override
public Appendable append(char c) {
    return append0(Character.toString(c));
}

private Appendable append0(CharSequence csq) {
    charCount += csq.length();
    Node newNode = new Node(csq);
    tail.next = newNode;
    tail = newNode;
    return this;
}

private static class Node {

    private final CharSequence value;
    private Node next;

    private Node(CharSequence value) {
        this.value = value;
    }

    public CharSequence getValue() {
        return value;
    }

    public Node getNext() {
        return next;
    }

    public void setNext(Node next) {
        this.next = next;
    }
}

public static void main(String[] args) {
    int times = 10000000;
    String message = "fasasgadfgdfgfd923fr";

    StringBuilder sb = new StringBuilder();
    long s1 = System.currentTimeMillis();
    for (int i = 0; i < times; i++) {
        sb.append(message);
    }
    System.out.println("StringBuilder cost: " + (System.currentTimeMillis() - s1));

    CharsAppender ca = new CharsAppender();
    long s2 = System.currentTimeMillis();
    for (int i = 0; i < times; i++) {
        ca.append(message);
    }
    System.out.println("CharsAppender cost: " + (System.currentTimeMillis() - s2));
}
}

The most likely explanation is that your benchmark is flawed. See [How do I write a correct micro-benchmark in Java?](https://stackoverflow.com/questions/504103). But either way, we can only explain what is going on if we can see (and run) the benchmark. So a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) is required. — Stephen C, May 05 '22 at 07:08
It may also be that one or more of the alternatives is incorrect code. A benchmark that measures incorrect code is ... not helpful. Write some unit tests or something to ensure that the implementation is correct before trying to benchmark it. (Optimizing broken code is a waste of time ...) — Stephen C, May 05 '22 at 07:11
Exactly. There is no point whatsoever in comparing execution times of incorrect code. You can get the wrong answer in zero time. — user207421, May 05 '22 at 07:17
This is not full code. Classes are missing. (I was going to run the code for myself to validate your results ...) — Stephen C, May 05 '22 at 22:39

Stephen C · Answer 1 · 2022-05-05T22:33:45.070

2

OK. You have both of the problems mentioned in the comments.

The code you are benchmarking is incorrect (in 3 out of 4 cases).
Your benchmarking technique is flawed. Please read How do I write a correct micro-benchmark in Java?.

I suspect that the actual difference in performance can be attributed to garbage collection. In one case you are building a well-formed linked list with 10,000,000 entries. In the other cases the list is broken and is effectively only one element long.

When the GC runs, it will attempt to find all reachable Node objects by traversing the list, and then copy them all to another "space". In the broken case, the traversal and copying takes minimal time. In the non-broken case, it has to traverse and copy (up to) 10,000,000 Node objects, possibly more than once (depending on what the heap size is).

In fact, it is plausible that the 8ms case the JIT compiler has optimized away the creation of the Node objects!

edited May 05 '22 at 22:33

answered May 05 '22 at 07:32

Stephen C

698,415
94
811
1,216

you mean to keep 10,000,000 objects alive is expensive ? – FredSuvn May 05 '22 at 07:55
Keeping 10,000,000 object is only expensive depending on the objects. It's a minimum of 10Megs. Keeping 10,000,000 nodes is not a lot. – matt May 05 '22 at 07:59
@matt maybe (keep objects alive is expensive) is right. If I first test `StrngBuilder` then `CharsAppender`, `sb` cost 400+ and `ca` cost 3000+. If I swap them, first test `CharsAppender` then `StringBuilder`, `sb` cost 800+ and `ca` cost 2000+. – FredSuvn May 05 '22 at 08:10
@FredSuvn This answer referring to garbage collection.When garbage collection runs it has to remove objects that are not accessible. Keeping 10000000 objects alive isn't expensive. – matt May 05 '22 at 08:46
@FredSuvn Aren't you comparing different versions of CharsAppender? Which one are you comparing in your new example? – matt May 05 '22 at 08:54
@matt JVM gc manages objects reachable or un-reachable to clean-up (in this processing, these objects may move to another space, or do other operation). All these operations make the final performance obviously lower, that's why I said it is expensive. – FredSuvn May 05 '22 at 18:06
It is also worth noting that you can get significant slowdown if the amount of physical RAM available is insufficient for the application. This can lead to thrashing. – Stephen C May 05 '22 at 22:37

matt · Answer 2 · 2022-05-05T07:11:42.883

I am going to assume you have a linked list, and this is the append method.

A linked list has a tail it is the last element in the list. The following two lines create a new node and put it on the end of the tail.

Node newNode = new Node(csq);
tail.next = newNode;// 1!

Once you've done that, tail is no longer the end of the list, your new node is. So you need to re-assign tail.

tail = newNode;// 2!

So hopefully that helps understand how the list works. Each element points to the "next" element until you get to the tail.

Now look at what happens when you change the code.

When you comment out both 1 & 2, then appending doesn't add anything to your list. There are no changes to tail or to next so you just create a node and forget about it. The charCount increases, so your list is effectively broken.

If you comment only line 1 then your list doesn't grow you just keep re-assigning the tail to the new character sequence. Your list is broken after this because the previous tail is not linked to the current node.

If you comment line 2 then your list doesn't grow, you just keep re-assigning the tail.next value to the new character sequence. This just replaces the last element of the list, but the charCount will be wrong.

If you swap the order of 1 and 2 then your list doesn't grow because you're replacing the tail with a new node and making tail.next the same node. I am surprised it doesn't cause an infinite loop.

Why the performance for these java codes has so big difference?

2 Answers2