
I did some research and wrote the following article: http://www.heavyweightsoftware.com/blog/linkedlist-vs-arraylist/, and I wanted to post a question here.

import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

import spock.lang.Specification

class ListPerformanceSpec extends Specification {
    def "Throwaway"() {
        given: "A Linked List"
        List<Integer> list
        List<Long> results = new LinkedList<>()

        when: "Adding numbers"
        Random random = new Random()
        //test each list 100 times
        for (int ix = 0; ix < 100; ++ix) {
            list = new LinkedList<>()
            LocalDateTime start = LocalDateTime.now()

            for (int jx = 0; jx < 100000; ++jx) {
                list.add(random.nextInt())
            }

            LocalDateTime end = LocalDateTime.now()
            long diff = start.until(end, ChronoUnit.MILLIS)
            results.add(diff)
        }

        then: "Should be equal"
        true
    }

    def "Linked list"() {
        given: "A Linked List"
        List<Integer> list
        List<Long> results = new LinkedList<>()

        when: "Adding numbers"
        Random random = new Random()
        //test each list 100 times
        for (int ix = 0; ix < 100; ++ix) {
            list = new LinkedList<>()
            LocalDateTime start = LocalDateTime.now()

            for (int jx = 0; jx < 100000; ++jx) {
                list.add(random.nextInt())
            }

            long total = 0

            for (int jx = 0; jx < 10000; ++jx) {
                for (Integer num : list) {
                    total += num
                }
                total = 0
            }

            LocalDateTime end = LocalDateTime.now()
            long diff = start.until(end, ChronoUnit.MILLIS)
            results.add(diff)
        }

        then: "Should be equal"
        System.out.println("Linked list:" + results.toString())
        true
    }

    def "Array list"() {
        given: "An Array List"
        List<Integer> list
        List<Long> results = new LinkedList<>()

        when: "Adding numbers"
        Random random = new Random()
        //test each list 100 times
        for (int ix = 0; ix < 100; ++ix) {
            list = new ArrayList<>()
            LocalDateTime start = LocalDateTime.now()

            for (int jx = 0; jx < 100000; ++jx) {
                list.add(random.nextInt())
            }

            long total = 0

            for (int jx = 0; jx < 10000; ++jx) {
                for (Integer num : list) {
                    total += num
                }
                total = 0
            }

            LocalDateTime end = LocalDateTime.now()
            long diff = start.until(end, ChronoUnit.MILLIS)
            results.add(diff)
        }

        then: "Should be equal"
        System.out.println("Array list:" + results.toString())
        true
    }

}

Why does ArrayList outperform LinkedList by 28% for sequential access when LinkedList should be faster?

My question is different from When to use LinkedList over ArrayList? because I'm not asking when to choose it, but why ArrayList turned out to be faster.

Thom
  • *"when LinkedList should be faster?"* based on what, exactly? – Federico klez Culloca Apr 06 '18 at 12:05
  • @FedericoklezCulloca A linked list should perform better for sequential access. That's how it's designed. – Thom Apr 06 '18 at 12:06
  • Who downvoted my question and why? – Thom Apr 06 '18 at 12:06
  • can you provide a source for that claim? – Federico klez Culloca Apr 06 '18 at 12:07
  • @FedericoklezCulloca Any book on data structures. Arrays require multiplication to generate a pointer. Linked Lists just de-reference a pointer. – Thom Apr 06 '18 at 12:07
  • @Thom It's beneficial to use linked lists if all you need is sequential access, since additions won't require reallocations of the entire array, but I've never heard that LLs are faster than arrays for iteration. – Carcigenicate Apr 06 '18 at 12:14
  • @Thom, the case where a linked list is faster is for insertions and removals, especially given a node (note: not an index) anywhere on the list. Sequential iteration shouldn't be much different, but an array list has some locality, whereas a linked list has no locality, so cache usage will differ a lot; for object references, you might not notice, as there will be no locality between the array list and the objects themselves. Random access (i.e. index based access) on a linked list is way worse than on an array list. – acelent Apr 06 '18 at 12:23
  • @acelent Yes, netch made the cache issue in his answer and it's probably a good one. – Thom Apr 06 '18 at 12:26
  • @Thom, also, a multiplication may take fewer cycles than a non-cached dereference. A multiplication by a power of 2 is just a shift, so in 32-bit, the CPU just shifts the index by 2 bits for a 4-byte alignment, and in 64-bit the CPU just shifts the index by 3 bits for an 8-byte alignment. – acelent Apr 06 '18 at 12:27
  • This might be interesting: [How do I write a correct micro-benchmark in Java?](https://stackoverflow.com/q/504103/18157) – Jim Garrison Apr 06 '18 at 12:48
  • Possible duplicate of [When to use LinkedList over ArrayList?](https://stackoverflow.com/questions/322715/when-to-use-linkedlist-over-arraylist) – yivi Apr 07 '18 at 16:24
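
For reference, timing loops like the ones above measure JIT warm-up and allocation as much as iteration itself. A plain-Java sketch of a warmed-up comparison (illustrative only; a harness such as JMH is the right tool for rigorous numbers, and the method names here are my own):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

public class IterationBenchmark {

    // Fills the given list with n pseudo-random Integers (fixed seed for repeatability).
    static List<Integer> fill(List<Integer> list, int n) {
        Random random = new Random(42);
        for (int i = 0; i < n; i++) {
            list.add(random.nextInt());
        }
        return list;
    }

    // Sums the list once and returns the elapsed time in nanoseconds.
    static long timeIteration(List<Integer> list) {
        long start = System.nanoTime();
        long total = 0;
        for (Integer num : list) {
            total += num;
        }
        long elapsed = System.nanoTime() - start;
        if (total == Long.MIN_VALUE) System.out.println(total); // defeat dead-code elimination
        return elapsed;
    }

    public static void main(String[] args) {
        List<Integer> arrayList = fill(new ArrayList<>(), 100_000);
        List<Integer> linkedList = fill(new LinkedList<>(), 100_000);

        // Warm-up passes so the JIT compiles the hot loop before we measure.
        for (int i = 0; i < 50; i++) {
            timeIteration(arrayList);
            timeIteration(linkedList);
        }

        System.out.println("ArrayList:  " + timeIteration(arrayList) + " ns");
        System.out.println("LinkedList: " + timeIteration(linkedList) + " ns");
    }
}
```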

4 Answers


Array-based lists, such as Java's ArrayList, use much less memory for the same amount of data than link-based lists (LinkedList), and that memory is laid out sequentially. This greatly reduces CPU cache thrashing caused by auxiliary data. Since an access to RAM has 10-20 times the latency of an L1/L2 cache access, this causes a substantial time difference.

You can read more about these cache issues in books like this one, or similar resources.

OTOH, link-based lists outperform array-based ones in operations like inserting into or deleting from the middle of a list.

For a solution that combines memory economy (and thus fast iteration) with fast insertion/deletion, one should look at combined approaches, such as in-memory B⁺-trees, or arrays of array lists with proportionally increasing sizes.
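
To illustrate that last idea, here is a minimal (hypothetical, not production-quality) sketch of an array-of-arrays list in Java: elements stay mostly contiguous for cache-friendly iteration, while a middle insert shifts elements in only one small chunk, splitting it when full:

```java
import java.util.ArrayList;

// Sketch of a chunked list: elements live in small fixed-capacity chunks,
// so iteration stays mostly sequential in memory, and a middle insert
// shifts at most one chunk's worth of elements.
public class ChunkedList<E> {
    private static final int CHUNK = 16;
    private final ArrayList<ArrayList<E>> chunks = new ArrayList<>();
    private int size = 0;

    public void add(E e) {
        if (chunks.isEmpty() || lastChunk().size() == CHUNK) {
            chunks.add(new ArrayList<>(CHUNK));
        }
        lastChunk().add(e);
        size++;
    }

    public void add(int index, E e) {
        if (index == size) { add(e); return; }
        for (int c = 0; c < chunks.size(); c++) {
            ArrayList<E> chunk = chunks.get(c);
            if (index < chunk.size()) {
                if (chunk.size() == CHUNK) {
                    // Split a full chunk so an insert shifts at most CHUNK/2 elements.
                    ArrayList<E> tail = new ArrayList<>(chunk.subList(CHUNK / 2, CHUNK));
                    chunk.subList(CHUNK / 2, CHUNK).clear();
                    chunks.add(c + 1, tail);
                    if (index >= chunk.size()) {
                        index -= chunk.size();
                        chunk = tail;
                    }
                }
                chunk.add(index, e);
                size++;
                return;
            }
            index -= chunk.size();
        }
        throw new IndexOutOfBoundsException();
    }

    public E get(int index) {
        for (ArrayList<E> chunk : chunks) {
            if (index < chunk.size()) return chunk.get(index);
            index -= chunk.size();
        }
        throw new IndexOutOfBoundsException();
    }

    public int size() { return size; }

    private ArrayList<E> lastChunk() { return chunks.get(chunks.size() - 1); }
}
```

A real implementation (e.g. an in-memory B⁺-tree) would also keep an index over the chunks so `get` doesn't scan them linearly.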

Netch
  • You're missing the fact that only the _references_ are stored in an array, the objects themselves are still on the heap, so if you're actually touching the objects then cache locality should be no different. Also, in a singly-linked list you have only one extra reference to store, so it's not _"much less memory for the same data amount"_ – Jim Garrison Apr 06 '18 at 12:46
  • @JimGarrison yep, this indirection reduces the effect of compact storage, but 3 references plus an object header plus possible memory gaps are still much more overhead than 1 reference. Also, Java's concrete `LinkedList` is doubly-linked, and that is what the topic starter compared against, so your reference to singly-linked lists doesn't apply. – Netch Apr 06 '18 at 12:49
  • The slowness of linked lists generally has very little to do with memory efficiency and a whole lot to do with data dependencies and memory latency. – Veedrac Apr 06 '18 at 15:15

Why does ArrayList outperform LinkedList by 28% for sequential access when LinkedList should be faster?

You're assuming that, but you don't provide anything to back it up. It's not really a great surprise, though. An ArrayList has an array as its underlying data store. Accessing this sequentially is extremely fast, because you know exactly where every element is going to be. The only slowdown comes when the array grows beyond a certain size and needs to be expanded, but that can be optimised.

The real answer would probably be: check the Java source code, and compare the implementations of ArrayList and LinkedList.
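
On the expansion cost: it is amortized, because the capacity grows geometrically (the JDK grows by roughly 1.5×; the sketch below doubles for simplicity), so a copy of n elements happens only about once per n appends. A minimal illustrative version, not the JDK code:

```java
// Minimal sketch of a growable int array with geometric (doubling) growth.
// The JDK's ArrayList uses the same principle with a ~1.5x growth factor.
public class GrowableIntArray {
    private int[] data = new int[8];
    private int size = 0;

    public void add(int value) {
        if (size == data.length) {
            // Geometric growth keeps the amortized cost of add() constant:
            // a full copy of n elements happens only once per ~n appends.
            int[] bigger = new int[data.length * 2];
            System.arraycopy(data, 0, bigger, 0, size);
            data = bigger;
        }
        data[size++] = value;
    }

    public int get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException();
        return data[index];
    }

    public int size() { return size; }
}
```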

SeverityOne
  • Not an assumption. An array list should have to perform multiplication to generate a pointer for every sequential access. A linked list should only have to de-reference the pointer. Takes half as many instructions. – Thom Apr 06 '18 at 12:19
  • @Thom In your test you are adding objects to the list. A linked list has to create a new node for each entry, while an ArrayList does not. – Piro Apr 06 '18 at 12:23
  • @Piro That may well be true, but would be a sad implementation. When I wrote a linked list in college, I allocated blocks at a time and then farmed them out. That may be both the point and my answer. – Thom Apr 06 '18 at 12:25
  • @Thom Without checking the actual implementation, repeating what you learned in college is pointless. What you're also forgetting is that `ArrayList` simply stores the vanilla object in an array, whereas `LinkedList` has to create a node object for each entry in the list. So that means *two* object creations per addition. Now, which is more expensive: multiplying once every so often, or creating tons of node objects? – SeverityOne Apr 06 '18 at 12:28
  • @Thom Your instruction counting may have been correct some 20 years ago. A 32-bit multiplication has 3 cycles latency on my oldish i5. A multiplication by 4 or 8 as needed for the addressing takes a single cycle as it's just a shift. A multiplication by 2, 4 or 8 is a part of the address calculation and needs no separate instruction. A cache miss takes tens of cycles. `LinkedList` has no memory locality and it has a rather high memory consumption, so many cache misses are guaranteed. Forget `LinkedList`, at least in Java. – maaartinus Apr 06 '18 at 21:00

From LinkedList source code:

/**
 * Appends the specified element to the end of this list.
 *
 * <p>This method is equivalent to {@link #addLast}.
 *
 * @param e element to be appended to this list
 * @return {@code true} (as specified by {@link Collection#add})
 */
public boolean add(E e) {
    linkLast(e);
    return true;
}

/**
 * Links e as last element.
 */
void linkLast(E e) {
    final Node<E> l = last;
    final Node<E> newNode = new Node<>(l, e, null);
    last = newNode;
    if (l == null)
        first = newNode;
    else
        l.next = newNode;
    size++;
    modCount++;
}

From ArrayList source code:

/**
 * Appends the specified element to the end of this list.
 *
 * @param e element to be appended to this list
 * @return <tt>true</tt> (as specified by {@link Collection#add})
 */
public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

private void ensureExplicitCapacity(int minCapacity) {
    modCount++;

    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

So a linked list has to create a new node for each element added, while an array list does not. An ArrayList does not reallocate/resize for each new element, so most of the time it simply stores the object in the array and increments its size, while a LinkedList does much more work.
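
To make the difference concrete, here is a stripped-down sketch (modeled on the JDK code quoted above, but not the actual classes): the linked version allocates a Node object on every add, while the array version usually just writes one slot:

```java
public class AppendSketch {

    // Linked version: every add() allocates a new Node on the heap.
    static class SimpleLinked<E> {
        static class Node<T> {
            T item;
            Node<T> next;
            Node(T item) { this.item = item; }
        }
        Node<E> first, last;
        int size;

        void add(E e) {
            Node<E> newNode = new Node<>(e); // one extra allocation per element
            if (last == null) first = newNode;
            else last.next = newNode;
            last = newNode;
            size++;
        }
    }

    // Array version: add() is usually just a store and an increment;
    // an allocation happens only on the occasional resize.
    static class SimpleArray<E> {
        Object[] elementData = new Object[8];
        int size;

        void add(E e) {
            if (size == elementData.length) {
                Object[] bigger = new Object[elementData.length * 2];
                System.arraycopy(elementData, 0, bigger, 0, size);
                elementData = bigger;
            }
            elementData[size++] = e;
        }

        @SuppressWarnings("unchecked")
        E get(int i) { return (E) elementData[i]; }
    }
}
```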

You also commented:

When I wrote a linked list in college, I allocated blocks at a time and then farmed them out.

I do not think this would work in Java. You cannot do pointer tricks in Java, so you would have to allocate a lot of small arrays, or create empty nodes ahead of time. In both cases the overhead would probably be a bit higher.
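
A sketch of what pre-allocating nodes could look like in Java (hypothetical class, for illustration only): even pooled nodes remain separate heap objects, so the pointer-chasing and locality problems do not go away:

```java
import java.util.ArrayDeque;

// Hypothetical node-pooling linked list: nodes are created in batches and
// recycled via clear(). This can reduce allocation churn, but each node is
// still a distinct heap object, so iteration still chases scattered pointers.
public class PooledLinkedList<E> {
    static class Node<T> {
        T item;
        Node<T> next;
    }

    private final ArrayDeque<Node<E>> pool = new ArrayDeque<>();
    private Node<E> first, last;
    private int size;

    private Node<E> obtain() {
        if (pool.isEmpty()) {
            for (int i = 0; i < 64; i++) pool.push(new Node<>()); // batch-allocate
        }
        return pool.pop();
    }

    public void add(E e) {
        Node<E> node = obtain();
        node.item = e;
        node.next = null;
        if (last == null) first = node;
        else last.next = node;
        last = node;
        size++;
    }

    public void clear() {
        // Return all nodes to the pool instead of letting the GC reclaim them.
        for (Node<E> n = first; n != null; ) {
            Node<E> next = n.next;
            n.item = null;
            n.next = null;
            pool.push(n);
            n = next;
        }
        first = last = null;
        size = 0;
    }

    public int size() { return size; }
    public E getFirst() { return first.item; }
}
```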

Piro

One explanation is that your base assumption (that multiplication is slower than memory fetches) is questionable.

Based on this document, an AMD Bulldozer takes 1 clock cycle to issue a 64-bit integer multiply instruction (register × register), with 6 cycles of latency1. By contrast, a memory-to-register load takes 1 clock cycle, with 4 cycles of latency. But that assumes that you get a cache hit for the memory fetch. If you get a cache miss, you need to add a number of cycles (20 clock cycles for an L2 cache miss, according to this source).

Now that is just one architecture, and others will vary. We also need to consider other issues, like constraints on the number of multiplications that can be overlapped, and how well the compiler can organize the instructions to minimize instruction dependencies. But the fact remains that for a typical modern pipelined chip architecture, the CPU can execute integer multiplies as fast as it can execute memory-to-register moves, and much faster if there are cache misses in the memory fetches.

Your benchmark is using lists with 100,000 Integer elements. When you look at the amount of memory involved, and the relative locality of the heap nodes that represent the lists and the elements, the linked list case will use significantly more memory, and have correspondingly worse memory locality. That will lead to more cache misses per cycle of the inner loop, and worse performance.
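
Rough per-element arithmetic backs this up. The constants below assume a 64-bit HotSpot JVM with compressed oops (4-byte references, 12-byte object headers, 8-byte alignment); these are illustrative estimates, not measurements (a tool like JOL would give exact numbers):

```java
// Back-of-envelope per-element footprint estimates for List<Integer>,
// assuming 64-bit HotSpot with compressed oops. Illustrative only.
public class FootprintEstimate {
    static final int REF = 4;          // compressed object reference
    static final int HEADER = 12;      // object header
    static final int INTEGER_OBJ = 16; // header (12) + int (4), 8-byte aligned

    // ArrayList<Integer>: one array slot plus the boxed Integer.
    static int arrayListPerElement() {
        return REF + INTEGER_OBJ;
    }

    // LinkedList<Integer>: a Node (header + item/next/prev refs, aligned)
    // plus the boxed Integer.
    static int linkedListPerElement() {
        int node = align8(HEADER + 3 * REF);
        return node + INTEGER_OBJ;
    }

    static int align8(int n) { return (n + 7) & ~7; }

    public static void main(String[] args) {
        System.out.println("ArrayList  ~" + arrayListPerElement() + " bytes/element");
        System.out.println("LinkedList ~" + linkedListPerElement() + " bytes/element");
    }
}
```

Under these assumptions the LinkedList uses roughly twice the memory per element, spread across twice as many objects, which is exactly the locality penalty described above.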

Your benchmark results are not surprising2 to me.


The other thing to note is that if you use Java's LinkedList, a separate heap node is used to represent each list node. You can implement your own linked lists more efficiently if your element class has its own next field that can be used to chain the elements. However, this brings its own limitations; e.g. an element can only be in one list at a time.

Finally, as @maaartinus points out, a full IMUL is not required in the case of a Java ArrayList. When reading or writing the ArrayList's array, the indexing multiplication will be either x 4 or x 8 and that can be performed by a MOV with one of the standard addressing modes; e.g.

MOV EAX, [EDX + EBX*4 + 8]

This multiplication can be done (at the hardware level) by shifting, with much less latency than a 64-bit IMUL.


1 - In this context, the latency is the number of cycles delay before the result of the instruction is available ... to the next instruction that depends on it. The trick is to order the instructions so that other work is done during the delay.

2 - If anything, I am surprised that LinkedList appears to be doing so well. Maybe calling Random.nextInt() and autoboxing the result is dominating the loop times?

Stephen C
  • OK, you have to multiply then do a fetch. It's still an extra operation. – Thom Apr 06 '18 at 13:34
  • No. Because with a Java ArrayList versus LinkedList it is an IMUL and a MOV versus two MOVs. And the probabilities of cache misses are lower in the first case than the second. (And even in the case where you chained the elements, the difference in cache miss ratios might be sufficient to make IMUL + MOV **faster** than one MOV. But we now also need to consider things like cache line sizes and memory fetch widths. Or the compiler doing clever optimizations to avoid the IMUL, though that is more plausible for a really good C / C++ compiler than for a Java compiler.) – Stephen C Apr 06 '18 at 13:59
  • There's no multiplication there (or am I blind???). The multiplication needed for addressing is just a shift and there's an addressing mode doing exactly this. – maaartinus Apr 06 '18 at 21:03
  • @maaartinus - You could well be correct. I have been taking the OP's word for it that a multiplication is required. – Stephen C Apr 06 '18 at 23:16