
- What is the difference between LinkedList and ArrayList? When is it preferable to use a LinkedList?

I think every Java developer has heard this question in an interview at least once.

- Linked list is preferable if you want to be able to insert items in the middle of the list.

That is the standard answer to this question, and everybody knows it. Every time you ask about the difference between the List implementations, you get answers like these:

When should I use LinkedList? When do you need efficient removal in between elements or at the start?

From here

Forgot to mention insertion costs. In a LinkedList, once you have the correct position, insertion costs O(1), while in an ArrayList it goes up to O(n) - all elements past the insertion point must be moved.

From here

Linked lists are preferable over arrays when you want to be able to insert items in the middle of the list (such as a priority queue).

From here

ArrayList is slower because it needs to copy part of the array in order to remove the slot that has become free. LinkedList just has to manipulate a couple of references.

From here

And more...

But have you ever tried to reproduce it by yourself? I tried yesterday and got these results:

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class Test {
    public static void main(String... args) {
        final int MAX_VAL = 10000;
        List<Integer> linkedList = new LinkedList<Integer>();
        List<Integer> arrayList = new ArrayList<Integer>();
        for(int i = 0; i < MAX_VAL; i++) {
            linkedList.add(i);
            arrayList.add(i);
        }
        long time = System.nanoTime();
        for(int i = 0; i < MAX_VAL; i++) {
            linkedList.add(MAX_VAL/2, i);
        }
        System.out.println("LL time: " + (System.nanoTime() - time));
        time = System.nanoTime();
        for(int i = 0; i < MAX_VAL; i++) {
            arrayList.add(MAX_VAL/2, i);
        }
        System.out.println("AL time: " + (System.nanoTime() - time));
    }
}

Output:

LL time: 114098106

AL time: 24121889

So what is it? Why does LinkedList suck so much? Maybe we should try the remove operation instead of add? Ok, let's try:

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class Test {
    public static void main(String... args) {
        final int MAX_VAL = 10000;
        List<Integer> linkedList = new LinkedList<Integer>();
        List<Integer> arrayList = new ArrayList<Integer>();
        for(int i = 0; i < MAX_VAL; i++) {
            linkedList.add(i);
            arrayList.add(i);
        }
        long time = System.nanoTime();
        for(int i = 0; i < MAX_VAL/2; i++) {
            linkedList.remove(MAX_VAL/2);
        }
        System.out.println("LL time: " + (System.nanoTime() - time));
        time = System.nanoTime();
        for(int i = 0; i < MAX_VAL/2; i++) {
            arrayList.remove(MAX_VAL/2);
        }
        System.out.println("AL time: " + (System.nanoTime() - time));
    }
}

Output:

LL time: 27581163

AL time: 3103051

Oh, ArrayList is still faster than LinkedList. What is the reason? Was this myth busted? Or maybe I'm wrong?


bsiamionau

8 Answers


BUSTED

Not really. Here

for(int i = 0; i < MAX_VAL; i++) {
    linkedList.add(MAX_VAL/2, i);
}

you don't just insert the item; you pay the cost of iterating from the beginning to i every time. Naturally, that's O(i).

On the other hand, the list must be quite large before you'll actually witness the performance benefit of mid-list insertion. System.arraycopy is a blazing-fast operation and, on the other end, each insertion into a LinkedList requires the allocation of a node instance.
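To see where the time goes, here is a sketch of positional lookup in a doubly-linked list, modeled loosely on what OpenJDK's LinkedList does internally; the Node class, node() method, and hop counter are illustrative, not the actual JDK source:

```java
// Sketch of positional lookup in a doubly-linked list. Walking to the index
// dominates the cost of a positional add; the hop counter makes that visible.
import java.util.NoSuchElementException;

public class NodeWalk {
    static class Node {
        int value;
        Node prev, next;
        Node(int value) { this.value = value; }
    }

    Node first, last;
    int size;

    void addLast(int v) {
        Node n = new Node(v);
        if (last == null) { first = last = n; }
        else { last.next = n; n.prev = last; last = n; }
        size++;
    }

    static int hops; // counts link traversals, to make the O(n) cost visible

    Node node(int index) {
        if (index < 0 || index >= size) throw new NoSuchElementException();
        if (index < size / 2) {               // walk forward from the head
            Node x = first;
            for (int i = 0; i < index; i++) { x = x.next; hops++; }
            return x;
        } else {                              // walk backward from the tail
            Node x = last;
            for (int i = size - 1; i > index; i--) { x = x.prev; hops++; }
            return x;
        }
    }

    public static void main(String[] args) {
        NodeWalk list = new NodeWalk();
        for (int i = 0; i < 10_000; i++) list.addLast(i);
        hops = 0;
        list.node(5_000);                      // the middle: the worst case
        System.out.println("hops to reach the middle: " + hops);
    }
}
```

Even though the walk starts from the nearer end, an index in the middle still costs about size/2 link traversals on every single add(index, e) call.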

In summary, ArrayList is a better choice for 99% or more of real-world cases, and leveraging the narrow advantage of a LinkedList requires great care.

General notes on microbenchmarking the JVM

I should also warn you that your benchmarking code is badly deficient. There is quite a sizable checklist of things to watch out for when microbenchmarking on the JVM, for example:

  • always warm up the code to let the JIT compiler get to it;
  • be very careful about interpreting nanoTime results due to accuracy/precision issues. Make the reading grow at least into milliseconds (millions of nanoseconds) to ensure reliability;
  • control the spurious side-effects of the Garbage Collector;
  • etc.

Therefore the advice is to use a ready-made microbenchmarking framework such as OpenJDK's jmh.
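If pulling in a framework is not an option, the bare minimum is to warm up and repeat. A crude hand-rolled sketch (the helper name timeMiddleInserts is made up for illustration, and this is no substitute for JMH):

```java
// Crude warm-up-and-repeat timing sketch. Repeated runs let the JIT compile
// the hot code, and taking the best of the later runs filters out warm-up
// and GC spikes. This mitigates, but does not solve, the problems above.
import java.util.ArrayList;
import java.util.List;

public class CrudeBench {
    static long timeMiddleInserts(List<Integer> list, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) list.add(list.size() / 2, i);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final int N = 10_000, RUNS = 10;
        long best = Long.MAX_VALUE;
        for (int run = 0; run < RUNS; run++) {     // repeated runs double as warm-up
            List<Integer> al = new ArrayList<>();
            for (int i = 0; i < N; i++) al.add(i);
            long t = timeMiddleInserts(al, N);
            if (run > 2) best = Math.min(best, t); // discard the first warm-up runs
        }
        System.out.println("best AL middle-insert time (ms): " + best / 1_000_000);
    }
}
```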

Marko Topolnik

To demonstrate the (in)effectiveness of the add() operation, it is better to use a ListIterator instead of the list object itself. If you call add() directly on the linked list, it starts at the list head and has to iterate to the position where you want to insert the item; this part takes O(n). If you use a ListIterator, it holds the position where we are adding the elements, so the algorithm does not have to iterate to the middle of the list each time.

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class Test {
    public static void main(String... args) {
        final int MAX_VAL = 10000;
        List<Integer> linkedList = new LinkedList<Integer>();
        List<Integer> arrayList = new ArrayList<Integer>();
        for(int i = 0; i < MAX_VAL; i++) {
            linkedList.add(i);
            arrayList.add(i);
        }
        long time = System.nanoTime();


        for(int i = 0; i < MAX_VAL; i++) {
            linkedList.add(MAX_VAL/2, i);
        }
        System.out.println("LL time:\t" + (System.nanoTime() - time));

        time = System.nanoTime();
        for(int i = 0; i < MAX_VAL; i++) {
            arrayList.add(MAX_VAL/2, i);
        }
        System.out.println("AL time:\t" + (System.nanoTime() - time));


        //Reset the lists
        linkedList = new LinkedList<Integer>();
        arrayList = new ArrayList<Integer>();
        for(int i = 0; i < MAX_VAL; i++) {
            linkedList.add(i);
            arrayList.add(i);
        }

        time = System.nanoTime();
        ListIterator<Integer> li = linkedList.listIterator(MAX_VAL/2);
        for(int i = 0; i < MAX_VAL; i++) {
            li.add(i);
        }
        System.out.println("LL iterator:\t" + (System.nanoTime() - time));

        time = System.nanoTime();
        ListIterator<Integer> ali = arrayList.listIterator(MAX_VAL/2);
        for(int i = 0; i < MAX_VAL; i++) {
            ali.add(i);
        }
        System.out.println("AL iterator:\t" + (System.nanoTime() - time));
    }
}

My results show that using a ListIterator on the LinkedList gives the best performance for inserting elements in the "middle":

LL time:     237819474
AL time:      31410507
LL iterator:   5423172
AL iterator:  23975798
Matej
  • if you know the size in advance you can init ArrayList with that size, saving time in reallocating memory for the internal array, during the add() – AlexWien May 29 '13 at 08:44
  • new ArrayList(MAX_VAL); – AlexWien May 29 '13 at 08:45
  • Maybe we would need to initialize the array list with a size of 2*MAX_VAL. However, it is usually not necessary, as array lists heuristically pre-allocate extra space when they are modified, to prevent frequent reallocations. E.g. IBM's Java pre-allocates 1/2 of the array's current size; other implementations might use different heuristics. A similar approach can be found across other programming languages and frameworks as well. – Matej May 29 '13 at 08:52
  • I think the results are different for the two operations (and the benchmarking is broken). But anyway, for these operations modifying the list with a single `addAll` should be faster (dominated by the implicit creation of `Integer` objects). – Tom Hawtin - tackline May 29 '13 at 10:04
  • @Matej No! MAX_VAL is sufficient. ensureCapacity() allocates only when the current size is not sufficient. Only then is a factor of current_size * 1.5 used. addAll() is even better. – AlexWien May 29 '13 at 12:02
  • @AlexWien Initially, there is a for loop that initializes the array by adding the MAX_VAL elements. After this step, the algorithm continues with the benchmarked part itself and adds another MAX_VAL elements into the middle of the array. We'll have 2*MAX_VAL elements in the array. If we would use the MAX_VAL to initialize the array, we would have the situation when the array is full and we would benchmark the re-allocations. If we would use 2*MAX_VAL, we would benchmark insertions without re-allocations. Both make sense. :-) – Matej May 29 '13 at 13:11

Your test has bias - it doesn't measure the usual performance difference.

General observations about LinkedList structure (compared to ArrayList, for a large list):

  1. adding/removing a node at the head or tail is very fast
  2. getting an element from the middle is very slow
  3. getting an element becomes faster (linearly) as you approach either end of the list
  4. getting an element from the head or tail approaches the speed of an ArrayList
  5. adding/removing an element somewhere in the middle is two operations: get plus node insertion
  6. if you use a ListIterator, you can add/remove a node somewhere in the middle and avoid the get - a very fast operation

Your test intends to test (5).

But it always carries out the worst case - adding/removing an element exactly in the middle.

Your micro-benchmark gives systematic error. You need to uniformly or randomly distribute the add/remove location. Or carry out macro-benchmarking with real-life complex & challenging apps.
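A less biased version of the insert loop might draw the position uniformly at random. A sketch (still not a rigorous benchmark; the fixed seed is only there so that both lists see the same sequence of positions):

```java
// Insert at uniformly random positions instead of always the exact middle,
// removing the worst-case bias described above. Still only a sketch.
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

public class RandomPositionInsert {
    static long timeRandomInserts(List<Integer> list, int n, long seed) {
        Random rnd = new Random(seed);                 // same positions for both lists
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            list.add(rnd.nextInt(list.size() + 1), i); // any position, head and tail included
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final int N = 10_000;
        List<Integer> ll = new LinkedList<>();
        List<Integer> al = new ArrayList<>();
        for (int i = 0; i < N; i++) { ll.add(i); al.add(i); }
        System.out.println("LL random-position time: " + timeRandomInserts(ll, N, 42));
        System.out.println("AL random-position time: " + timeRandomInserts(al, N, 42));
    }
}
```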

An interesting read on the challenge of creating an accurate micro-benchmark: Java theory and practice: Anatomy of a flawed microbenchmark

Glen Best

I rewrote Matej's program to pick a method at random and run 50 trials for each method. Taking the average of the fastest half of the trials in each category gives the following (in µs):

LL: 570
AL: 120
LL iterator: 1
AL iterator: 60

The LL-iterator approach does take a much shorter time. In the worst cases its performance drops by a factor of 15, due both to warm-up (the first cycle) and to GC (the random spikes in the unsorted data).

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;
import java.util.Random;

public class TestList {

    public static void main(String... args) {
        final int MAX_VAL = 10000;
        int[] currentIndex = {0, 0, 0, 0};
        int[] remaining = {50, 50, 50, 50};
        int[][] sequence = new int[4][50];

        while (keepWorking(remaining)) { //run 50 tests for each case at random

            int currentMethod = chooseMethod(remaining); //choose case. Probability is higher for tests with less trials

            switch (currentMethod) { //run a test based on the choice
                case 0:
                    sequence[currentMethod][currentIndex[currentMethod]] = getLL(MAX_VAL);
                    break;
                case 1:
                    sequence[currentMethod][currentIndex[currentMethod]] = getAL(MAX_VAL);
                    break;
                case 2:
                    sequence[currentMethod][currentIndex[currentMethod]] = getLLIt(MAX_VAL);
                    break;
                default:
                    sequence[currentMethod][currentIndex[currentMethod]] = getALIt(MAX_VAL);
                    break;
            }

            remaining[currentMethod]--;
            currentIndex[currentMethod]++;
        }

        for (int[] ar : sequence) {
            Arrays.sort(ar);
        }

        System.out.println("Time (us)\nLL    \tAL\tLL incr\t AL incr");
        for (int i = 0; i < sequence[0].length; i++) {
            System.out.println(sequence[0][i] + "\t" + sequence[1][i] + "\t" + sequence[2][i] + "\t" + sequence[3][i]);
        }
        System.out.println("\nTime normalized to fastest run of a method\nLL\tAL\tLL incr\t AL incr");
        for (int i = 0; i < sequence[0].length; i++) {
            System.out.print(i);
            for (int j = 0; j < sequence.length; j++) {  //to 4
                int a = sequence[j][i] / (sequence[j][0]/100); //to keep result within the scope of int
                System.out.print("\t" + a);
            }
            System.out.println();
        }
    }

    public static boolean keepWorking(int[] remaining) {

        for (int i = 0; i < remaining.length; i++) {
            if (remaining[i] > 0) {
                return true;
            }
        }
        return false;
    }

    public static int chooseMethod(int[] rem) {
        int[] bins = new int[rem.length];
        for (int i = 0; i < rem.length; i++) {
            for (int j = i; j < rem.length; j++) {
                bins[j] += rem[i];
            }
        }
        int randomNum = new Random().nextInt(bins[rem.length - 1]);
        for (int i = 0; i < bins.length; i++) {
            if (randomNum < bins[i]) {
                return i;
            }
        }
        return -1;
    }

    public static int getLL(int MAX_VAL) {

        List<Integer> linkedList = new LinkedList<>();
        for (int i = 0; i < MAX_VAL; i++) {
            linkedList.add(i);
        }
        long time = System.nanoTime();

        for (int i = 0; i < MAX_VAL; i++) {
            linkedList.add(MAX_VAL / 2, i);
        }
        return (int) (System.nanoTime() - time)/1000;
    }

    public static int getAL(int MAX_VAL) {

        List<Integer> arrayList = new ArrayList<>(MAX_VAL);
        for (int i = 0; i < MAX_VAL; i++) {
            arrayList.add(i);
        }
        long time = System.nanoTime();
        for (int i = 0; i < MAX_VAL; i++) {
            arrayList.add(MAX_VAL / 2, i);
        }
        return (int) (System.nanoTime() - time)/1000;
    }

    public static int getLLIt(int MAX_VAL) {

        List<Integer> linkedList = new LinkedList<>();
        for (int i = 0; i < MAX_VAL; i++) {
            linkedList.add(i);
        }

        long time = System.nanoTime();

        ListIterator<Integer> li = linkedList.listIterator(MAX_VAL / 2);
        for (int i = 0; i < MAX_VAL; i++) {
            li.add(i);
        }
        return (int) (System.nanoTime() - time)/1000;
    }

    public static int getALIt(int MAX_VAL) {

        List<Integer> arrayList = new ArrayList<>(MAX_VAL);
        for (int i = 0; i < MAX_VAL; i++) {
            arrayList.add(i);
        }

        long time = System.nanoTime();
        ListIterator<Integer> ali = arrayList.listIterator(MAX_VAL / 2);
        for (int i = 0; i < MAX_VAL; i++) {
            ali.add(i);
        }
        return (int) (System.nanoTime() - time)/1000;
    }
}
sixtytrees

Some care must be taken with simple profiling like this:

  • Garbage collection may occur at unpredictable times, slowing down an unpredictable part of the run.
  • The JVM is slower when it first starts and only later "warms up".

To work around this, profile in a loop, repeating both cases many times, ideally in random order, and take typical values rather than the extremes. This sometimes delivers different results.

Audrius Meškauskas
0

Since ArrayList stores values sequentially, it is:

  1. faster at adding values (just append at the last index);
  2. slower at mid-list updates and deletes (the elements after the index must be shifted).

Since LinkedList works on the linked-node concept, it is:

  1. slower at inserting by index (it needs to walk the references to find the prev/next node);
  2. faster at updating once the exact node has already been reached via its reference.

This link can be referred to.


In an ideal scenario you always insert into a sorted list. First you locate the insertion index using binary search, then you insert at that index. Also, while doing this you cannot reuse the same ListIterator the whole time: you would have to reposition the iterator to the new index location every time. So in this real-life scenario, which insertion is faster?
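That scenario can be sketched with Collections.binarySearch followed by a positional add (a sketch of the scenario, not a benchmark; note that binarySearch on a LinkedList cannot exploit random access, which compounds the cost described above):

```java
// Sorted-list insertion: locate the index with binary search, then insert.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedInsert {
    static void insertSorted(List<Integer> list, int value) {
        int idx = Collections.binarySearch(list, value);
        if (idx < 0) idx = -idx - 1; // binarySearch returns (-(insertion point) - 1) when absent
        list.add(idx, value);
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(List.of(1, 3, 5, 7));
        insertSorted(list, 4);
        insertSorted(list, 0);
        System.out.println(list);    // [0, 1, 3, 4, 5, 7]
    }
}
```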

Maze

Oof... How dogmatic people can be...

Look up "Bjarne Stroustrup: Why you should avoid Linked Lists" on YouTube to get all the answers. The answers are not from some random StackOverflow posters, but from a computer science professor who - oh by the way - developed C++.

To keep it short:

  • Linked lists maximize your cache misses, because nodes are allocated randomly on the heap, which also defeats hardware and software cache prefetching
  • Reading a node into cache often wastes most of the cache line width, which also maximizes memory-bandwidth waste, i.e. the von Neumann bottleneck
  • Linked lists store more data (pointers), which again maximizes memory-bandwidth waste
  • Especially with SIMD, streaming arrays is very fast, which makes memcpy of an arbitrarily sized array faster than traversing a linked list of the same size, with all the negative side effects listed above. Inserting into/deleting from an array is always faster on modern (1980+ ????) machines.
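The locality argument applies in Java as well: an int[] is one contiguous block, while a LinkedList<Integer> scatters roughly two small objects (a node plus a boxed Integer) per element across the heap. A sketch that contrasts the two traversals (no timings are claimed here, since they vary by machine):

```java
// Contiguous-array traversal vs. pointer-chasing traversal of a linked list.
import java.util.LinkedList;
import java.util.List;

public class LocalityDemo {
    public static void main(String[] args) {
        final int N = 1_000_000;
        int[] array = new int[N];                  // one contiguous allocation
        List<Integer> linked = new LinkedList<>(); // ~2 small objects per element
        for (int i = 0; i < N; i++) { array[i] = i; linked.add(i); }

        long t = System.nanoTime();
        long arraySum = 0;
        for (int v : array) arraySum += v;         // sequential, prefetch-friendly
        long arrayTime = System.nanoTime() - t;

        t = System.nanoTime();
        long linkedSum = 0;
        for (int v : linked) linkedSum += v;       // pointer-chasing, cache-hostile
        long linkedTime = System.nanoTime() - t;

        System.out.println("sums equal: " + (arraySum == linkedSum));
        System.out.println("array ns: " + arrayTime + ", linked ns: " + linkedTime);
    }
}
```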

Linked lists are very slow. They're an easy-to-understand "advanced" data structure, and former computer-science students stick to them even in the face of hard evidence, as can be seen in the answers. I personally think it's embarrassing.

Linked lists were faster when CPUs didn't have caches, back when memory was as fast as or faster than the CPU. That's probably around the time when the professors teaching the previously mentioned students stopped learning.