removeAll ArrayList vs LinkedList performance

Question

I have a question about this program.

public class Main {
    public static void main(String[] args) {
        List<String> arrayList = new ArrayList<String>();
        for(int i=0; i<100; i++){
            arrayList.add("ValueA");
            arrayList.add("ValueB");
            arrayList.add(null);
            arrayList.add("ValueC");
            arrayList.add(null);
            arrayList.add(null);            
        }
        long startTime = System.nanoTime();
        arrayList.removeAll(Collections.singleton(null));
        long endTime = System.nanoTime();
        System.out.println("ArrayList removal took: " + (endTime - startTime) + "ms");          

        List<String> linkedList = new LinkedList<String>();
        for(int i=0; i<100; i++){
            linkedList.add("ValueA");
            linkedList.add("ValueB");
            linkedList.add(null);
            linkedList.add("ValueC");
            linkedList.add(null);
            linkedList.add(null);
        }

        startTime = System.nanoTime();
        linkedList.removeAll(Collections.singleton(null));
        endTime = System.nanoTime();
        System.out.println("LinkedList removal took: " + (endTime - startTime) + "ms");
    }
}

System output is:

ArrayList removal took: 377953ms
LinkedList removal took: 619807ms

Why is linkedList taking more time than arrayList on removeAll?

Could you give us more data points? In particular, I would like to see 2 trendlines for `ArrayList` and `LinkedList` — Tim Biegeleisen, Dec 11 '15 at 06:12
I don't think that `removeAll()` is a particularly good benchmark because you are simply emptying out the entire data structure. A better benchmark would be removing a random element. — Tim Biegeleisen, Dec 11 '15 at 06:26
@TimBiegeleisen, `removeAll(Collection)` is not really empting the whole DS... And he is using that... — Codebender, Dec 11 '15 at 06:29
Good point, I retract what I said. Removing the first element for collections of size 1000 in the above code takes the same time for `ArrayList` and `LinkedList`, but removing the middle element requires more time for the latter than the former. — Tim Biegeleisen, Dec 11 '15 at 06:30

Nir Alfasi · Accepted Answer · 2015-12-14T17:56:52.430

As Milkmaid mentioned this is not how you should do benchmarking, but I believe that the results you're getting are still valid.

Let's look "under the hood" and see both implementations:

ArrayList.removeAll calls batchRemove:

private boolean batchRemove(Collection<?> c, boolean complement) {
    final Object[] elementData = this.elementData;
    int r = 0, w = 0;
    boolean modified = false;
    try {
        for (; r < size; r++)
            if (c.contains(elementData[r]) == complement)
                elementData[w++] = elementData[r];
    } finally {
        // Preserve behavioral compatibility with AbstractCollection,
        // even if c.contains() throws.
        if (r != size) {
            System.arraycopy(elementData, r,
                             elementData, w,
                             size - r);
            w += size - r;
        }
        if (w != size) {
            // clear to let GC do its work
            for (int i = w; i < size; i++)
                elementData[i] = null;
            modCount += size - w;
            size = w;
            modified = true;
        }
    }
    return modified;
}

As you can see, ArrayList first "defragments" the underlying array by overriding the elements that need to be removed with the elements that come afterwards (complement is passed as false so only objects that are not null are copied):

if (c.contains(elementData[r]) == complement)
    elementData[w++] = elementData[r];

The following if (r != size) handles the case that an exception was thrown from c.contains and it uses the "magic function" System.arraycopy to copy the rest of the elements from the current index till the end - this part runs by native code and should be considerably fast, which is why we can ignore it.

And in the last if: if (w != size) {...} it simply assigns nulls to the rest of the list so that objects that are eligible will be able to be collected by the GC.

The total number of operations is O(n) and each operation uses direct access to the array.

Now let's have a look at the implementation of LinkedList which is fairly shorter:

public boolean removeAll(Collection<?> c) {
    Objects.requireNonNull(c);
    boolean modified = false;
    Iterator<?> it = iterator();
    while (it.hasNext()) {
        if (c.contains(it.next())) {
            it.remove(); // <-- calls the iterator remove method
            modified = true;
        }
    }
    return modified;
}

As you can see, the implementation uses the iterator in order to remove the elements by calling: it.remove();

public void remove() {
    if (lastRet < 0)
        throw new IllegalStateException();
    checkForComodification();

    try {
        AbstractList.this.remove(lastRet); // <-- this is what actually runs
        if (lastRet < cursor)
            cursor--;
        lastRet = -1;
        expectedModCount = modCount;
    } catch (IndexOutOfBoundsException e) {
        throw new ConcurrentModificationException();
    }
}

Which in turn calls:

public E remove(int index) {
    rangeCheck(index);
    checkForComodification();
    E result = l.remove(index+offset); // <-- here
    this.modCount = l.modCount;
    size--;
    return result;
}

Which calls:

public E remove(int index) {
    checkElementIndex(index);
    return unlink(node(index)); // <-- here
}

which calls:

E unlink(Node<E> x) {
    // assert x != null;
    final E element = x.item;
    final Node<E> next = x.next;
    final Node<E> prev = x.prev;

    if (prev == null) {
        first = next;
    } else {
        prev.next = next;
        x.prev = null;
    }

    if (next == null) {
        last = prev;
    } else {
        next.prev = prev;
        x.next = null;
    }

    x.item = null;
    size--;
    modCount++;
    return element;
}

To sum up

Even though theoretically the remove operation in a LinkedList should be O(1) while ArrayList implementation should take O(n), when dealing with batch-removes ArrayList's implementation is more concise, doing everything in one pass by moving objects to override the ones we remove (kind of defragmentation) while LinkedList' implementation is recursively calling 5 different methods (each of which runs its own safety checks...) for each element it removes, and this ends up with the big overhead that you experienced.

score 0 · Answer 2 · edited May 23 '17 at 11:45

First of all 100 elements are not enough to test performance. But from theory: Data in array are(usually) stored in memory one after one. In linked list you have value and also the pointer to another object. That's mean when you deleting array you just go through connected piece of memory. O contre If you deleting from Linked list you has to go through the random pieces memory depends on pointer. There are more differences between array and linked list. Like adding element removing element etc. That's why we have array and linked list. Take a look here Array vs Linked list

score 0 · Answer 3 · edited May 23 '17 at 11:59

The answer to this question boils down to the difference in execution times of for loop. When you deep dive into the code of removeAll() of both these objects you see that the removeAll() of ArrayList calls batchRemove() which looks like:

private boolean batchRemove(Collection<?> c, boolean complement) {
    final Object[] elementData = this.elementData;
    int r = 0, w = 0;
    boolean modified = false;
    try {
        for (; r < size; r++)
            if (c.contains(elementData[r]) == complement)
                elementData[w++] = elementData[r];
    } finally {
        // Preserve behavioral compatibility with AbstractCollection,
        // even if c.contains() throws.
        if (r != size) {
            System.arraycopy(elementData, r,
                             elementData, w,
                             size - r);
            w += size - r;
        }
        if (w != size) {
            // clear to let GC do its work
            for (int i = w; i < size; i++)
                elementData[i] = null;
            modCount += size - w;
            size = w;
            modified = true;
        }
    }
    return modified;
}

On the other hand, when you call the removeAll() of LinkedList, it calls the removeAll() of AbstractCollection which looks like:

public boolean removeAll(Collection<?> c) {
    Objects.requireNonNull(c);
    boolean modified = false;
    Iterator<?> it = iterator();
    while (it.hasNext()) {
        if (c.contains(it.next())) {
            it.remove();
            modified = true;
        }
    }
    return modified;
}

It is clearly visible that in case of ArrayList, a simple for loop is executed as compared to an Iterator based for loop in LinkedList.

The Iterator is better for data structures like LinkedList but it is still slower that the traditional for loop for arrays.

You can know more about the difference in performance of both of these loops here.

removeAll ArrayList vs LinkedList performance

3 Answers3

To sum up