java: List.contains() performance difference with manual searching

Question

I tried to demonstrate the difference between List.contains() and manually searching execution times the result is awesome. Here is the code,

public static void main(String argv[]) {
    List<String> list = new ArrayList<String>();
    list.add("a");
    list.add("a");
    list.add("a");
    list.add("a");
    list.add("a");
    list.add("a");
    list.add("b");

    long startTime = System.nanoTime();

    list.contains("b");

    long endTime = System.nanoTime();
    long duration = endTime - startTime;

    System.out.println("First run: "+duration);

    startTime = System.nanoTime();
    for(String s: list){
        if(s.equals("b"))
            break;
    }
    endTime = System.nanoTime();

    duration = endTime - startTime;
    System.out.println("Second run: "+duration);

}

Output:

First run: 7500
Second run: 158685
1. how contains() function makes such a big difference?
2. which search algorithm does it use?
3. if list conatins the searched element, does it terminate searching at first element?

This performance difference surprises me. In Java, `ArrayList#contains()` calls `ArrayList#indexOf()`, which simply iterates over the array backing the list. So, as far as I can see, the performance difference should be minimal. Did you test with much larger lists? Maybe the time difference is a constant time difference, independent of the list's size? — Chthonic Project, Dec 08 '13 at 00:58
No I actually I didn't. By large how much large do you mean. Like 1000 elements? — Ismail Sahin, Dec 08 '13 at 00:59
I think the the answer given by torquestomp is exactly what I was trying to say. Yes, an array of 1000 elements should be good enough. You can create random-*ish* long String arrays from a bunch of texts, and then compare for each. The collated results of, say, 20 such tests should be a decent benchmark to satisfy personal curiosities :-) — Chthonic Project, Dec 08 '13 at 01:03

torquestomp · Accepted Answer · 2013-12-08T01:18:31.280

First off, it is not wise to trust results coming from a singular test like that. There are too many variable factors, caching implications to consider, and other such things - you should rather consider writing a test that uses randomization over trials to some extent, and performs the different checks millions of times, not just once.

That said, I expect your results will remain the same; ArrayList implements contains() using its own indexOf() method, which loops directly over the underlying array it stores. You can see this for yourself here

The foreach loop, on the other hand, requires instantiating an Iterator, accessing the array through all of its methods, and just in general doing a lot more work than ArrayList's own direct implementation does. Again, though, you should benchmark it more thoroughly!

excellent explanation :), then my question is a litle bit pointless. Because there is nothing with contains but for iteration. — Ismail Sahin, Dec 08 '13 at 01:16

score 4 · Answer 2 · edited May 23 '17 at 11:46

Writing a correct microbenchmark is hard. If you use a better benchmark, you'll likely see that the difference between the approaches is minor - at least, the following benchmark is far more robust, and shows a mere 10% difference in execution time between the two approaches:

public abstract class Benchmark {

    final String name;

    public Benchmark(String name) {
        this.name = name;
    }

    abstract int run(int iterations) throws Throwable;

    private BigDecimal time() {
        try {
            int nextI = 1;
            int i;
            long duration;
            do {
                i = nextI;
                long start = System.nanoTime();
                run(i);
                duration = System.nanoTime() - start;
                nextI = (i << 1) | 1;
            } while (duration < 1000000000 && nextI > 0);
            return new BigDecimal((duration) * 1000 / i).movePointLeft(3);
        } catch (Throwable e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public String toString() {
        return name + "\t" + time() + " ns";
    }

    public static void main(String[] args) throws Exception {
        final List<String> list = new ArrayList<String>();
        for (int i = 0; i < 1000; i++) {
            list.add("a");
        }

        Benchmark[] marks = {
            new Benchmark("contains") {
                @Override
                int run(int iterations) throws Throwable {
                    for (int i = 0; i < iterations; i++) {
                        if (list.contains("b")) {
                            return 1;
                        }
                    }
                    return 0;
                }
            },
            new Benchmark("loop") {
                @Override
                int run(int iterations) throws Throwable {
                    for (int i = 0; i < iterations; i++) {
                        for (String s : list) {
                            if (s.equals("b")) {
                                return 1;
                            }
                        }
                    }
                    return 0;
                }
            }
        };

        for (Benchmark mark : marks) {
            System.out.println(mark);
        }
    }
}

prints (on my dated notebook, on a Java 7 Oracle JVM in server mode):

contains    10150.420 ns
loop        11363.640 ns

The slightly larger overhead of the loop is likely caused by the Iterator checking for concurrent modification and twice for the end of the list on every access, see the source code of java.util.ArrayList.Itr.next() for details.

Edit: With very short lists, the difference is more pronounced. For instance for a list of length 1:

contains    15.316 ns
loop        69.401 ns

Still, nowhere near the 20:1 ratio your measurements indicate ...

user987339 · Answer 3 · 2013-12-08T01:20:22.670

3

As you can see from the code contains needs O(n) iterations. If you re-implement your for loop to:

for(int i=0; i < list.size(); i++){
    if(list.get(i).equals("b"))
        break;
}

you'll see dramatic improvement in your search time. So you can put blame on List iterator for the time overhead. Iterator instantiation and calls of next and hasNext methods are adding some milliseconds.

edited Dec 08 '13 at 01:20

answered Dec 08 '13 at 01:06

user987339

10,519
8
40
45

Sorry for mistake yes. You are right, actually it did. But still there is a gap between both results – Ismail Sahin Dec 08 '13 at 01:09
1

yes there is a gap, but only few milliseconds. This is because of warming up etc. – user987339 Dec 08 '13 at 01:10

java: List.contains() performance difference with manual searching

3 Answers3