13

I wrote a test that attempts to test two things:

  • Whether the size of a buffer array affects its performance, even if you don't use the whole buffer
  • The relative performance of arrays and ArrayList

I was kind of surprised with the results

  • Boxed arrays (i.e. Integer vs int) are not very much slower than the primitive version
  • The size of the underlying array doesn't matter very much
  • ArrayLists are more than twice as slow as the corresponding array.

The Questions

  1. Why is ArrayList so much slower?
  2. Is my benchmark written well? In other words, are my results accurate?

The Results

 0% Scenario{vm=java, trial=0, benchmark=SmallArray} 34.57 ns; ?=0.79 ns @ 10 trials
17% Scenario{vm=java, trial=0, benchmark=SmallBoxed} 40.40 ns; ?=0.21 ns @ 3 trials
33% Scenario{vm=java, trial=0, benchmark=SmallList} 105.78 ns; ?=0.09 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=BigArray} 34.53 ns; ?=0.05 ns @ 3 trials
67% Scenario{vm=java, trial=0, benchmark=BigBoxed} 40.09 ns; ?=0.23 ns @ 3 trials
83% Scenario{vm=java, trial=0, benchmark=BigList} 105.91 ns; ?=0.14 ns @ 3 trials

 benchmark    ns linear runtime
SmallArray  34.6 =========
SmallBoxed  40.4 ===========
 SmallList 105.8 =============================
  BigArray  34.5 =========
  BigBoxed  40.1 ===========
   BigList 105.9 ==============================

vm: java
trial: 0

The Code

This code was written in Windows using Java 7 and Google caliper 0.5-rc1 (because last I checked 1.0 doesn't work in Windows yet).

Quick outline: in all 6 tests, in each iteration of the loop, it adds the values in the first 128 cells of the array (no matter how big the array is) and adds that to a total value. Caliper tells me how many times the test should run, so I loop through that addition 128 times.

The 6 tests have a big (131072) and a small (128) version of int[], Integer[], and ArrayList<Integer>. You can probably figure out which is which from the names.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import com.google.caliper.Runner;
import com.google.caliper.SimpleBenchmark;

public class SpeedTest {    
    public static class TestBenchmark extends SimpleBenchmark {
        int[] bigArray = new int[131072];
        int[] smallArray = new int[128];
        Integer[] bigBoxed = new Integer[131072];
        Integer[] smallBoxed = new Integer[128];
        List<Integer> bigList = new ArrayList<>(131072);
        List<Integer> smallList = new ArrayList<>(128);

        @Override
        protected void setUp() {
            Random r = new Random();
            for(int i = 0; i < 128; i++) {
                smallArray[i] = Math.abs(r.nextInt(100));
                bigArray[i] = smallArray[i];
                smallBoxed[i] = smallArray[i];
                bigBoxed[i] = smallArray[i];
                smallList.add(smallArray[i]);
                bigList.add(smallArray[i]);
            }
        }

        public long timeBigArray(int reps) {
            long result = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < 128; j++) {
                    result += bigArray[j];
                }
            }
            return result;
        }

        public long timeSmallArray(int reps) {
            long result = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < 128; j++) {
                    result += smallArray[j];
                }
            }
            return result;
        }

        public long timeBigBoxed(int reps) {
            long result = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < 128; j++) {
                    result += bigBoxed[j];
                }
            }
            return result;
        }

        public long timeSmallBoxed(int reps) {
            long result = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < 128; j++) {
                    result += smallBoxed[j];
                }
            }
            return result;
        }

        public long timeBigList(int reps) {
            long result = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < 128; j++) {
                    result += bigList.get(j);
                }
            }
            return result;
        }

        public long timeSmallList(int reps) {
            long result = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < 128; j++) {
                    result += smallList.get(j);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Runner.main(TestBenchmark.class, new String[0]);
    }
}
durron597
  • 31,968
  • 17
  • 99
  • 158
  • what is that runtime a runtime of? – Sam I am says Reinstate Monica Dec 09 '13 at 22:36
  • @SamIam I'm not sure I understand the question; the runtime is of the benchmark code below it. – durron597 Dec 09 '13 at 22:41
  • Is it the amount of time that an entire `timeBigBoxed` or similar method took? Is it the amount of time that a single iteration of one of the `for` loops took? – Sam I am says Reinstate Monica Dec 09 '13 at 22:43
  • Don't you think that it's a little odd that the bigArray and the SmallArray took the same amount of time to execute? – Sam I am says Reinstate Monica Dec 09 '13 at 22:44
  • In a sort of broad sense without having time to study the problem, it's reasonable that two memory accesses, de-reference List, access array, takes twice as long as one memory access, everything else involved being a trivial operation. – Affe Dec 09 '13 at 22:44
  • @SamIam It should be the amount of time one of the INNER `for` loops took. So it should be the time of 128 additions (assuming I wrote it well and Hotspot isn't messing with the results) – durron597 Dec 09 '13 at 22:44
  • @SamIam In all cases it only adds the first 128 elements, so I am not exactly surprised, but I did think the difference would be far longer than it turned out to be. – durron597 Dec 09 '13 at 22:45
  • I think that you need more than 128 iterations for your big arrays – Sam I am says Reinstate Monica Dec 09 '13 at 22:45
  • @SamIam I am not trying to test whether iterating through a big array would be slower than a small array; I am trying to test whether iterating to the end of the valid data in a buffer is affected by array size. In other words, (this is more like a C scenario but you get the idea) if you have two `char` arrays, one that has `char[6]={'H','e','l','l','o','\0'}` and `char[100]={'H','e','l','l','o','\0','\0','\0',...}` will one of them be faster than the other to print? – durron597 Dec 09 '13 at 22:49
  • Mainly the ArrayList will be slower because you're getting the call overhead, plus twice as much bounds checking. – Hot Licks Dec 09 '13 at 22:54
  • 2
    Random.nextInt(n) always return a positive number. BTW Math.abs(n) *doesn't* always return a positive number. – Peter Lawrey Dec 09 '13 at 23:02
  • I'm curious as to what JVM arguments you're using. You should try `-XX:AggressiveOpts` and `-server`. Aggressive opts should improve the boxed array by eliminating autoboxing, and the server option should improve the optimisations used when the function is actually compiled. You should also be aware that your tests may not be running enough times for the code to be picked up as a hotspot and compiled. Use `-XX:CompileThreshold=0` to ensure that you're seeing the compiled performance. – bdean20 Dec 10 '13 at 00:24
  • 2
    @PeterLawrey Actually Random.nextInt(n) returns a *non-negative* number, but your point is taken that it's pointless to call Math.abs() on the result. +1 anyway. :-) – Stuart Marks Dec 10 '13 at 00:52
  • @StuartMarks For those who don't know. Math.abs(Integer.MIN_VALUE) returns Integer.MIN_VALUE which is negative. If you generate hashcodes there is a very small chance the hash is MIN_VALUE. This leads to a very rare bug in your code which you might not see in testing. – Peter Lawrey Dec 10 '13 at 09:15

4 Answers4

9

Firstly ...

Are ArrayLists more than twice as slow as arrays?

As a generalization, no. For operations that potentially involve "changing" the length of the list / array, an ArrayList will be faster than an array ... unless you use a separate variable to represent the array's logical size.

For other operations, the ArrayList is likely to be slower, though the performance ratio will most likely depend on the operation and the JVM implementation. Also note that you have only tested one operation / pattern.

Why is ArrayList so much slower?

Because an ArrayList has a distinct array object inside of it.

  • Operations typically involve extra indirections (e.g. to fetch the list's size and inner array) and there are extra bounds checks (e.g. checking the list's size and the array's length). A typical JIT compiler is (apparently) not able to optimize these away. (And in fact, you would NOT want to optimize away the inner array because that's what allows an ArrayList to grow.)

  • For array's of primitives, the corresponding list types involve wrapped primitive types / objects, and that adds an overhead. For example your result += ... involves unboxing, in the "list" cases.

Is my benchmark written well? In other words, are my results accurate?

There's nothing wrong with it technically. But it is not sufficient to demonstrate your point. For a start, you are only measuring one kind of operation: array element fetching and its equivalent. And you are only measuring for primitive types.


Finally, this largely misses the point of using List types. We use them because they are almost always easier to use than plain arrays. A difference in performance of (say) 2 is typically not important to overall application performance.

soz
  • 304
  • 2
  • 11
Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • Great answer @StephenC. Though, I will say that in my use case (the reason I wrote the benchmark), that operation is the one that matters; things like insertion and deletion are far less important. – durron597 Dec 09 '13 at 23:25
  • 1
    Fair enough. But then you should not be trying to generalize ... in the way that your Question's text does. – Stephen C Dec 09 '13 at 23:51
4

Keep in mind that in using ArrayList, you are actually calling a function, which in the case of get() actually makes two other function calls. (One of which is a range check, which I suspect may be part of the delay).

The important thing with ArrayList, is not so much how much faster or slower it is compared with straight arrays, but that it's access time always constant (like arrays). In the real world, you'll almost always find that the added delay is negligible. Especially if you have an application that even thinks about connecting to a database. :)

In short, I think your test (and the results) are legit.

matt forsythe
  • 3,863
  • 1
  • 19
  • 29
  • I think you're right about the range check. And `ArrayList` has to check two different ranges, because even if the backing array is, say, 100 elements long, if the effective size is only 3, you can't just rely on array bounds. – durron597 Dec 09 '13 at 22:59
  • Right - which is the only real reason it has to do the check. Otherwise, it could just rely on the array access to throw the IndexOutOfBoundsException – matt forsythe Dec 09 '13 at 23:03
  • The compiler will optimize the method calls so that the calls themselves aren't costly, but the important thing is that the call to get() results in 2 actions instead of one. The range check is most likely what is resulting in the slower performance. – LINEMAN78 Dec 09 '13 at 23:05
  • ArrayList's methods will most likely be inlined – Steve Kuo Dec 10 '13 at 01:33
0

These results don't surprise me. List.get(int) involves a cast, which is slow. Java's generics are implemented via type erasure, meaning that any type of List<T> is actually a List<Object>, and the only reason you get your type out is because of a cast. The source for ArrayList looks like this:

public E get(int index) {
    rangeCheck(index);

    return elementData(index);
}
// snip...
private void rangeCheck(int index) {
    if (index >= size)
        throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
}
// snip...
@SuppressWarnings("unchecked")
E elementData(int index) {
    return (E) elementData[index];
}

The rangeCheck and the overhead of the function calls are trivial, it's that cast to E that kills you.

MattPutnam
  • 2,927
  • 2
  • 17
  • 23
  • Cast is fairly cheap. And the cast to (E) in the above code is totally fictitious. – Hot Licks Dec 09 '13 at 22:52
  • Is there a source that you know of that indicates type casts are particularly slow? – matt forsythe Dec 09 '13 at 22:52
  • @HotLicks Actually he is correct about the code. `ArrayList` is backed by an `Object[]` not an `E[]`. What I'm uncertain of is whether the cast is the real problem. I think it's far more likely to be the `rangeCheck` – durron597 Dec 09 '13 at 22:55
  • In C, a cast has zero cost. It's really just a note to the compiler to disregard type incompatibility. For Java, there *might* be a cost if the cast triggers a typecast exception check that wouldn't be necessary otherwise. – Edward Falk Dec 09 '13 at 22:55
  • I guess if there were an extremely complicated type hierarchy, then the cast could introduce greater overhead as it has to sift through all the different types to figure out whether or not to throw a ClassCastException. But in this case, it should be trivial. – matt forsythe Dec 09 '13 at 22:57
  • I implemented cast check for IBM iSeries. If the class named in the cast statement is exactly the same as the actual object class then it's a single test and quite cheap (or should be, if Sun/Oracle did it right). If the object is a subclass it's a hair slower, but not much. – Hot Licks Dec 09 '13 at 22:57
  • 3
    @durron597 - But I stand by my statement that the above cast is fictitious. The real cast occurs on the calling side. – Hot Licks Dec 09 '13 at 22:58
  • Worst case for a cast is probably when the target is an interface, buried several layers deep in the class hierarchy. – Hot Licks Dec 09 '13 at 23:00
  • @HotLicks Agreed, this not an unchecked cast, which is costly, but instead a fictitious cast for the compiler. – LINEMAN78 Dec 09 '13 at 23:01
  • 1
    I'm not even sure this particular cast (since it is casting to a parameterized type - not an actual type) even does anything at the bytecode level at all? (More of a question than a statement.) – matt forsythe Dec 09 '13 at 23:04
  • @mattforsythe - That's why I said it's "fictitious" -- it's not really there. The "real" cast is provided by the compiler, on the caller's side. – Hot Licks Dec 09 '13 at 23:05
  • @HotLicks Yeah, I think I posted the question before expanding all the comments. :) – matt forsythe Dec 09 '13 at 23:06
0

If you store millions of objects, then the Add or Contains functions will be super slow. Best way is to split it using hashMap of Arrays. Although similar algorithms can be used for other types of objects, this is how I improved 1000 times faster the processing of 10 million strings (the memory taken is 2-3 times more)

public static class ArrayHashList  {
    private String temp1, temp2;
    HashMap allKeys = new HashMap();
    ArrayList curKeys;  
    private int keySize;
    public ArrayHashList(int keySize) {
        this.keySize = keySize;
    }   
    public ArrayHashList(int keySize, String fromFileName) {
        this.keySize = keySize;
        String line;
        try{
            BufferedReader br1 = new BufferedReader(new FileReader(fromFileName));        
            while ((line = br1.readLine()) != null) 
                addString(line);
            br1.close();
        }catch(Exception e){
            e.printStackTrace();
        }
    }   
    public boolean addString(String strToAdd) {  
        if (strToAdd.length()<keySize)
            temp1 = strToAdd;
        else
            temp1 = strToAdd.substring(0,keySize);
        if (!allKeys.containsKey(temp1))
            allKeys.put(temp1,new ArrayList());
        curKeys =  (ArrayList)allKeys.get(temp1);
        if (!curKeys.contains(strToAdd)){
            curKeys.add(strToAdd);
            return true;
        }
        return false;
    }
    public boolean haveString(String strCheck) { 
        if (strCheck.length()<keySize)
            temp1 = strCheck;
        else
            temp1 = strCheck.substring(0,keySize);
        if (!allKeys.containsKey(temp1))
            allKeys.put(temp1,new ArrayList());
        curKeys =  (ArrayList)allKeys.get(temp1);
        return curKeys.contains(strCheck);
    }
}

to init and use it:

ArrayHashList fullHlist = new ArrayHashList(3, filesPath+"\\allPhrases.txt");       
ArrayList pendingList = new ArrayList();
BufferedReader br1 = new BufferedReader(new FileReader(filesPath + "\\processedPhrases.txt"));
while ((line = br1.readLine()) != null) {
    wordEnc = StrUtil.GetFirstToken(line,",~~~,");
    if (!fullHlist.haveString(wordEnc))
        pendingList.add(wordEnc);
}
br1.close();