1) Benchmark your application as a whole. Don't assume that you know where the perf bottlenecks in your application are. Experience shows again and again and again that humans generally suck at this. Do this on hardware and systems which are identical to production, or you're wasting your time.
2) Don't forget to structure your benchmark in such a way that the JIT compiler has kicked in for the code you care about. Around 10,000 invocations are typically needed before HotSpot compiles a method. Benchmarking interpreted-mode code is a total waste of time.
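A rough sketch of what that warmup structure looks like, hand-rolled (the method, iteration counts, and class name here are illustrative, not from the original; a proper harness such as JMH handles warmup for you and is the better choice in practice):

```java
public class WarmupBenchmark {
    // Stand-in for the method you actually care about.
    static long sumTo(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) total += i;
        return total;
    }

    public static void main(String[] args) {
        // Warmup: run well past HotSpot's default compile threshold (10,000
        // invocations) so the timed loop measures JIT-compiled code, not the
        // interpreter. Accumulate into 'sink' so the JIT can't discard the work.
        long sink = 0;
        for (int i = 0; i < 20_000; i++) sink += sumTo(1_000);

        // Only time the code after warmup.
        long start = System.nanoTime();
        for (int i = 0; i < 10_000; i++) sink += sumTo(1_000);
        long elapsed = System.nanoTime() - start;
        System.out.println("post-warmup time (ns): " + elapsed + ", sink=" + sink);
    }
}
```

Note the `sink` accumulator: without it, a sufficiently clever JIT can dead-code-eliminate the whole loop and you end up timing nothing.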
3) Once the most significant bottlenecks have been dealt with, many applications reach a state where the performance profile is dominated by the number of processor L1 cache misses. You can regard this as the point at which your application is reasonably well-tuned. Your algorithms may still suck, however, and there may still be loads of busywork going on in the system that you can get rid of.
4) Assuming your algorithms don't suck and there are no major chunks of busywork left to remove, this is the point at which, if the array / List difference is truly significant for you, you'll start to see it in the perf numbers.
5) Under most circumstances, you will find that the L1 cache situation will be better for arrays than for lists. However, this is general advice, not to be mistaken for actual performance tuning advice. Generate your own perf numbers and analyse them.
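As a minimal sketch of generating your own numbers (the class name, sizes, and repeat counts are made up for illustration; boxing in `ArrayList<Integer>` is one reason the array's cache behaviour tends to win, since the ints sit contiguously rather than behind a layer of object references):

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayVsListSketch {
    static final int N = 1_000_000;

    static long sumArray(int[] a) {
        long total = 0;
        for (int v : a) total += v;   // contiguous primitives: cache-friendly
        return total;
    }

    static long sumList(List<Integer> l) {
        long total = 0;
        for (int v : l) total += v;   // boxed Integers: an extra indirection per element
        return total;
    }

    public static void main(String[] args) {
        int[] a = new int[N];
        List<Integer> l = new ArrayList<>(N);
        for (int i = 0; i < N; i++) { a[i] = i; l.add(i); }

        // Warm up both paths, then time them. Treat the numbers as indicative
        // only; use a real harness (e.g. JMH) for results you'd act on.
        for (int i = 0; i < 50; i++) { sumArray(a); sumList(l); }

        long t0 = System.nanoTime();
        long sa = sumArray(a);
        long t1 = System.nanoTime();
        long sl = sumList(l);
        long t2 = System.nanoTime();
        System.out.println("array: " + (t1 - t0) + " ns, list: " + (t2 - t1)
                + " ns, sums " + sa + "/" + sl);
    }
}
```

The sums are printed for the same reason as before: if the result is never used, the JIT is free to throw the loop away.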
tl;dr version: Read the long version. tl;dr has no place in Java performance discussion - this is subtle and complex stuff and the nuances matter.