Why is summing a sorted array faster than an unsorted one?

Question

I have an ArrayList with numbers from 10 MM to 20 MM

 List<Long> l = new ArrayList<>();
        for (long i = 10_000_000; i < 20_000_000; i++) {
            l.add(i);
        }

Why is the sum() function faster on a sorted array than on an unsorted?

public class Main {

    public static void main(String[] args) {
        List<Long> l = new ArrayList<>();
        for (long i = 10_000_000; i < 20_000_000; i++) {
            l.add(i);
        }

        System.out.println(sum(l));

        Collections.shuffle(l);
        System.out.println(sum(l));

        Collections.sort(l);
        System.out.println(sum(l));
    }

    static long sum(List<Long> l) {
        long started = System.nanoTime();
        long result = 0;
        for (long v : l) {
            result += v;
        }
        System.out.println(" duration: " + (System.nanoTime() - started) / 1_000_000 + "ms");
        return result;
    }

}

I execute the sum() function three times

First time, on a sorted array
Second time on unsorted
And the third time again on sorted

On my computer, the execution time was

85ms
250ms
97ms

Why is the sum() function faster on a sorted array than on an unsorted?

This question seems very clearly to be using [inadequate approaches to benchmarking in Java](https://stackoverflow.com/q/504103/869736) which can produce outright lies. — Louis Wasserman, Mar 06 '23 at 20:21

harold · Answer 1 · 2023-03-06T19:46:25.597

5

It's not about being sorted per-se, it's about the order in which memory is accessed.

I will briefly describe a few experiments to confirm that:

Try using long[] instead of ArrayList<Long>. Then there are no individually allocated Long objects, just one big array that directly contains the data. That makes the memory access order of sum independent of the contents of the array. I did not perform this experiment since it would take a bit more work to write the code.
Try initializing the list with randomly-ordered values. I tried that (I also made the list a bit shorter to save time) and got this result:
```
duration: 31ms
-6262510617230624864
duration: 77ms
-6262510617230624864
duration: 160ms
-6262510617230624864
```
ie the fastest order is the same as the order in which the Long objects were created, independently of their values.
Recreate the Long objects in-order after shuffling, for example:
```
  for (int i = 0; i < l.size(); i++)
      l.set(i, new Long(l.get(i)));
```
That makes the sum of the shuffled list fast again (subsequently sorting the list un-orders the memory access pattern and makes sum slow again).

There is actually no guarantee that Long objects that are allocated one after the other will be allocated in the same order in memory, but that is a pretty common thing to happen, when a sufficiently large number of them is allocated and not much disturbance happens meanwhile.

edited Mar 06 '23 at 19:46

answered Mar 06 '23 at 19:34

harold

61,398
6
86
164

This is true, if you use `long[]` there will be practically no difference in time. But why does `Long` work differently? How is it stored in memory if the `ArrayList` is a sequence of elements and no matter what elements – Ignatyo Mar 06 '23 at 19:55
@Ignatyo yes but the difference is that `long` is a primitive type while a `Long` is an actual heap-allocated object. Putting `ArrayList` aside, a `Long[]` would be an array of object references (with actual data stored in heap-allocated `Long` objects) while a `long[]` directly contains the values – harold Mar 06 '23 at 20:04
@Ignatyo i.e., each element in a `long[]` is the `long` value itself, whereas each element in a `Long[]` is a pointer to the `Long` object (somewhere on the heap). – Slaw Mar 06 '23 at 20:11
@harold I understand it. But why then is there a difference depending on sorted or unsorted array, if references are stored in `ArrayList` anyway. We also access the element located at heap-allocated using a reference. Why is reading time reduced? – Ignatyo Mar 06 '23 at 20:16
1

@Ignatyo You are creating `Long` objects "in order". This _likely_ means those objects are in contiguous memory, in the same order as they appear in the array, though there's no guarantee of that from the JLS or JVMS. Since you iterate over the array from left to right, you are also iterating over the `Long`s in the order they appear in memory (when the array is sorted). In other words, the memory access is also in order. When you shuffle the array, the memory access is now also essentially random. – Slaw Mar 06 '23 at 20:22
@Ignatyo perhaps you're also interested in the QA [What is locality of reference?](https://stackoverflow.com/q/7638932/555045) – harold Mar 06 '23 at 20:23
@Slaw I just don't understand how changing the order of references in an `ArrayList` (not changing value) affects changes in the storage location of the values of those references. After all, as far as I understand, we only change the order of references, not elements, right? – Ignatyo Mar 06 '23 at 20:38
@Ignatyo shuffling the list affects the order in which the objects will be accessed later, so even though memory still (probably) contains the objects in the old order, the access pattern used by `sum` has become randomized (eg the objects with values 10000000 and 10000001 are probably still close together, but they are not used one after the other anymore) – harold Mar 06 '23 at 20:46
@Ignatyo Assuming the JVM/GC hasn't moved anything around, shuffling the array _doesn't_ change the storage locations (i.e., addresses) of the `Long` objects. It changes the _order which those addresses are accessed_. For example, instead of accessing 0x01, 0x02, 0x03, 0x04 in that order, now you are accessing 0x03, 0x01, 0x04, 0x02 in that shuffled order. But the values at each address are the same. So, after shuffling, instead of accessing the memory in a predictable, one-after-the-other manner, you are accessing the _same memory_ in an essentially random manner. – Slaw Mar 07 '23 at 00:16
@Slaw It turns out that the `CPU` will simply look for a place (the place where the `value` of the `Long` type is located) in memory longer, because (speak figuratively) it will take not the next element after the current one (for example, first 0x02, and then 0x07), right? Can such a change actually triple the runtime? – Ignatyo Mar 07 '23 at 08:43
In your experiment with random values, why did sorted take 160ms while shuffled took only 77ms? Aren't they both equally randomly accessed? What do you mean with "initializing the list with randomly-ordered values"? – Kelly Bundy Mar 07 '23 at 08:50
@KellyBundy If I'm not mistaken, they mean they created the values randomly (e.g., via `Random`) instead of "in order" (e.g., `for (long v = 0L; v < ...; v++)`). And that's demonstrating what matters is the order the `Long` _objects_ are created, rather than the actual _value_ of the `Long`. – Slaw Mar 07 '23 at 08:59
@Ignatyo I don't know in-depth all the arcane magic that processors get up to in order to speed up execution time, but my best guess is the out-of-order memory access messes with the processor's ability to pre-fetch the memory correctly, thus causing more calls into RAM (which is much slower compared to CPU caches and registers). – Slaw Mar 07 '23 at 09:02
@Slaw But then how do you explain that speed difference? I also [tried that](https://ato.pxeger.com/run?1=XVBBCoMwELznFXtMJIVKoRTBlxQpiSY1YKLEFexbevHS_qnP6A-aRL30tDuzM8PuPt_DA9veLctrQn24fL7a9xbQWGUQjB16j-DVoASSNPHCNaHsk4Q4jO2kdacIEVDCdWUpA917uIFxUXdXND9m2ZlVRAaRuBYV2WxUMlIHbgyRqomI_DlPrCCQ4uZICg6SQx05AAxOaxxdt6SdsLIRBYyTpTPj4CYrlS9zxpJ68MYhxQjWlq2Hb_fvf_gB) in Python now, and as expected, both shuffled and sorted were equally slow (both take ~37 ms, while the original order takes ~7 ms). – Kelly Bundy Mar 07 '23 at 09:03
@KellyBundy Good question. It could be the JVM getting in the way of accurate micro-benchmarking. My tests gave 59ms, 663ms, and 455ms. That's plausible to me, because I expect the JVM to get faster after it "warms up" (plus I'm not doing proper benchmark testing, e.g., with JMH). But I do find it weird that harold's times show the _last_ duration as the longest. – Slaw Mar 07 '23 at 09:15
@KellyBundy Ran the tests again and got 103ms, 437ms, and 441ms, which also makes sense. It also shows the problems with running these kind of benchmark tests in Java. They really should be done with something like JMH. – Slaw Mar 07 '23 at 09:21
@Slaw Yes, if it's random numbers, then I also suspect highly inaccurate benchmarking. In which case... Can we trust any of those times at all? Does it really demonstrate that creation order is fastest, if its fast time could also just come from the benchmark being inaccurate? I think this issue should be resolved. – Kelly Bundy Mar 07 '23 at 11:48
@KellyBundy the time difference cannot be purely a benchmarking artifact, that wouldn't explain a difference so large and in that direction (getting slower). Anyway I used `i * 0x12345678DEADBEEFL` as "random" numbers because I was lazy and that was faster to implement, so you can try with that and see if it replicates (it does for me) – harold Mar 07 '23 at 14:24
Well, I don't have much experience with benchmarking Java, and I didn't know about the not exactly random data, so benchmarking artifacts was all I could suspect. I guess the non-randomness can explain it then. If so, then it's interesting that sorted is slower than shuffled. – Kelly Bundy Mar 07 '23 at 15:48
@KellyBundy I ran some tests with JMH (3 forks, default number warmups and measurements, measured average time, and created a new `Long[]` per iteration). Array of random numbers: `46.005 ± 0.507 ms/op`; array of shuffled random numbers: `447.524 ± 34.462 ms/op`; and array of sorted random numbers: `463.636 ± 11.024 ms/op`. So, as you can see, the shuffled and sorted arrays take the same amount of time, and both take longer than the "ordered" array (where the order is the order the `Long` objects were created). – Slaw Mar 07 '23 at 23:40
@Slaw Thanks. Was that with harold's `i * 0x12345678DEADBEEFL` or with something more random? – Kelly Bundy Mar 08 '23 at 01:49
@KellyBundy I used `java.util.Random#nextLong()`. – Slaw Mar 08 '23 at 01:51

Why is summing a sorted array faster than an unsorted one?

1 Answers1