
I stumbled across this question long ago:

Why is processing a sorted array faster than processing an unsorted array?

and wanted to try it for myself. I had some surprising results when trying different array sizes.

Here's the test code (C# with the Microsoft compiler; I haven't tested other languages).

using System;
using System.Diagnostics;

static void Main(string[] args)
{
    var rand = new Random();
    Time(rand, true, 1000, 100000);
    Time(rand, false, 1000, 100000);
    Time(rand, true, 1000000, 100);
    Time(rand, false, 1000000, 100);
    Console.ReadKey(true);
}

static void Time(Random rand, bool sort, int arraysize, int loopsize)
{
    var stopwatch = new Stopwatch();

    // Fill the array with uniformly distributed non-negative ints.
    var array = new int[arraysize];
    array = Array.ConvertAll(array, _ => (int)(rand.NextDouble() * int.MaxValue));

    if (sort)
        Array.Sort(array);

    // Time only the counting loops, not the setup above.
    stopwatch.Start();

    // The comparison below is the branch under test: it counts
    // elements in the upper half of the int range.
    var count = 0;
    for (var j = 0; j < loopsize; j++)
        for (var i = 0; i < array.Length; i++)
            if (array[i] > int.MaxValue / 2)
                count++;

    stopwatch.Stop();

    Console.WriteLine("{0} ticks, {1}ms, sort {2}, arraysize {3}, loopsize {4}",
        stopwatch.ElapsedTicks, stopwatch.ElapsedMilliseconds, sort, arraysize, loopsize);
}

Here's the output.

614149 ticks, 262ms, sort True, arraysize 1000, loopsize 100000
630096 ticks, 269ms, sort False, arraysize 1000, loopsize 100000
634562 ticks, 271ms, sort True, arraysize 1000000, loopsize 100
1840846 ticks, 787ms, sort False, arraysize 1000000, loopsize 100

The relative timings are about the same in release and debug mode (with debug being roughly twice as slow overall).

The last two results make sense to me: they follow the logic of the question linked above, with the sorted array running faster due to what I assume is branch prediction. The first two results, however, confuse me. Why are they almost the same time? And why aren't they significantly faster than the last two? I would have thought the smaller array would fit in a closer cache level and therefore the whole thing would run faster. (Loop size × array size is the same for the first two runs as for the last two, so both pairs execute the same number of inner iterations.)
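
If branch prediction really is the cause, a branchless version of the inner loop should erase the sorted/unsorted gap. Here's a minimal, untested sketch of what I mean, dropped inside the Time method above in place of the original loop (whether the JIT was already converting the if into a conditional move is an assumption I can't verify):

    // Branchless inner loop: derive 0 or 1 from the sign bit of a
    // subtraction instead of taking a conditional jump.
    var half = int.MaxValue / 2;
    var count = 0;
    for (var j = 0; j < loopsize; j++)
        for (var i = 0; i < array.Length; i++)
            // (half - array[i]) is negative exactly when array[i] > half,
            // so the shifted-down sign bit is the 0/1 comparison result.
            count += (int)((uint)(half - array[i]) >> 31);

If the four timings converge with this variant, that would support branch prediction being the dominant effect rather than caching.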

  • My guess is that the processor is able to track a repeating cycle of 1000. But not for 1 million. – Mysticial Jun 22 '14 at 17:49
  • 1000 looks a little big for most predictors; I'm guessing that the longer loop is being JITted somehow at runtime (maybe the branch is converted to some conditional move). Try running the same loopsize in both cases, and just multiply the result. – Leeor Jun 25 '14 at 19:40
  • one cause: for a given sorted array, there will be a constant `C` number of branch prediction failures per loop. For smaller array sizes, that `C` will be a larger percentage of the total number of iterations, and thus a smaller overall time savings. So that might play a part in why your small array, large loop count scenario doesn't do better. – Brandon Jun 17 '19 at 17:14
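
A minimal sketch of Leeor's suggestion, assuming the same Time harness as above: use one fixed loopsize for all four runs so both array sizes execute a loop the JIT treats the same way, then scale the small-array timings afterwards to compare total work (loopsize 100 here is illustrative, not from the original post):

    // Same loopsize everywhere; to compare against the original numbers,
    // multiply the arraysize-1000 timings by 1000 so total inner
    // iterations match the arraysize-1000000 runs.
    Time(rand, true, 1000, 100);
    Time(rand, false, 1000, 100);
    Time(rand, true, 1000000, 100);
    Time(rand, false, 1000000, 100);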
