3

I had a look and couldn't see anything quite answering my question.

I'm not exactly the best at creating accurate 'real life' tests, so i'm not sure if that's the problem here. Basically I want to create a few simple neural networks to create something to the effect of Gridworld. Performance of these neural networks will be critical and i dont want the hidden layer to be a bottleneck as much as possible.

I would rather use more memory and be faster, so I opted to use arrays instead of lists (due to lists having an extra bounds check over arrays). The arrays aren't always full, but because the if statement (check if the element is null) is the same until the end, it can be predicted and there is no performance drop from that at all.

My question comes from how I store the data for the network to process. I figured due to 2D arrays storing all the data together it would be better cache wise and would run faster. But from my mock up test that an array of arrays performs much better in this scenario.

Some Code:

    private void RunArrayOfArrayTest(float[][] testArray, Data[] data)
    {
        for (int i = 0; i < testArray.Length; i++) {
            for (int j = 0; j < testArray[i].Length; j++) {
                var inputTotal = data[i].bias;

                for (int k = 0; k < data[i].weights.Length; k++) {
                    inputTotal += testArray[i][k];
                }
            }
        }
    }

    private void Run2DArrayTest(float[,] testArray, Data[] data, int maxI, int maxJ)
    {
        for (int i = 0; i < maxI; i++) {
            for (int j = 0; j < maxJ; j++) {
                var inputTotal = data[i].bias;

                for (int k = 0; k < maxJ; k++) {
                    inputTotal += testArray[i, k];
                }
            }
        }
    }

These are the two functions that are timed. Each 'creature' has its own network (The first for loop), each network has hidden nodes (The second for loop) and i need to find the sum of the weights for each input (The third loop). In my test i stripped it so that it's not really what i am doing in my actual code, but the same amount of loops happen (The data variable would have it's own 2D array, but i didn't want to possibly skew the results). From this i was trying to get a feel for which one is faster, and to my surprise the array of arrays was.

Code to start the tests:

        // Array of Array test
        Stopwatch timer = Stopwatch.StartNew();

        RunArrayOfArrayTest(arrayOfArrays, dataArrays);

        timer.Stop();
        Console.WriteLine("Array of Arrays finished in: " + timer.ElapsedTicks);

        // 2D Array test
        timer = Stopwatch.StartNew();

        Run2DArrayTest(array2D, dataArrays, NumberOfNetworks, NumberOfInputNeurons);

        timer.Stop();
        Console.WriteLine("2D Array finished in: " + timer.ElapsedTicks);

Just wanted to show how i was testing it. The results from this in release mode give me values like:

Array of Arrays finished in: 8972
2D Array finished in: 16376

Can someone explain to me what i'm doing wrong? Why is an array of arrays faster in this situation by so much? Isn't a 2D array all stored together, meaning it would be more cache friendly?

Note i really do need this to be fast as it needs to sum up hundreds of thousands - millions of numbers per frame, and like i said i don't want this is be a problem. I know this can be multi threaded in the future quite easily because each network is completely separate and even each node is completely separate.

Last question i suppose, would something like this be possible to run on the GPU instead? I figure a GPU would not struggle to have much larger amounts of networks with much larger numbers of input/hidden neurons.

Lolop
  • 67
  • 10
  • Something similar displayed in this [SO Post](http://stackoverflow.com/questions/597720/what-are-the-differences-between-a-multidimensional-array-and-an-array-of-arrays) – Hari Prasad Jun 09 '16 at 05:48
  • Cheers thanks. I looked a few posts with similar titles but they usually didn't go into performance like that. John Leidegren pointed out a 2D array SHOULD be faster but isn't because of poor implementation. Do you think if i made my own data structure that stored the data as a 1D array but allowed access like a 2D array, it would change that? – Lolop Jun 09 '16 at 05:51

1 Answers1

4

In the CLR, there are two different types of array:

  • Vectors, which are zero-based, single-dimensional arrays
  • Arrays, which can have non-zero bases and multiple dimensions

Your "array of arrays" is a "vector of vectors" in CLR terms.

Vectors are significantly faster than arrays, basically. It's possible that arrays could be optimized further in later CLR versions, but I doubt that there'll get the same amount of love as vectors, as they're so relatively rarely used. There's not a lot you can do to make CLR arrays faster. As you say, they'll be more cache friendly, but they have this CLR penalty.

You can improve your array-of-arrays code already, however, by only performing the first indexing operation once per row:

private void RunArrayOfArrayTest(float[][] testArray, Data[] data)
{
    for (int i = 0; i < testArray.Length; i++) {

        // These don't change in the loop below, so extract them
        var row = testArray[i];            
        var inputTotal = data[i].bias;
        var weightLength = data[i].weights.Length;
        for (int j = 0; j < row.Length; j++) {
            for (int k = 0; k < weightLength; k++) {
                inputTotal += row[k];
            }
        }
    }
}

If you want to get the cache friendliness and still use a vector, you could have a single float[] and perform the indexing yourself... but I'd probably start off with the array-of-arrays approach.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • Thanks for the reply. I figured that i was just doing something wrong and not that Jagged arrays would be faster. I'll probably look into using a 1D array like you said (I was considering it before as well), but i am not sure what the cleanest way to do it would be. Do you think that the performance would be worth the effort? Or will they more or less be the same? Also with what you said about taking those variables out of the loop, would that really matter? I was under the assumption the compiler would be smart enough to do that itself? – Lolop Jun 09 '16 at 06:09
  • @Lolop: I wouldn't like to guess about either what the compiler would do, or what the benefit of using a single array would be. I would *at least* try the approach I've used in my post though - you've already got code to measure it, so give it a whirl. If that's not as fast as you'd like it to be, you could then test the single array approach... just be aware that there will be downsides in readability. – Jon Skeet Jun 09 '16 at 06:11
  • Well, i was completely wrong. Almost halved the time it takes to run just by doing what you said. I think i'm pretty much at the fastest the loop can go, just running the empty loops takes roundly 2/3 of the total execution time. Thanks for the help! Was starting to get really annoyed at why it wasn't working how i expected it to. – Lolop Jun 09 '16 at 06:20