Background

Hi,

I am creating a game where the AI collects possible moves from positions, which is done millions of times per turn. I am trying to figure out the fastest way to store these moves, which contain e.g. pieceFromSquare, pieceToSquare and other move-related data.

Currently I have a Move class with public fields for the move, as shown below:

public class Move
{
    public int FromSquare;
    public int ToSquare;
    public int MoveType;

    public static Move GetMove(int fromSquare, int toSquare, int moveType)
    {
        Move move = new Move();

        move.FromSquare = fromSquare;
        move.ToSquare = toSquare;
        move.MoveType = moveType;

        return move;
    }
}

When the AI finds a move it stores it in a List. I wondered whether it would be faster to store the move as a list of integers instead, so I ran a test:

Test

public void RunTimerTest()
{
    // Function A
    startTime = DateTime.UtcNow;
    moveListA = new List<List<int>>();
    for (int i = 0; i < numberOfIterations; i++)
    {
        FunctionA();
    }
    PrintResult((float)Math.Round((DateTime.UtcNow - startTime).TotalSeconds, 2), "Function A");

    // Function B
    startTime = DateTime.UtcNow;
    moveListB = new List<Move>();
    for (int i = 0; i < numberOfIterations; i++)
    {
        FunctionB();
    }
    PrintResult((float)Math.Round((DateTime.UtcNow - startTime).TotalSeconds, 2), "Function B");
}

private void FunctionA()
{
    moveListA.Add(new List<int>() { 1, 123, 10 });
}

private void FunctionB()
{
    moveListB.Add(Move.GetMove(1, 123, 10));
}

Running the test with 10 million iterations gives the following results:

  • Function A ran for: 4.58 s.
  • Function B ran for: 1.47 s.

So it is more than 3 times faster to create the class and populate its variables than to create a list of integers.

Questions

  1. Why is it so much faster to create a class than a list of integers?
  2. Is there an even faster way to store this type of data?
eligolf
  • This is a bad benchmark. There are multiple confounding factors which you aren't accounting for. Use a good benchmarking tool such as [BenchmarkDotNet](https://benchmarkdotnet.org/articles/overview.html) – canton7 Feb 08 '22 at 10:33
  • The main factor is probably just the time taken to allocate objects. A `new List<int>()` needs to allocate 2 objects: the `List<int>` itself, and the underlying `int[]` to hold the elements. If I run a benchmark using a `List<int[]>`, the time comes out very close to that of the `List<Move>`. If I make `Move` a struct, it halves the time taken. If I pre-size the list by doing `new List<Move>(numberOfIterations)`, it halves the time taken again (for `numberOfIterations = 100` -- it will make more of a difference if this is larger) – canton7 Feb 08 '22 at 10:44
  • @canton7, I thought it gives a hint, but I will try benchmarking thanks. – eligolf Feb 08 '22 at 10:55
  • @TimSchmelter, performance is everything in this program and case. And no, I don't need 10 million `List<List<int>>`, I need around 10 million `List<int>`, all stored in a list. Or 10 million Move objects in a list – eligolf Feb 08 '22 at 10:57
  • @canton7, How do I make it a struct? – eligolf Feb 08 '22 at 10:58
    Are you sure you need to be working with 160MB+ of data (more on x64) on every turn? That's not very cache-friendly: the time taken to access it is probably going to dominate the time taken to compute it. Can't you do something a bit smarter, perhaps? – canton7 Feb 08 '22 at 10:59
  • @eligolf With the `struct` keyword. I recommend picking up a beginner's guide to C#: this is pretty fundamental stuff which you're going to want to know if you're writing performance-critical code – canton7 Feb 08 '22 at 11:00
  • @canton7, Technically the list will be around 20 elements, but it is done say 500k times at least. – eligolf Feb 08 '22 at 11:00
  • That's a pretty significant difference! Allocating 160MB of memory is very different to allocating 320-ish bytes, and then accessing them repeatedly. Caches matter. – canton7 Feb 08 '22 at 11:01

2 Answers

As mentioned in the comments, the difference is probably mostly due to the list example needing to allocate at least two objects per move, possibly more depending on how the list is optimized.
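
For instance (illustrative only, using the types from the question):

var asList = new List<int>() { 1, 123, 10 }; // allocates the List<int> object *and* its backing int[]
var asMove = Move.GetMove(1, 123, 10);       // allocates a single Move object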

For high performance code a common guideline is to avoid high-frequency allocations. While allocations in C# are fast, they still take some time to manage. This often means sticking with fixed-size arrays, or at least setting the capacity of any lists on creation.
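
For example, a minimal sketch (the capacity of 64 is just an assumed upper bound on moves per position, not a figure from the question):

public List<Move> CollectMoves()
{
    // Pre-sizing avoids the list repeatedly re-allocating and copying
    // its internal array as it grows.
    var moves = new List<Move>(64);
    moves.Add(Move.GetMove(1, 123, 10));
    return moves;
}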

Another important point is using structs: they are stored directly in the list/array instead of as references to separate objects. This avoids some per-object overhead, removes the memory needed for a separate reference, and ensures all the values are stored sequentially in memory, all of which helps the caches be used efficiently. Using a smaller data type like short/ushort may also help if that is possible. Note that structs should preferably be immutable, and keywords like `ref` and `in` can help avoid the overhead of copying data.
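
As a rough sketch of what that could look like (the short fields and the IsCapture helper are illustrative assumptions, not code from the question):

public readonly struct Move
{
    public readonly short FromSquare;
    public readonly short ToSquare;
    public readonly short MoveType;

    public Move(short fromSquare, short toSquare, short moveType)
    {
        FromSquare = fromSquare;
        ToSquare = toSquare;
        MoveType = moveType;
    }
}

// 'in' passes the struct by readonly reference, so no copy is made per call.
static bool IsCapture(in Move move) => move.MoveType == 2; // 2 = capture is a made-up convention here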

In some specific cases it can make sense to separate the values into different arrays, i.e. one for all the FromSquare values, one for all the ToSquare values, etc. This can be a benefit if an operation mostly uses only a single value, again because of better cache usage. It might also make SIMD easier to apply, but that might not be relevant in this case.
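
A minimal sketch of that kind of layout (the MoveBuffer name and the fixed capacity are assumptions for illustration):

public class MoveBuffer
{
    // Each field lives in its own array, so scanning e.g. only ToSquare
    // touches one contiguous block of memory.
    public readonly int[] FromSquare;
    public readonly int[] ToSquare;
    public readonly int[] MoveType;
    public int Count;

    public MoveBuffer(int capacity)
    {
        FromSquare = new int[capacity];
        ToSquare = new int[capacity];
        MoveType = new int[capacity];
    }

    public void Add(int from, int to, int type)
    {
        FromSquare[Count] = from;
        ToSquare[Count] = to;
        MoveType[Count] = type;
        Count++;
    }
}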

Moreover, when measuring performance, at least use a `Stopwatch`. That is much more accurate than `DateTime`, and no harder to use. BenchmarkDotNet would be even better, since it helps compensate for various sources of noise and can benchmark on multiple platforms. A good profiler can also be useful, since it can give hints about what takes the most time, how much you are allocating, etc.
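
For instance, the DateTime timing in the question could be replaced with something like:

var sw = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < numberOfIterations; i++)
{
    FunctionB();
}
sw.Stop();
Console.WriteLine($"Function B ran for: {sw.Elapsed.TotalSeconds:F2} s.");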

JonasH

I examined four options, and allocating arrays for each record is by far the slowest. I did not check allocating class objects, as I was going for the faster options.

  • struct Move {} stores three integer values into readonly fields of a structure.
  • struct FMove {} stores a fixed int buffer of size 3.
  • (int,int,int) stores a tuple of three int values
  • int[] allocates an array of three values.

The results are below; I am tracking time and size allocated in bytes.

| Storage | Iterations | Time | Size |
|---|---|---|---|
| Move struct | 10000000 allocations | 0.2633584 seconds | 120000064 bytes |
| FMove struct | 10000000 allocations | 0.3572664 seconds | 120000064 bytes |
| Tuple | 10000000 allocations | 0.702174 seconds | 160000064 bytes |
| Array | 10000000 allocations | 1.2226393 seconds | 480000064 bytes |

public readonly struct Move
{
    public readonly int FromSquare;
    public readonly int ToSquare;
    public readonly int MoveType;

    public Move(int fromSquare, int toSquare, int moveType) : this()
    {
        FromSquare = fromSquare;
        ToSquare = toSquare;
        MoveType = moveType;
    }
}

public unsafe struct FMove
{
    fixed int Data[3];

    public FMove(int fromSquare, int toSquare, int moveType) : this()
    {
        Data[0] = fromSquare;
        Data[1] = toSquare;
        Data[2] = moveType;
    }

    public int FromSquare { get => Data[0]; }
    public int ToSquare { get => Data[1]; }
    public int MoveType { get => Data[2]; }
}

static class Program
{
    static void Main(string[] args)
    {
        // Always compile with Release to time
        const int count = 10000000;
        Console.WriteLine("Burn-in start");
        // Burn-in. Do some calc to spool up
        // the CPU
        AllocateArray(count/10);
        Console.WriteLine("Burn-in end");
        double[] timing = new double[4];
        // store timing results for four different 
        // allocation types

        var sw = new Stopwatch();
        Console.WriteLine("Timing start");

        long startMemory;

        startMemory = GC.GetTotalMemory(true);
        sw.Restart();
        var r4 = AllocateArray(count);
        sw.Stop();
        var s4 = GC.GetTotalMemory(true) - startMemory;
        timing[3] = sw.Elapsed.TotalSeconds;

        startMemory = GC.GetTotalMemory(true);
        sw.Restart();
        var r1 = AllocateMove(count);
        sw.Stop();
        var s1 = GC.GetTotalMemory(true) - startMemory;
        timing[0] = sw.Elapsed.TotalSeconds;

        startMemory = GC.GetTotalMemory(true);
        sw.Restart();
        var r2 = AllocateFMove(count);
        sw.Stop();
        var s2 = GC.GetTotalMemory(true) - startMemory;
        timing[1] = sw.Elapsed.TotalSeconds;

        startMemory = GC.GetTotalMemory(true);
        sw.Restart();
        var r3 = AllocateTuple(count);
        sw.Stop();
        var s3 = GC.GetTotalMemory(true) - startMemory;
        timing[2] = sw.Elapsed.TotalSeconds;

        Console.WriteLine($"| Storage | Iterations | Time | Size |");
        Console.WriteLine($"|---|---|---|---|");
        Console.WriteLine($"|  Move struct| {r1.Count} allocations | {timing[0]} seconds | {s1} bytes |");
        Console.WriteLine($"| FMove struct| {r2.Count} allocations | {timing[1]} seconds | {s2} bytes |");
        Console.WriteLine($"|        Tuple| {r3.Count} allocations | {timing[2]} seconds | {s3} bytes |");
        Console.WriteLine($"|        Array| {r4.Count} allocations | {timing[3]} seconds | {s4} bytes |");
    }

    static List<Move> AllocateMove(int count)
    {
        var result = new List<Move>(count);
        for (int i = 0; i < count; i++)
        {
            result.Add(new Move(1, 123, 10));
        }
        return result;
    }
    static List<FMove> AllocateFMove(int count)
    {
        var result = new List<FMove>(count);
        for (int i = 0; i < count; i++)
        {
            result.Add(new FMove(1, 123, 10));
        }
        return result;
    }

    static List<(int from, int to, int type)> AllocateTuple(int count)
    {
        var result = new List<(int from, int to, int type)>(count);
        for (int i = 0; i < count; i++)
        {
            result.Add((1, 123, 10));
        }
        return result;
    }

    static List<int[]> AllocateArray(int count)
    {
        var result = new List<int[]>(count);
        for (int i = 0; i < count; i++)
        {
            result.Add(new int[] { 1, 123, 10});
        }
        return result;
    }

}

Based on the comments, I decided to also run the above comparison with BenchmarkDotNet, and the results are quite similar:

| Method | Count | Mean | Error | StdDev | Ratio |
|---|---|---|---|---|---|
| Move | 10000000 | 115.9 ms | 2.27 ms | 2.23 ms | 0.10 |
| FMove | 10000000 | 149.7 ms | 2.04 ms | 1.91 ms | 0.12 |
| Tuple | 10000000 | 154.8 ms | 2.99 ms | 2.80 ms | 0.13 |
| Array | 10000000 | 1,217.5 ms | 23.84 ms | 25.51 ms | 1.00 |
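
The benchmark harness is not shown in the answer; a BenchmarkDotNet setup roughly along these lines (the class and method names and the runner entry point are assumptions) would produce a table like the one above, re-using the Move struct defined earlier:

using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class MoveBenchmarks
{
    [Params(10000000)]
    public int Count;

    [Benchmark]
    public List<Move> Move()
    {
        var result = new List<Move>(Count);
        for (int i = 0; i < Count; i++) result.Add(new Move(1, 123, 10));
        return result;
    }

    [Benchmark(Baseline = true)]
    public List<int[]> Array()
    {
        var result = new List<int[]>(Count);
        for (int i = 0; i < Count; i++) result.Add(new int[] { 1, 123, 10 });
        return result;
    }

    // The FMove and Tuple benchmarks follow the same pattern.
}

static class BenchmarkProgram
{
    static void Main() => BenchmarkRunner.Run<MoveBenchmarks>();
}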

I decided to add a class allocation (called CMove) with the following definition

public class CMove
{
    public readonly int FromSquare;
    public readonly int ToSquare;
    public readonly int MoveType;

    public CMove(int fromSquare, int toSquare, int moveType) 
    {
        FromSquare = fromSquare;
        ToSquare = toSquare;
        MoveType = moveType;
    }
}

I used the above as a baseline for benchmarking and also tried different allocation sizes. Here is a summary of the results:

[Chart: time ratio of each allocation type relative to the CMove baseline, for varying allocation counts]

Anything below 1.0 means it is faster than CMove. As you can see, array allocation is always the worst option. For a few allocations it does not matter much, but for a lot of allocations there are clear winners.

JAlex
  • You went to all that effort to measure memory and produce a nice table.... Why not just use BenchmarkDotNet which does all of that, *and* compensates for sources of error? – canton7 Feb 09 '22 at 08:48
  • @canton7 - there, I also ran with BenchmarkDotNet. See the updated answer now. – JAlex Feb 09 '22 at 14:34
  • @canton7 - I also added a class allocation like the OP had, to compare. – JAlex Feb 09 '22 at 15:07