3

I was playing around with a simple "game" to test different aspects of Data Oriented Design when I stumble across this strange performance drop.

I have this struct to store data of the game ships:

constexpr int MAX_ENEMY_SHIPS = 4000000;
struct Ships
{
    int32_t count;
    v2 pos[MAX_ENEMY_SHIPS];
    ShipMovement movements[MAX_ENEMY_SHIPS];
    ShipDrawing drawings[MAX_ENEMY_SHIPS];
    //ShipOtherData other[MAX_ENEMY_SHIPS]; 

    void Add(Ship ship)
    {
        pos[count] = ship.pos;
        movements[count] = { ship.dir, ship.speed };  
        drawings[count] = { ship.size, ship.color };
        //other[count] = { ship.a, ship.b, ship.c, ship.d };
        count++;
    }
};

Then I have a function to update the movement data:

void MoveShips(v2* positions, ShipMovement* movements, int count)
{
    ScopeBenchmark bench("Move Ships");
    for(int i = 0; i < count; ++i)
    {
        positions[i] = positions[i] + (movements[i].dir * movements[i].speed);
    }
}

My understanding is that, as the MoveShips function only uses the position and movement arrays, extra memory in the Ships struct will not affect its performance. However, when I uncomment the commented lines on the Ships struct, the performance drops a lot. With the current value of MAX_ENEMY_SHIPS, the duration of the MoveShips function in my computer goes from 10-11 ms to 200-210 ms.

Here I let a minimum, reproducible example:

#include <stdlib.h>

#include <stdio.h>
#include <chrono>
#include <string>

class ScopeBenchmark
{
public:
    ScopeBenchmark(std::string text)
        : text(text)
    {
        begin = std::chrono::steady_clock::now();
    }

    ~ScopeBenchmark()
    {
        std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
        printf("%s: %lli\n", text.data(), std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count());
    }

private:
    std::string text;
    std::chrono::steady_clock::time_point begin;

};

constexpr int32_t Color(uint8_t r, uint8_t g, uint8_t b)
{
    return (r << 16) | (g << 8) | b;
}

struct v2
{
    float x;
    float y;
};

inline v2 operator+(v2 a, v2 b)
{
    v2 result;
    result.x = a.x + b.x;
    result.y = a.y + b.y;
    return result;
}

inline v2 operator*(v2 a, float b)
{
    v2 result;
    result.x = a.x * b;
    result.y = a.y * b;
    return result;
}

//----------------------------------------------------------------------

struct Ship
{
    v2 pos;
    v2 size;
    v2 dir;
    float speed;
    int32_t color;

    v2 a;
    v2 b;
    v2 c;
    v2 d;
};

struct ShipMovement
{
    v2 dir;
    float speed;
};

struct ShipDrawing
{
    v2 size;
    int32_t color;
};

struct ShipOtherData
{
    v2 a;
    v2 b;
    v2 c;
    v2 d;
};

constexpr int MAX_ENEMY_SHIPS = 4000000;
struct Ships
{
    int32_t count;
    v2 pos[MAX_ENEMY_SHIPS];
    ShipMovement movements[MAX_ENEMY_SHIPS];
    ShipDrawing drawings[MAX_ENEMY_SHIPS];
    //ShipOtherData other[MAX_ENEMY_SHIPS]; 

    void Add(Ship ship)
    {
        pos[count] = ship.pos;
        movements[count] = { ship.dir, ship.speed };  
        drawings[count] = { ship.size, ship.color };
        //other[count] = { ship.a, ship.b, ship.c, ship.d };
        count++;
    }
};

void MoveShips(v2* positions, ShipMovement* movements, int count)
{
    ScopeBenchmark bench("Move Ships");
    for(int i = 0; i < count; ++i)
    {
        positions[i] = positions[i] + (movements[i].dir * movements[i].speed);
    }
}

struct Game
{
    int32_t playerShipIndex;
    Ships ships;
};

void InitGame(void* gameMemory)
{
    Game* game = (Game*)gameMemory;
    
    Ship ship;
    ship.pos = { 0.0f, 0.0f };
    ship.size = { 100.0f, 100.0f };
    ship.speed = 1.0f;
    ship.color = Color(64, 192, 32);
    game->ships.Add(ship);
    game->playerShipIndex = 0;

    ship.speed *= 0.5f;
    ship.dir.x = -1.0f;
    ship.size = { 50.0f, 50.0f };
    ship.color = Color(192, 64, 32);

    for(int i = 0; i < MAX_ENEMY_SHIPS; i++)
    {
        ship.pos = { 500.0f, 350.0f };
        game->ships.Add(ship);
    }
}


int main()
{
    Game* game = (Game*)malloc(sizeof(Game));  
    memset(game, 0, sizeof(Game));
    
    InitGame(game);
    while (true)
    {
        MoveShips(game->ships.pos, game->ships.movements, game->ships.count);
    }
}

I use the Visual Studio compiler, and compile the file with the following command:

cl.exe /O2 /GL src/Game.cpp

So, my question is: why the performance of the MoveShips function drops so hard when adding memory that is not used?

Kevin
  • 43
  • 3
  • It just means that it takes more time to allocate a big object, than a smaller one. – Igor R. Aug 08 '20 at 15:32
  • 2
    Well, it is unused, but still allocated. You are asking for 4 Million 8 byte "things". Used or unused, that is 32 million bytes per ship. For one ship, doable. For a hundred or a thousand Ships you are starting to talk some real memory. Allocating that memory that way takes it away from other things which tends to slow things down. Then you have to compare 32 million times the number of Ships against the available memory on your system. I'm guessing you are forcing your system into heavy swapping, thus the performance hit. – Frank Merrow Aug 08 '20 at 15:32
  • @Frank Merrow why 8-byte things? More likely 64-byte ones... – Igor R. Aug 08 '20 at 15:35
  • This is very strange. When I run this code on MS Visual Studio 2017 as a release build, I sometimes always get the output 11 and sometimes I always get the output of about 500 every time. Whether it is 11 or 500 is always the same in every program run, but can be different in separate invokations of the program, even if I don't recompile. So this extreme difference is independant of whether the lines mentioned in the question are commented out or not (in my tests they were always not commented). – Andreas Wenzel Aug 08 '20 at 17:25
  • Use `std::vector`. – Pete Becker Aug 08 '20 at 17:29
  • This program has undefined behavior, as `game->ships.Add` is being called `MAX_ENEMY_SHIPS + 1` times, which results in all 4 arrays being indexed out of bounds. However, fixing this did not change the strange behavior that I mentioned above. – Andreas Wenzel Aug 08 '20 at 17:33
  • Regarding the compiler used and the target platform, the static arrays will be allocated on the stack and some platforms does not have a stack big enough causing a segmentation fault. Dynamic arrays should be preferred (allocated on the heap). Aside from that, I cannot reproduce the performance issue with Clang or GCC on Linux (timings are quite good and stable). On Visual Studio 19.24, I do not see any assembly code change in the function `MoveShips` whether the lines are commented or not. Which version Visual Studio do you use? – Jérôme Richard Aug 08 '20 at 18:07
  • @FrankMerrow: I was able to reproduce the problem and it had nothing to do with swapping, but rather with denormalized floats. See my answer for further information. – Andreas Wenzel Aug 08 '20 at 19:21

1 Answers1

5

The problem is that you are passing uninitialized data in the function calls game->ships.Add(ship). This causes undefined behavior.

In the first function call, both ship.dir.x and ship.dir.y are uninitialized. In all further function calls, ship.dir.y is uninitialized.

This can have an especially negative performance impact if ship.dir.y happens to contain garbage data which represents a denormalized floating point value. See this question for further information.

I was able to reproduce your problem and my tests have shown that this is the reason for your performance degradation. By initializing the variable ship.dir.y to a normalized floating point value, I was able to reliably get a performance increase by a factor of 45(!).

I don't think that your problem has anything to do with you increasing the size of your struct by un-commenting two lines code. Although it has been suggested in the comments section that this could cause your program to slow down due to swap space usage, my tests show that this has no significant performance impact in your case. Increasing the total size of the memory allocation to 256 MB should generally not be a problem, unless you are on a computer with a very small amount of memory. Therefore, I believe your observation that the performance decreases significantly when you un-comment those two lines of code is just a coincidence.

My guess is that address space layout randomization (ASLR) is causing you to get different garbage values every time you run your program, so that they sometimes represent a denormalized floating point value and sometimes not. At least that is what I have experienced during my tests: With ASLR active, I sometimes get a denormalized value and sometimes a normalized one. However, with ALSR disabled (using the /DYNAMICBASE:NO linker option in MS Visual Studio), I always get a denormalized value and never a normalized one.

If you are sure that your observations from un-commenting your code are no coincidence, but consistent, then the most likely explanation is that un-commenting the code is causing you to receive different garbage values, which happen to always represent a denormalized floating point value.

Therefore, in order to fix your problem, all you have to do is to make sure that ship.dir.x and ship.dir.y are properly initialized before you pass them to the function.

Also, although this is probably not the reason for your problem, it is important to point out that you are writing to all 4 arrays in struct Ships out of bounds. You are calling the function game->ships.Add(ship) exactly MAX_ENEMY_SHIPS + 1 times, once outside the loop and MAX_ENEMY_SHIPS times inside the loop. Therefore, you are passing the boundary of each array by exactly one element. This also causes undefined behavior.

Andreas Wenzel
  • 22,760
  • 4
  • 24
  • 39